I want to parse some HTML with the htmlparser2 module for Node.js. My task is to find a specific element by its ID and extract its text content.

I have read the (quite limited) documentation and I know how to set up my parser with the onopentag function, but it only gives access to the tag name and its attributes (I cannot see the text). The ontext function extracts all text nodes from the given HTML string, but ignores all markup.

So here's my code.

const htmlparser = require("htmlparser2");
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';

const parser = new htmlparser.Parser({
  onopentag: function(name, attribs){
    if (attribs.id === "heading1"){
      console.log(/*how to extract text so I can get "Some heading" here*/);
    }
  },
   
  ontext: function(text){
    console.log(text); // Some heading \n Foobar
  }
});

parser.parseComplete(file);

I expect the output of the function call to be 'Some heading'. I believe there is some obvious solution, but somehow it escapes me.

Thank you.

asked May 27, 2019 at 0:12 by dekross, edited Feb 17, 2021 at 9:27 by Steve Chambers
  • Is there a reason you want to use this specific library? Do you have to? For some people, something like Cheerio is a bit easier to use since it has a jQuery-like interface you can leverage. – Vaughan Hilts Commented May 27, 2019 at 0:47
  • Thanks for the question. No, I don’t have to use this particular library, but it seems pretty popular and fast. Regarding Cheerio, I don’t know jQuery, so it doesn’t look very friendly to me. – dekross Commented May 27, 2019 at 1:13
  • I'll write something up for you. I don't think a parser is the way to go about this. – Vaughan Hilts Commented May 27, 2019 at 1:22
  • I added an answer for you. The library you are using above is more about inspecting the structure of things, and its support for querying is kind of second-class from what I understand. I left both examples for you, though, so you can learn. – Vaughan Hilts Commented May 27, 2019 at 1:45

1 Answer

You can do it like this using the library you asked about:

const htmlparser = require('htmlparser2');
const domUtils = require('domutils');

const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';

const handler = new htmlparser.DomHandler(function(error, dom) {
  if (error) {
    console.log('Parsing had an error');
    return;
  }

  // findOne walks the tree and returns the first element the test accepts
  const item = domUtils.findOne(element => element.attribs.id === 'heading1', dom);

  // The heading's first child is its text node; `data` holds the text
  if (item && item.children.length > 0) {
    console.log(item.children[0].data);
  }
});

var parser = new htmlparser.Parser(handler);
parser.write(file);
parser.end();

The output you will get is "Some heading". However, in my opinion, you will find it easier to use a querying library that is meant for this. You don't have to, of course, but note how much simpler the code further down is. See also the question "How do I get an element name in cheerio with node.js".

Cheerio, or a querySelector-style API such as https://www.npmjs.com/package/node-html-parser if you prefer native-style query selectors, is much leaner.

You can compare that code to something leaner, such as node-html-parser, which supports simple querying:

const { parse } = require('node-html-parser');

const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
const root = parse(file);
const text = root.querySelector('#heading1').text;
console.log(text);
