I want to parse some HTML with the htmlparser2 module for Node.js. My task is to find a specific element by its ID and extract its text content.
I have read the documentation (which is quite limited) and I know how to set up my parser with the onopentag function, but it only gives access to the tag name and its attributes (I cannot see the text). The ontext function receives every text node in the given HTML string, but gives no information about the surrounding markup.
Here's my code:
const htmlparser = require("htmlparser2");
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
const parser = new htmlparser.Parser({
  onopentag: function(name, attribs){
    if (attribs.id === "heading1"){
      console.log(/*how to extract text so I can get "Some heading" here*/);
    }
  },
  ontext: function(text){
    console.log(text); // Some heading \n Foobar
  }
});
parser.parseComplete(file);
I expect the output to be 'Some heading'. I believe there is an obvious solution, but somehow it escapes me.
Thank you.
Asked May 27, 2019 at 0:12 by dekross; edited Feb 17, 2021 at 9:27 by Steve Chambers.
- Is there a reason you want to use this specific library? Do you have to? For some people, something like Cheerio is a bit easier to use since it has a jQuery-like interface you can leverage. – Vaughan Hilts Commented May 27, 2019 at 0:47
- Thanks for the question. No, I don't have to use this particular library, but it seems pretty popular and fast. Regarding Cheerio, I don't know jQuery, so it doesn't look very friendly to me. – dekross Commented May 27, 2019 at 1:13
- I'll write something up for you. I don't think a parser is the way to go about this. – Vaughan Hilts Commented May 27, 2019 at 1:22
- I added an answer for you. The library you are using above is more about inspecting the structure of things, and its support for querying is kind of second class from what I understand. I left both examples for you, though, so you can learn. – Vaughan Hilts Commented May 27, 2019 at 1:45
1 Answer
You can do it like this using the library you asked about:
const htmlparser = require('htmlparser2');
const domUtils = require('domutils');
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
// DomHandler builds a DOM tree and invokes this callback once parsing ends.
const handler = new htmlparser.DomHandler(function(error, dom) {
  if (error) {
    console.log('Parsing had an error');
    return;
  }
  // findOne returns the first element for which the test function returns true.
  const item = domUtils.findOne(element => element.attribs.id === 'heading1', dom);
  if (item) {
    // The first child of the <h1> is its text node; `data` holds the text.
    console.log(item.children[0].data);
  }
});
const parser = new htmlparser.Parser(handler);
parser.write(file);
parser.end();
The output you will get is "Some heading". However, in my opinion, you will find it easier to use a querying library that is meant for this. You don't need to, of course, but note how much simpler the code in this related question is: How do I get an element name in cheerio with node.js
Cheerio, or a querySelector API such as https://www.npmjs.com/package/node-html-parser if you prefer native query selectors, is much leaner.
You can compare that code to something leaner, such as node-html-parser, which supports simple querying:
const { parse } = require('node-html-parser');
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
const root = parse(file);
const text = root.querySelector('#heading1').text;
console.log(text);