javascript - Cheerio direct child selector - Stack Overflow
Hey everyone, this is my first question here on Stack Overflow, so please don't be too hard on me. I'm totally new to web scraping, and at the moment my problem is that I can't select the right elements. My code looks like this:
var express = require('express');
var path = require('path');
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');
var app = express();
var port = 8000;
var url = "http://www.finanzparasiten.de/html/links/awd.html";
request(url, function (err, resp, body) {
    if (!err) {
        var $ = cheerio.load(body)
        var test = $('body table table table > tbody > tr > td > p');
        console.log(test.html())
        test.each(function (ii, asdf) {
            var rr = $(asdf).find("table").find("tr").first().find('td:nth-child(2)').text();
            console.log(asdf);
        })
    } else {
        console.log("we encountered an error: " + err);
    }
});
app.listen(port);
console.log('server is listening on ' + port);
It keeps logging null for test.html(). It seems like cheerio has problems with the > selector. With jQuery this selection works as expected.
Thanks to @logol's answer I could solve the first problem, but now I'm facing another one: I have to select direct children of body, and that seems to be buggy in the same way as tbody. Does anyone have a workaround?
Asked Aug 10, 2016 at 18:04 by Jan Hennemann; edited Aug 10, 2016 at 19:05.
2 Answers
Answer 1 (score: 4)
Original:
As far as I remember (from the last time I used cheerio), tbody is not recognized in cheerio. Just leave it out and use this instead:
table > tr > td
PS: thead was working
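If you are not sure whether the parser adds the implicit tbody for you, one option is to try both forms of the selector. This is only a minimal sketch: it assumes `$` is the function returned by cheerio.load(), and `withoutTbody` and `selectWithFallback` are made-up helper names, not part of cheerio.

```javascript
// Hypothetical helper (not a cheerio API): derive a selector variant that
// does not rely on a <tbody> element being present in the parsed tree.
function withoutTbody(selector) {
  // Drop every "tbody >" child step, e.g. "table > tbody > tr" -> "table > tr".
  // The leading group keeps whatever separated tbody from the previous step.
  return selector.replace(/(^|[\s>+~])tbody\s*>\s*/g, '$1');
}

// Try the selector as written first, then the tbody-free variant.
// `$` is assumed to behave like the function returned by cheerio.load().
function selectWithFallback($, selector) {
  const candidates = [selector, withoutTbody(selector)];
  for (const candidate of candidates) {
    const matches = $(candidate);
    if (matches.length > 0) {
      return matches;
    }
  }
  // Nothing matched either way: return the empty result for the original selector.
  return $(selector);
}
```

For example, selectWithFallback($, 'table > tbody > tr > td') would try the selector as written and only fall back to 'table > tr > td' when the first form matches nothing.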
Update:
It seems to work sometimes even with tbody. Try this in the Node REPL:
const cheerio = require('cheerio');
const html = `<!DOCTYPE html>
<html>
    <head>
        <title>Cheerio Test</title>
    </head>
    <body>
        <div id="#1">
            <table>
                <thead>
                    <tr>
                        <th>Month</th>
                        <th>Savings</th>
                    </tr>
                </thead>
                <tfoot>
                    <tr>
                        <td>Sum</td>
                        <td>180</td>
                    </tr>
                </tfoot>
                <tbody>
                    <tr>
                        <td>January</td>
                        <td>100</td>
                    </tr>
                    <tr>
                        <td>February</td>
                        <td>80</td>
                    </tr>
                </tbody>
            </table>
        </div>
    </body>
</html>`;
const dom = cheerio.load(html);
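// not working, most likely because of the id rather than tbody:
// the markup says id="#1", but the selector div#1 looks for the id "1"
let tds1 = dom('div#1 > table > tbody > tr > td').map(function () {
    return dom(this).text().trim();
}).get();
// working:
let tds2 = dom('table > tbody > tr > td').map(function () {
    return dom(this).text().trim();
}).get();
// not working: same id mismatch, and no tr is a direct child of table in this markup
let tds3 = dom('div#1 > table > tr > td').map(function () {
    return dom(this).text().trim();
}).get();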
console.log(tds1);
console.log(tds2);
console.log(tds3);
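One thing worth noting about the test above: the div's id attribute is written as "#1", so an ID selector such as div#1 cannot match it whether or not tbody is in the selector. An attribute selector with a quoted value is one way around that. A small sketch; `idAttributeSelector` is a hypothetical helper, not a cheerio function.

```javascript
// Hypothetical helper: build an attribute selector that matches an id
// exactly, even when the id contains characters such as "#" or starts
// with a digit (both awkward in a plain #id selector).
function idAttributeSelector(id) {
  // Escape backslashes and double quotes so the value stays one quoted string.
  const escaped = String(id).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return '[id="' + escaped + '"]';
}

// Selector for the <div id="#1"> used in the test markup.
const cellSelector = 'div' + idAttributeSelector('#1') + ' > table > tbody > tr > td';
console.log(cellSelector); // div[id="#1"] > table > tbody > tr > td
```

Passing cellSelector to dom(...) in place of the div#1 selectors should let you check the tbody behaviour without the id getting in the way.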
Answer 2
Update:
Based on @logol's response, I checked the Cheerio docs, which say its selectors are built on the CSSselect (css-select) library. That library's docs have a list of supported selectors: child and parent selectors are supported, and the list seems to imply that all element selectors are too. However, there is a GitHub issue that flags the tbody problem.
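For the follow-up about direct children of body, one way to avoid the > combinator altogether is to walk the tree with .children(), which only looks one level down. This is just a sketch: it assumes `$` comes from cheerio.load(), and `directChildTexts` is an illustrative name, not something cheerio provides.

```javascript
// Collect the trimmed text of elements that are direct children of the
// element(s) matched by parentSelector, optionally filtered by childSelector.
// `$` is assumed to be the function returned by cheerio.load(html).
function directChildTexts($, parentSelector, childSelector) {
  return $(parentSelector)
    .children(childSelector) // one level down only, like "parent > child"
    .map(function (index, element) {
      return $(element).text().trim();
    })
    .get(); // turn the cheerio result into a plain array
}
```

Something like directChildTexts($, 'body', 'table') would then collect the text of the tables sitting directly under body and skip the nested ones.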
Original:
Did you mean to repeat table in your selector? Also check how you're printing the result to the console.
Try this:
var test = $('body table > tbody > tr > td > p');
console.log(test.html()) // cheerio selections have .html(); .innerHTML exists only on browser DOM elements
The output of this on the webpage is:
<span class="TDheadlinebig">AWD - Allgemeiner
Wirtschaftsdienst</span><span class="TDnormal"><br>
</span><span class="TDheadlinenormal">zweitgrößte "Strukkibude"
</span><span class="TDnormal"><br>
</span>