I am working on a web scraper that searches Google for certain things and then pulls text from the result page, and I am having an issue getting Puppeteer to return the text I need. What I want to return is an array of strings.
Let's say I have a couple of nested divs within a div, each containing text, like so:
<div class='mainDiv'>
<div>Mary Doe </div>
<div> James Dean </div>
</div>
In the DOM, I can do the following to get the result I need:
document.querySelectorAll('.mainDiv')[0].innerText.split('\n')
This yields: ["Mary Doe", "James Dean"].
I understand that Puppeteer doesn't return NodeLists, and instead it uses JSHandles, but I still can't figure out how to get any information using the prescribed methods. See below for what I have tried in Puppeteer and the corresponding console output:
In every scenario, I do await page.waitFor('selector') to start.
Scenario 1 (using .$$eval()):
const genreElements = await page.$$eval('div.mainDiv', el => el);
console.log(genreElements) // []
Scenario 2 (using evaluate):
function extractItems() {
  const extractedElements = document.querySelectorAll('div.mainDiv')[0].innerText.split('\n')
  return extractedElements
}
let items = await page.evaluate(extractItems)
console.log(items) // UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'innerText' of undefined
Scenario 3 (using evaluateHandle):
const selectorHandle = await page.evaluateHandle(() => document.querySelectorAll('div.mainDiv'))
const resultHandle = await page.evaluate(x => x[0], selectorHandle)
console.log(resultHandle) // undefined
Any help or guidance on what is wrong with my implementation, or on how to achieve what I am looking to do, is much appreciated. Thank you!
Asked Dec 5, 2018 at 21:16 by Nigel Finley; edited Jun 8, 2021 at 12:07 by DisappointedByUnaccountableMod.
Instead of querySelectorAll()[0] (get all, then throw everything away but the first), why not querySelector() (get the first)? – ggorlen, commented Mar 14, 2023 at 20:34
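The TypeError in Scenario 2 means that no element matched 'div.mainDiv' at the moment the function ran, so indexing the empty NodeList produced undefined. Below is a minimal sketch of the querySelector() suggestion with a guard for that case. The function name extractItemsSafely is hypothetical, and returning an empty array on a miss is an assumption about the desired behavior, not something the original post specifies.

```javascript
// Runs in the page context, e.g. via page.evaluate(extractItemsSafely).
// `extractItemsSafely` is a hypothetical name, not part of Puppeteer.
function extractItemsSafely() {
  const container = document.querySelector('div.mainDiv');
  // querySelector returns null when nothing matches, so check before
  // reading innerText instead of letting a TypeError escape.
  if (container === null) {
    return [];
  }
  return container.innerText.split('\n');
}
```

An empty result then signals that the selector did not match, which is easier to diagnose than an unhandled promise rejection.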
3 Answers
Use page.$$eval() or page.evaluate():
You can use page.$$eval() or page.evaluate() to run Array.from(document.querySelectorAll()) within the page context and map() the innerText of each element to the result array:
const names_1 = await page.$$eval('.mainDiv > div', divs => divs.map(div => div.innerText));
const names_2 = await page.evaluate(() => Array.from(document.querySelectorAll('.mainDiv > div'), div => div.innerText));
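The sample markup in the question has stray spaces around the names, so the strings may need trimming to match ["Mary Doe", "James Dean"]. The following is a minimal sketch of the $$eval() approach wrapped in a helper. It assumes page is a Puppeteer Page that has already navigated to the target document; getNames is a hypothetical name. It uses page.waitForSelector() rather than the page.waitFor() call from the question, which was deprecated in later Puppeteer releases.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page that has
// already navigated to the document containing `.mainDiv`.
async function getNames(page, selector = '.mainDiv > div') {
  // Wait until at least one matching element exists in the DOM.
  await page.waitForSelector(selector);
  // The callback runs in the browser. Only its return value, an array of
  // strings (which is serializable), is sent back to Node.
  return page.$$eval(selector, divs =>
    divs.map(div => div.innerText.trim()).filter(text => text.length > 0)
  );
}

// Usage (inside an async function):
//   const names = await getNames(page);
```

Returning plain strings rather than the elements themselves matters here, because DOM nodes cannot be serialized across the browser/Node boundary.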
Note: Keep in mind that if you use Puppeteer to automate searches on Google, you may be temporarily blocked and end up with an "Unusual traffic from your computer network" notice, requiring you to solve a reCAPTCHA. This may break your web scraper, so proceed with caution.
Try it like this:
let names = await page.evaluate(() => [...document.querySelectorAll('.mainDiv div')].map(div => div.innerText))
That way you can test the whole expression inside the callback in the Chrome DevTools console first.
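One way to make the console test easier is to keep the extraction in a named function. This is a sketch only: extractNames and its root parameter are hypothetical, and it relies on the default parameter being evaluated in the browser when Puppeteer calls the function with no arguments.

```javascript
// Hypothetical helper. In the browser, `root` defaults to the global
// `document`; a different root can be passed in when testing.
function extractNames(root = document) {
  return Array.from(
    root.querySelectorAll('.mainDiv div'),
    div => div.innerText.trim()
  );
}

// In the Chrome DevTools console: paste the function, then call
//   extractNames()
// In Puppeteer (inside an async function):
//   const names = await page.evaluate(extractNames);
```

The function must not reference any variables from the Node script, because Puppeteer sends only its source text to the page.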
Using page.$eval:
const names = await page.$eval('.mainDiv', (element) => {
return element.innerText
});
Here the element is retrieved by selector and directly passed to the function to be evaluated.
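The snippet above resolves to a single string, whereas the question asks for an array. A minimal sketch of one way to bridge that gap is to split the text on newlines after it comes back to Node. The helper names splitLines and getNamesFromContainer are hypothetical, and page is assumed to be a Puppeteer Page on the target document. Note that page.$eval() throws if no element matches the selector.

```javascript
// Hypothetical helper: turns the innerText of one container into an array
// of trimmed, non-empty lines.
function splitLines(text) {
  return text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0);
}

// `page` is assumed to be a Puppeteer Page; '.mainDiv' is the selector
// from the question.
async function getNamesFromContainer(page) {
  const text = await page.$eval('.mainDiv', element => element.innerText);
  return splitLines(text);
}

// Usage (inside an async function):
//   const names = await getNamesFromContainer(page);
```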
Using page.evaluate:
const namesElem = await page.$('.mainDiv');
const names = await page.evaluate(namesElem => namesElem.innerText, namesElem);
This is basically the first method split into two steps. The interesting part is that ElementHandles can be passed as arguments to page.evaluate(), where they are resolved to the underlying DOM elements, just like any other JSHandle.
Note that, for simplicity and clarity, I used the methods that retrieve a single element. page.$$() and page.$$eval() work the same way but select multiple elements: page.$$() resolves to an array of ElementHandles, and page.$$eval() passes an array of elements to the callback.
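For the multi-element case, a sketch of the two-step approach could look like the following. It assumes page is a Puppeteer Page on the target document; getNamesFromHandles is a hypothetical name, and the '.mainDiv > div' selector is borrowed from the first answer.

```javascript
// Hypothetical helper: collects the text of each child div by passing
// its ElementHandle back into the page, one at a time.
async function getNamesFromHandles(page) {
  const handles = await page.$$('.mainDiv > div');
  const names = [];
  for (const handle of handles) {
    // Inside the callback, `el` is the real DOM element behind the handle.
    names.push(await page.evaluate(el => el.innerText.trim(), handle));
    // Release the handle so the browser can garbage-collect the element.
    await handle.dispose();
  }
  return names;
}

// Usage (inside an async function):
//   const names = await getNamesFromHandles(page);
```

This makes one round trip to the browser per element, so page.$$eval() is usually the simpler choice when only the text is needed.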