Getting data from one page is simple, but how do I go back after getting data from the first page, open a new page, get data from that page, and so on? I am trying to do this on the website http://books.toscrape./.
I chose to print how many books are in stock because that information is only available after you follow the link. For example, if you run the code below you will get: { stock: 'In stock (22 available)' }
Now I want to go back to the original page, open the second link, and extract the same information as before, and so on for the remaining links.
How can this be done using vanilla JavaScript?
const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('http://books.toscrape./');
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);
    const result = await page.evaluate(() => {
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
        return {
            stock
        }
    });
    browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});
asked Apr 23, 2019 at 16:12 by user9746492
edited Apr 23, 2019 at 16:30 by Thomas Dondorf
1 Answer
Explanation
What you need to do is call page.goBack() to go back one page when your task is finished, and then click the next element. For this, use page.$$ to get the list of clickable elements and loop over them one after another. Inside the loop you can run the same extraction code for each page. Note that element handles obtained before a navigation no longer refer to the live document afterwards, so the list has to be queried again after every page.goBack().
Code
I adapted your code below to print your desired result to the console for each page. Be aware that I changed the selector from your question, removing the :nth-child(1), so that it selects all clickable elements.
const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('http://books.toscrape./');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click the element and wait for the navigation it triggers
        // (more reliable than sleeping for a fixed amount of time)
        await Promise.all([
            page.waitForNavigation(),
            elementsToClick[i].click(),
        ]);

        // generate result for the current page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page and repopulate the elements,
        // because the old handles belong to the previous document
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    await browser.close();
};

scrape();
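As a possible alternative to clicking and going back, the sketch below collects the link targets first and then visits each one directly with page.goto, so no element handles have to be re-queried. This is only a sketch under assumptions: scrapeStock and parseAvailable are hypothetical helper names, the link selector is the one from the answer with the trailing "> img" dropped so that it matches the enclosing anchor, and the stock text is assumed to look like the 'In stock (22 available)' example from the question.

```javascript
// Selector from the answer, minus "> img", so it matches the <a> elements (assumption).
const linkSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a';
// Stock cell selector taken unchanged from the question.
const stockSelector = '#content_inner > article > table > tbody > tr:nth-child(6) > td';

// Hypothetical helper: pull the number out of text like 'In stock (22 available)'.
// Returns null when the text does not match that shape.
function parseAvailable(text) {
    const match = /\((\d+) available\)/.exec(text);
    return match ? Number(match[1]) : null;
}

// Hypothetical helper: `page` is expected to be a Puppeteer Page,
// `startUrl` the listing page from the question.
async function scrapeStock(page, startUrl) {
    await page.goto(startUrl);

    // Read all hrefs in one go; the browser resolves a.href to an absolute URL.
    const urls = await page.$$eval(linkSelector, (links) => links.map((a) => a.href));

    const results = [];
    for (const url of urls) {
        // Navigate straight to the detail page instead of click + goBack.
        await page.goto(url);
        const stock = await page.$eval(stockSelector, (td) => td.innerText);
        results.push({ url, stock, available: parseAvailable(stock) });
    }
    return results;
}

// Usage sketch (assumes puppeteer is installed and startUrl is the site from the question):
// const puppeteer = require('puppeteer');
// (async () => {
//     const browser = await puppeteer.launch();
//     const page = await browser.newPage();
//     console.log(await scrapeStock(page, startUrl));
//     await browser.close();
// })();
```

Compared with the goBack() loop, this trades one extra query up front for a simpler loop body. It only works when every item is reachable through a plain link, which appears to be the case here since each image sits inside an anchor.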