Getting data from one page is simple, but how do I go back after scraping the first page, open the next page, scrape that one, and so on? I am trying to do this on the website http://books.toscrape.com/.

I chose to print how many books are in stock because that information is only available after following a book's link. For example, running the code below prints: { stock: 'In stock (22 available)' }

Now I want to go back to the original page, follow the second link, and extract the same information there, and so on for the remaining links.

How can this be done using vanilla JavaScript?

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);

    const result = await page.evaluate(() => {
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;

        return {
            stock
        }
    });

    await browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});

Asked Apr 23, 2019 at 16:12 by user9746492; edited Apr 23, 2019 at 16:30 by Thomas Dondorf.

1 Answer

Explanation

Call page.goBack() to return to the previous page once you have extracted the data, then click the next element. To do this, use page.$$ to get the list of clickable elements and loop over them one after another, running the same extraction on each page. Element handles do not survive a navigation, which is why the code queries them again with page.$$ after going back.

Code

The code below is adapted from yours and prints the desired result to the console for each page. Note that I removed the :nth-child(1) from the selector in your question so that it matches all clickable elements.

const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click the element and wait for the resulting navigation to finish
        await Promise.all([
            page.waitForNavigation(),
            elementsToClick[i].click(),
        ]);

        // generate result for the current page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page and repopulate the elements
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    await browser.close();
};

scrape();
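
Alternative

If clicking is not a requirement, another option is to read every book's URL from the listing page first and then visit the URLs one by one, which avoids going back and re-querying the elements. The following is only a sketch under a few assumptions: scrapeStock and main are hypothetical helper names, the link selector is the one from above with the trailing "> img" dropped so that it matches the anchors, and the page structure is assumed to be the same as in the question.

```javascript
// Selectors adapted from the code above; the link selector drops the trailing "> img".
const bookLinkSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a';
const stockSelector = '#content_inner > article > table > tbody > tr:nth-child(6) > td';

// Hypothetical helper (not part of Puppeteer): visits every book linked from
// startUrl and returns one { url, stock } object per book.
async function scrapeStock(page, startUrl) {
    await page.goto(startUrl);

    // Read all detail-page URLs up front; plain strings stay valid after navigation.
    const urls = await page.$$eval(bookLinkSelector, (anchors) => anchors.map((a) => a.href));

    const results = [];
    for (const url of urls) {
        await page.goto(url);
        const stock = await page.$eval(stockSelector, (td) => td.innerText.trim());
        results.push({ url, stock });
    }
    return results;
}

// Entry point; puppeteer is required here so the helper above has no dependencies.
async function main() {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        console.log(await scrapeStock(page, 'http://books.toscrape.com/'));
    } finally {
        await browser.close();
    }
}
```

Calling main().catch(console.error) runs the sketch against the site. Because scrapeStock only needs an object with goto, $$eval and $eval, it can also be exercised with a stub page in a test.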
