There are divs with class="xj7". Inside each of them, there is an <a href> link.

How can I access the href value of that link? What makes it trickier is that there are many elements with the same class name, so ideally I want to loop through them.

Another hindrance is that the link is in relative form, meaning it doesn't specify the domain name. It looks like this:

<div class="xj7">
    <a href="/tst/gfhe7sje">

asked Jun 10, 2018 at 23:45 by user1584421

2 Answers

Try this and let me know if it works.

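const puppeteer = require('puppeteer');

async function run() {
    // Launch a browser and open a page before navigating.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('<url_here>');
    // Matches the divs from the question; append further classes (e.g. ".Kwh5n") if the real page needs them.
    const div_selector = "div.xj7";

    // Count the matching divs inside the page context.
    const list_length = await page.evaluate((sel) => {
        return document.querySelectorAll(sel).length;
    }, div_selector);

    for (let i = 0; i < list_length; i++) {
        // For the i-th div, return the href of its first anchor, or '' if it has none.
        const href = await page.evaluate((l, sel) => {
            const elements = Array.from(document.querySelectorAll(sel));
            const anchor = elements[l].getElementsByTagName('a')[0];
            // anchor.href is the resolved absolute URL, so the relative form is not a problem.
            return anchor ? anchor.href : '';
        }, i, div_selector);
        console.log('--------> ', href);
    }
    await browser.close();
}
run();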
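
On the relative-link concern from the question: the `href` property of an anchor already gives the resolved absolute URL, while `getAttribute('href')` would give the raw `/tst/gfhe7sje`. If you end up with the raw form, a minimal sketch for resolving it in Node is below. `toAbsolute` is a hypothetical helper name and `https://example.com/some/page` is an assumed page URL, not something from the question.

```javascript
// Resolve a raw (possibly relative) href against the URL of the page it came from.
// `toAbsolute` is a hypothetical helper; the base URL below is an assumed example.
function toAbsolute(href, baseUrl) {
  // The WHATWG URL constructor handles root-relative, path-relative and absolute inputs.
  return new URL(href, baseUrl).href;
}

const base = 'https://example.com/some/page';
console.log(toAbsolute('/tst/gfhe7sje', base)); // https://example.com/tst/gfhe7sje
```

In a Puppeteer script the base would typically be `page.url()`.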

You can do this:

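const puppeteer = require('puppeteer')

const crawl = async (url) => {
  let browser
  try {
    console.log(`Crawling ${url}`)
    browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto(url)

    // Direct <a> children of any element with class "xj7"
    const selector = '.xj7 > a'
    await page.waitForSelector(selector)
    // Runs in the page: keep anchors that have an href and return the resolved absolute URLs
    const links = await page.$$eval(selector, am => am.filter(e => e.href).map(e => e.href))

    console.log(links)
  } catch (err) {
    console.log(err)
  } finally {
    // Close the browser even if navigation or the selector wait fails
    if (browser) await browser.close()
  }
}

crawl('https://example.com')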
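
To see what the `$$eval` callback does without launching a browser, here is a small sketch that runs the same filter-and-map logic on plain objects. The objects are stand-ins for anchor elements and the URLs are made up for illustration; `collectHrefs` is a hypothetical name.

```javascript
// Same logic as the callback passed to $$eval, exercised on stand-in objects.
// `collectHrefs` and the sample data are illustrative assumptions.
const collectHrefs = (anchors) => anchors.filter(e => e.href).map(e => e.href);

const fakeAnchors = [
  { href: 'https://example.com/tst/gfhe7sje' },
  { href: '' }, // an anchor without an href is dropped by the filter
  { href: 'https://example.com/tst/abc' },
];

console.log(collectHrefs(fakeAnchors));
```

Only the two non-empty hrefs survive, which is why the real script prints a clean array of links.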

Tags: javascript. Source title: Puppeteer - Retrieving links from divs with specific class names - Stack Overflow