javascript - Puppeteer Error, Cannot read property 'getProperty' of undefined while scraping white pages - Stack


I'm trying to scrape an address from Whitepages, but my scraper throws this error every time I run it:

```text
(node:11389) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'getProperty' of undefined
```

Here's my code:

```javascript
const puppeteer = require('puppeteer');

async function scrapeAddress(url){
    const browser = await puppeteer.launch();

    const page = await browser.newPage();
    await page.goto(url,{timeout: 0, waitUntil: 'networkidle0'});

    const [el]= await page.$x('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
    // console.log(el)
    const txt = await el.getProperty('textContent');
    const rawTxt = await txt.jsonValue();

    console.log({rawTxt});

    browser.close();

}

scrapeAddress('https://www.whitepages./business/CA/San-Diego/Cvs-Health/b-1ahg5bs')
```

After investigating a bit, I realized that el is undefined, and I'm not sure why. I've used the same code to get elements from other sites, and only this site gives me this error.

I tried both the full and the short XPath, as well as other surrounding elements, and every element on this site throws this error.

Why would this be happening and is there any way I can fix it?
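Mechanically, the TypeError comes from destructuring: `page.$x()` resolves to an array of element handles, and that array is empty when the XPath matches nothing (for example, when the page you actually received is not the page you inspected). Destructuring an empty array gives `undefined` instead of throwing, so the failure only surfaces one line later at `el.getProperty`. A minimal sketch of a guard that fails earlier with a clearer message; `firstMatch` is a hypothetical helper, not a Puppeteer API:

```javascript
// Hypothetical helper (not part of Puppeteer): return the first handle from an
// array such as the one page.$x() resolves to, or throw an error naming the XPath.
function firstMatch(handles, xpath) {
  if (!Array.isArray(handles) || handles.length === 0) {
    throw new Error(`No element matched XPath: ${xpath}`);
  }
  return handles[0];
}

// Destructuring an empty array yields undefined rather than throwing.
const [missing] = [];
console.log(missing); // undefined

// Assumed usage inside the scraper, where `page` is a Puppeteer Page:
//   const el = firstMatch(await page.$x(xpath), xpath);
//   const txt = await el.getProperty('textContent');
```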


Asked Jan 18, 2020 at 8:17 by Zafar Saifi

6 Answers

You can try wrapping everything in a try/catch block; alternatively, handle the promise returned by the async function with .then() and .catch().

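```javascript
const puppeteer = require('puppeteer');

// URL from the question
const url = 'https://www.whitepages./business/CA/San-Diego/Cvs-Health/b-1ahg5bs';

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, {timeout: 0, waitUntil: 'networkidle0'});

    const [el] = await page.$x('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
    // page.$x resolves to an empty array when nothing matches, so el can be undefined
    if (!el) {
      throw new Error('XPath matched no element');
    }
    const txt = await el.getProperty('textContent');
    const rawTxt = await txt.jsonValue();

    console.log({rawTxt});
  } catch (err) {
    // Report the failure instead of leaving the promise rejection unhandled
    console.error(err.message);
  } finally {
    // Always close the browser, even when scraping fails
    await browser.close();
  }
})();
```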
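The `.then()` alternative mentioned in this answer means attaching handlers to the promise the async function returns, so a rejection is handled instead of producing `UnhandledPromiseRejectionWarning`. Neither form makes `el` defined; both only turn the crash into a handled error. A minimal sketch, assuming the XPath lookup is passed in as a function so the example runs without a browser; `readText` and `findByXPath` are hypothetical names (in real code you might pass `(xp) => page.$x(xp)`):

```javascript
// Hypothetical: read textContent from the first XPath match.
// `findByXPath` stands in for (xp) => page.$x(xp) in real Puppeteer code.
async function readText(findByXPath, xpath) {
  const [el] = await findByXPath(xpath);
  if (!el) {
    throw new Error(`No element matched ${xpath}`);
  }
  const prop = await el.getProperty('textContent');
  return prop.jsonValue();
}

// Handle the returned promise with .then()/.catch() instead of try/catch.
const noMatches = async () => []; // simulates an XPath that matches nothing
readText(noMatches, '//*[@id="left"]//h3/span[1]')
  .then((text) => console.log({ text }))
  .catch((err) => console.error('Scrape failed:', err.message));
```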

The reason is that the website detects Puppeteer as an automated bot. Set headless to false and you can see that it never actually navigates to the website.

I'd suggest using puppeteer-extra-plugin-stealth. Also, always make sure to wait for the element to appear on the page.

```javascript
const puppeteer = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(pluginStealth());

async function scrapeAddress(url){
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto(url, {waitUntil: 'networkidle0'});

        // wait for the XPath to match before querying it
        const xpath = '//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]';
        await page.waitForXPath(xpath);
        const [el] = await page.$x(xpath);

        const txt = await el.getProperty('textContent');
        const rawTxt = await txt.jsonValue();

        console.log({rawTxt});
    } finally {
        // close the browser even if waiting or scraping throws
        await browser.close();
    }
}

scrapeAddress('https://www.whitepages./business/CA/San-Diego/Cvs-Health/b-1ahg5bs')
    .catch((err) => console.error(err.message));
```
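Waiting for the element, as recommended above, means retrying the lookup until it succeeds or a deadline passes; `page.waitForXPath` does this for you and rejects if the element never appears (after 30 seconds by default). The sketch below only illustrates that idea in plain JavaScript and is not Puppeteer's implementation; `waitForValue`, `timeout`, and `interval` are hypothetical names:

```javascript
// Hypothetical helper illustrating "wait until present or time out".
// `probe` is any async function returning a value, or undefined/null if not ready yet.
async function waitForValue(probe, { timeout = 30000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const value = await probe();
    if (value !== undefined && value !== null) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeout} ms`);
    }
    // Pause before the next attempt
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Assumed usage with Puppeteer (not run here):
//   const el = await waitForValue(async () => (await page.$x(xpath))[0], { timeout: 10000 });
```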

I recently ran into this error, and changing my XPath fixed it for me. I had been using the full XPath, and it was causing issues.

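Most probably this is because the website is responsive, so the page the scraper receives can have a different layout, and therefore different XPaths, from the one you inspected.

I would suggest debugging with a non-headless browser so you can watch what happens:

```javascript
const browser = await puppeteer.launch({headless: false});
```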
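Building on the responsive-layout point: Puppeteer renders pages in an 800×600 viewport by default, which may trigger a different layout from the desktop window you inspected. A minimal sketch of launch options for a debugging run; `headless`, `slowMo`, and `defaultViewport` are real `puppeteer.launch()` options, while the helper name `debugLaunchOptions` and the 1366×768 size are assumptions to adjust to the window you actually inspected:

```javascript
// Hypothetical helper: options for puppeteer.launch() while debugging.
// headless: false shows the browser window, slowMo (ms) slows each operation,
// and defaultViewport pins the page size so a responsive layout matches the
// one you inspected. The 1366x768 size is an assumed example value.
function debugLaunchOptions({ width = 1366, height = 768, slowMo = 50 } = {}) {
  return {
    headless: false,
    slowMo,
    defaultViewport: { width, height },
  };
}

// Usage (requires puppeteer):
//   const browser = await puppeteer.launch(debugLaunchOptions());
console.log(debugLaunchOptions());
```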

I took the code that @mbit provided, modified it for my needs, and ran it in a non-headless browser (headless: false). I was unable to get it working with a headless browser; if anyone manages that, please explain. Here is my solution:

First, install two packages from the command line:

```shell
npm install puppeteer-extra
npm install puppeteer-extra-plugin-stealth
```

Installing these allows you to run the first few lines of @mbit's code. Then take this line:

```javascript
const browser = await puppeteer.launch();
```

and pass {headless: false} to puppeteer.launch(), so that it looks like this:

```javascript
const browser = await puppeteer.launch({headless: false});
```

I also believe that the XPath @mbit was using may no longer exist, so provide your own XPath as well as your own site. You can do this with the following lines of code: just replace 'YOUR_XPATH' with your own XPath and 'YOUR_ADDRESS' with your own web address. NOTE: be mindful of your use of quotes ('' or ""), because the XPath may contain the same kind of quote you normally use for strings, which will break the string.

```javascript
await page.waitForXPath('YOUR_XPATH');
const [el] = await page.$x('YOUR_XPATH');

scrapeAddress('YOUR_ADDRESS')
```
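On the quoting note: XPaths copied from DevTools usually contain double quotes (for example `@id="root"`), so wrapping them in double quotes in JavaScript ends the string early. Any of the forms below produces the same string; the XPath is a shortened prefix of the one used in this answer:

```javascript
// The same XPath written three ways. Copied XPaths usually contain double
// quotes, so wrap them in single quotes or backticks, or escape the inner quotes.
const singleQuoted = '//*[@id="root"]/div[1]/div[2]';
const templateLiteral = `//*[@id="root"]/div[1]/div[2]`;
const escaped = "//*[@id=\"root\"]/div[1]/div[2]";

// All three are identical strings.
console.log(singleQuoted === templateLiteral && templateLiteral === escaped); // true
```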

After this, you should be able to run your code and retrieve values. Here's what my code looked like in the end; feel free to copy and paste it into your own file to confirm that it works for you.

```javascript
const puppeteer = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(pluginStealth());
// Do not reassign puppeteer = require('puppeteer') here: that replaces the
// puppeteer-extra instance and silently drops the stealth plugin.

async function scrapeAddress(url){
    const browser = await puppeteer.launch({headless: false});
    try {
        const page = await browser.newPage();
        await page.goto(url, {waitUntil: 'networkidle0'});

        // wait for the XPath to match before querying it
        const xpath = '//*[@id="root"]/div[1]/div[2]/div[2]/div[9]/div/div/div/div[3]/div[2]/div[3]/div[3]';
        await page.waitForXPath(xpath);
        const [el] = await page.$x(xpath);

        const txt = await el.getProperty('textContent');
        const rawTxt = await txt.jsonValue();

        console.log({rawTxt});
    } finally {
        await browser.close();
    }
}

scrapeAddress("https://stockx./air-jordan-1-retro-high-unc-leather")
    .catch((err) => console.error(err.message));
```

I was able to fix it by adding {waitUntil: 'networkidle0'} to the page.goto command:

```javascript
await page.goto(url, {waitUntil: 'networkidle0'});
```
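For context, `waitUntil: 'networkidle0'` makes `page.goto` resolve only once there have been no network connections for at least 500 ms, which gives client-side scripts time to render content that is not in the initial HTML. The sketch below is a simplified, hypothetical illustration of that idle rule in plain JavaScript, not Puppeteer's implementation; `createIdleTracker`, `requestStarted`, and `requestFinished` are made-up names:

```javascript
// Hypothetical tracker illustrating the networkidle0 rule: resolve once zero
// requests have been in flight for `idleMs` milliseconds (500 ms for networkidle0).
function createIdleTracker(idleMs = 500) {
  let inFlight = 0;
  let timer = null;
  let resolveIdle;
  const idle = new Promise((resolve) => { resolveIdle = resolve; });

  // (Re)start the quiet-window countdown
  function armTimer() {
    clearTimeout(timer);
    timer = setTimeout(() => {
      if (inFlight === 0) resolveIdle();
    }, idleMs);
  }

  armTimer(); // nothing is in flight at the start
  return {
    idle,
    requestStarted() {
      inFlight += 1;
      clearTimeout(timer); // any activity cancels the pending idle signal
    },
    requestFinished() {
      inFlight -= 1;
      if (inFlight === 0) armTimer();
    },
  };
}

// Usage idea: call requestStarted()/requestFinished() from request events,
// then `await tracker.idle`.
```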

I was running into the same issue, so I tried @mbit's solution and it worked. After some tests, I realized I didn't actually need puppeteer-extra-plugin-stealth; the await page.goto command alone worked just fine!
