I want to scrape data from https://csfloat/search?def_index=4727 and I'm using Puppeteer.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://csfloat/search?def_index=4727");

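  // Collect the text of every price element inside the page context
  // and return it, so that `arr` is actually populated in Node.
  const arr = await page.evaluate(() => {
    const elements = document.getElementsByClassName("price ng-star-inserted");
    return Array.from(elements, (el) => el.innerText);
  });

  console.log(arr);
  await browser.close();
})();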

The problem is that when I run this script, it opens its own browser and loads the page in a session where I am not logged in, so I can't scrape the data. Even if I enter my login and password, I still have to confirm the login in Steam. How can I run this from my own browser, where I am already authorized, or otherwise fix the problem? Maybe with another library?

Asked Jan 17 at 19:05 by Matvey Androsyuk; edited Jan 21 at 5:57 by ggorlen.

2 Answers

You can always use your browser's developer tools. Open them in the tab where you are already logged in, select the Console tab, and write your script there.
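As a rough illustration of the console approach, the snippet below could be pasted into the Console tab of the logged-in page. The selector is built from the class names in the question, and `extractPrices` is a hypothetical helper name; the real markup may differ, so treat this as a sketch rather than a tested scraper for that site.

```javascript
// Selector assembled from the class names in the question (an assumption
// about the page's markup; adjust it if the elements are named differently).
const PRICE_SELECTOR = ".price.ng-star-inserted";

// Hypothetical helper: return the trimmed text of every matching element
// found under `root` (normally `document`).
function extractPrices(root) {
  return Array.from(
    root.querySelectorAll(PRICE_SELECTOR),
    (el) => el.innerText.trim()
  );
}

// In the DevTools console `document` exists, so this logs the prices from
// the page you are logged in to. Outside a browser the guard skips it.
if (typeof document !== "undefined") {
  console.log(extractPrices(document));
}
```

Because the code runs inside your own session, no login step is involved; the trade-off is that you have to trigger it by hand.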

Alternatively, you can use the Recorder tab when you want to automate a routine task that runs daily or hourly.
If the tab is not visible, you can reach it through the double chevron arrow on the tab bar.

There you can automate clicks and scrolling, and even wait for an element to exist and become visible. You can then export the recording as a Puppeteer script if you like.

I hope this helps.

Edi gives some good suggestions, but to supplement those, here are a few other approaches. There's no silver bullet in web scraping, so you'll need to experiment to see what works for a particular site (I don't have a Steam account).

  1. Launch Puppeteer with the userDataDir option, then run it once headfully with a long timeout or a REPL and log in manually. The session should be saved in that directory, so on subsequent runs you'll be pre-authorized.
    • The major drawback is that the session may expire within hours or days, which might be a deal-breaker. But for sites that persist sessions for months, this could be a viable option.
    • A variant on this is extracting the session cookie by hand and copying it into a plain Node fetch call. A simple example of this strategy is here, and a full-fledged tool based on manual cookie copying is replit-exporter. The same caveats as above apply.
  2. Connect to an existing browser session with Puppeteer and log in manually.
  3. Without Puppeteer, you can keep a normal browser session open in a tab with a userscript or console code (as Edi showed) that extracts the data you want and sends it to a server for processing (writing to a file, etc.). I wrote a blog post on this technique. Using the recorder feature is a variant on this.
  4. Automate the login process fully with Puppeteer so the script can run end to end. This might be tricky for certain auth strategies like Google and Steam, which take great pains to prevent this. This is the only truly scalable option, but not all automations need to scale.
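
Approaches 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not a tested solution for this particular site: the profile directory `./puppeteer-profile`, the debugging port 9222, and the helper names (`persistentLaunchOptions`, `connectOptions`, `scrapePrices`, `main`) are all assumptions made for the example, and the selector is taken from the class names in the question.

```javascript
const path = require("path");

// Class names from the question; the real markup may differ.
const PRICE_SELECTOR = ".price.ng-star-inserted";

// Approach 1: launch options that keep cookies and storage in `profileDir`
// between runs (the directory name is a placeholder).
function persistentLaunchOptions(profileDir) {
  return { headless: false, userDataDir: path.resolve(profileDir) };
}

// Approach 2: options for attaching to a browser that is already running
// with remote debugging enabled (port 9222 is an assumption).
function connectOptions(port = 9222) {
  return { browserURL: `http://127.0.0.1:${port}` };
}

// Open the page and read the price texts. `timeout: 0` disables the wait
// timeout, which leaves time to log in by hand on the first run.
async function scrapePrices(browser, url) {
  const page = await browser.newPage();
  await page.goto(url);
  await page.waitForSelector(PRICE_SELECTOR, { timeout: 0 });
  const prices = await page.$$eval(PRICE_SELECTOR, (els) =>
    els.map((el) => el.innerText.trim())
  );
  await page.close();
  return prices;
}

// mode is "launch" (approach 1) or "connect" (approach 2).
async function main(mode, url) {
  // Loaded lazily so the helpers above can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser =
    mode === "connect"
      ? await puppeteer.connect(connectOptions())
      : await puppeteer.launch(persistentLaunchOptions("./puppeteer-profile"));
  try {
    console.log(await scrapePrices(browser, url));
  } finally {
    // Detach from a borrowed browser; close one we launched ourselves.
    if (mode === "connect") {
      await browser.disconnect();
    } else {
      await browser.close();
    }
  }
}
```

Calling `main("launch", url)` once and logging in by hand should leave the session in the profile directory for later runs, subject to the expiry caveat above. For `main("connect", url)`, the browser has to be started beforehand with the `--remote-debugging-port=9222` flag; depending on the Chrome version, that may also require a separate profile directory.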

Tags: javascript; How to write scrapers which will not ask to log in; Stack Overflow