I want to scrape data from https://csfloat/search?def_index=4727 and I'm using Puppeteer.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://csfloat/search?def_index=4727");
  // Collect the text of every price element inside the page context
  const arr = await page.evaluate(() => {
    const text = document.getElementsByClassName("price ng-star-inserted");
    const array = [];
    for (let i = 0; i < text.length; i++) {
      array.push(text[i].innerText);
    }
    return array; // hand the result back to Node instead of logging in the browser
  });
  console.log(arr);
  await browser.close();
})();
The problem is that when I run this script, Puppeteer opens its own browser, where I'm not logged in, so I can't scrape the data. Even if I enter my login and password, I still have to confirm the login in Steam. How can I run this from my own browser, where I'm already logged in, or otherwise work around the problem? Would another library help?
Asked Jan 17 at 19:05 by Matvey Androsyuk. Edited Jan 21 at 5:57 by ggorlen.

2 Answers
You can always use your browser's developer tools and select the Console tab to write your own script there.
Alternatively, use the Recorder tab when you want to automate a routine task that runs daily or hourly. You can open it by selecting the double chevron arrow on the tab bar.
From there, you can automate clicks, scrolling, and even waiting for an element to exist and become visible. You can then export the recording as a Puppeteer script if you like.
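As a rough sketch of the Console approach, something like the following could be pasted into the Console tab while you are logged in on the search page. The class names are taken from the question's code and may not match the live site, and extractPrices is a hypothetical helper name.

```javascript
// Collect the text of every price element in the given document.
// The selector comes from the question; adjust it if the site's markup differs.
function extractPrices(doc) {
  const nodes = doc.getElementsByClassName("price ng-star-inserted");
  return Array.from(nodes, (el) => el.innerText.trim());
}

// `document` only exists in the browser, so this guard lets the function
// be loaded elsewhere (for example in Node) without throwing.
if (typeof document !== "undefined") {
  console.log(extractPrices(document));
}
```

Because the snippet runs in your normal browser tab, it reuses whatever session you are already logged in to.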
I hope this helps.
Edi gives some good suggestions, but to supplement those, here are a few other approaches. There's no silver bullet in web scraping, so you'll need to experiment to see what works for a particular site (I don't have a Steam account).
- Launch Puppeteer with the userDataDir flag, then run it once headfully with a long timeout or REPL and log in manually. The session should be saved, so on subsequent runs, you'll be pre-authorized.
  - The major drawback is that the session may expire within hours or days, which might be a deal-breaker. But for sites that persist sessions for months, this could be a viable option.
- A variant on this is extracting the session cookie by hand and copying it into a plain Node fetch call. A simple example of this strategy is here, and a full-fledged tool based on manual cookie copying is replit-exporter. The same caveats as above apply.
- Connect to an existing browser session with Puppeteer and log in manually.
- Without Puppeteer, you can keep a normal browser session open in a tab with a userscript or console code (as Edi showed) that extracts the data you want and sends it to a server for processing (writing to file, etc). I wrote a blog post on this technique. Using the recorder feature is a variant on this.
- Automate the login process fully with Puppeteer so the script can run end to end. This might be tricky for certain auth strategies like Google and Steam, which take great pains to prevent this. This is the only truly scalable option, but not all automations need to scale.
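
Here is a minimal sketch of two of the options above: a persistent profile via userDataDir, and attaching to a browser that is already running. It is untested against the actual site. The profile folder name, the --scrape, --attach and --login command-line flags, the 5-minute login wait, port 9222, and the helper names are all assumptions for illustration. Attaching assumes Chrome was started with --remote-debugging-port=9222.

```javascript
const path = require("path");

// Hypothetical profile folder; any writable directory should work.
const PROFILE_DIR = path.resolve("puppeteer-profile");
// URL exactly as given in the question.
const SEARCH_URL = "https://csfloat/search?def_index=4727";

// Launch options that keep cookies and local storage between runs.
function launchOptions(profileDir = PROFILE_DIR) {
  return { headless: false, userDataDir: profileDir };
}

// Options for attaching to a browser started with --remote-debugging-port.
function connectOptions(port = 9222) {
  return { browserURL: `http://127.0.0.1:${port}` };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrape({ attach = false, loginWaitMs = 0 } = {}) {
  // Loaded lazily so the option helpers above can be used without Puppeteer.
  const puppeteer = require("puppeteer");
  const browser = attach
    ? await puppeteer.connect(connectOptions())
    : await puppeteer.launch(launchOptions());
  const page = await browser.newPage();
  await page.goto(SEARCH_URL, { waitUntil: "networkidle2" });

  if (loginWaitMs > 0) {
    // First run: leave time to log in by hand, then reload the search page.
    await sleep(loginWaitMs);
    await page.goto(SEARCH_URL, { waitUntil: "networkidle2" });
  }

  // Selector taken from the question's class names.
  const prices = await page.$$eval(".price.ng-star-inserted", (els) =>
    els.map((el) => el.innerText.trim())
  );

  await page.close();
  if (attach) {
    await browser.disconnect(); // leave the user's own browser running
  } else {
    await browser.close();
  }
  return prices;
}

// Only runs when explicitly requested on the command line.
if (process.argv.includes("--scrape")) {
  scrape({
    attach: process.argv.includes("--attach"),
    loginWaitMs: process.argv.includes("--login") ? 5 * 60 * 1000 : 0,
  })
    .then((prices) => console.log(prices))
    .catch((err) => console.error(err));
}
```

Saved under a hypothetical name such as scrape.js, this could be run once with `node scrape.js --scrape --login` to log in by hand, and then with `node scrape.js --scrape` afterwards for as long as the saved session stays valid. Adding `--attach` uses the already running browser instead of launching a new one.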