I'm trying to do some web scraping with Puppeteer, and I need to use the scraped value in a website I'm building.
I have tried to load the Puppeteer file in the HTML file as if it were an ordinary browser script, but I keep getting an error. However, if I run it with Node in a cmd window, it works well.
Scraper.js:

    getPrice();

    function getPrice() {
        const puppeteer = require('puppeteer');
        void (async () => {
            try {
                const browser = await puppeteer.launch()
                const page = await browser.newPage()
                await page.goto('http://example.com')
                await page.setViewport({ width: 1920, height: 938 })
                await page.waitForSelector('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
                await page.click('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
                await page.waitForSelector('.modal-content')
                await page.click('.tile-hsearch-hws > .m-search-tabs > #edit-search-panel > .l-em-reset > .m-field-wrap > .l-xs-col-4 > .analytics-click')
                await page.waitForNavigation();
                await page.waitForSelector('.tile-search-filter > .l-display-none')
                const innerText = await page.evaluate(() => document.querySelector('.tile-search-filter > .l-display-none').innerText);
                console.log(innerText)
            } catch (error) {
                console.log(error)
            }
        })()
    }

index.html:

    <html>
    <head></head>
    <body>
    <script src="../js/scraper.js" type="text/javascript"></script>
    </body>
    </html>
The expected result should be this one in the console of Chrome:
But I'm getting this error instead:
What am I doing wrong?
Asked Feb 12, 2019 at 10:16 by user10021033; edited May 29, 2020 at 2:34 by Let Me Tink About It.
3 Answers
EDIT: Since Puppeteer removed support for puppeteer-web, I moved it out of the repo and tried to patch it a bit.
It does work in the browser. The package is called puppeteer-web, and it is made specifically for such cases.
But the main point is, there must be some instance of chrome running on some server. Only then you can connect to it.
You can use it later on in your web page to drive another browser instance through its WS Endpoint:
<script src="https://unpkg.com/puppeteer-web">
</script>
<script>
// Top-level await is not valid in a classic script, so the calls are wrapped in an async function.
(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `ws://0.0.0.0:8080`, // <-- connect to a server running somewhere
    ignoreHTTPSErrors: true
  });
  const pagesCount = (await browser.pages()).length;
  const browserWSEndpoint = browser.wsEndpoint();
  console.log({ browserWSEndpoint, pagesCount });
})().catch(console.error);
</script>
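The page-side snippet needs a browser that is already running somewhere. As a rough sketch of that server side, assuming Puppeteer is installed in a Node.js project (`startBrowser` is a hypothetical helper name, not part of Puppeteer), you could launch a browser and read its WebSocket endpoint like this:

```javascript
// Server-side sketch (Node.js). Assumes `npm install puppeteer` has been run.
// `startBrowser` is a hypothetical helper, not a Puppeteer API.
// The `puppeteer` parameter exists only so a stand-in can be passed for testing.
async function startBrowser(puppeteer = require('puppeteer')) {
  // Launch a browser on the server.
  const browser = await puppeteer.launch();
  // wsEndpoint() returns the WebSocket URL that puppeteer.connect()
  // accepts as `browserWSEndpoint`.
  const endpoint = browser.wsEndpoint();
  return { browser, endpoint };
}

// Usage (uncomment on a machine with Puppeteer installed):
// startBrowser().then(({ endpoint }) => console.log('Connect to:', endpoint));
```

The printed endpoint normally points at the local machine, so making it reachable from a page served elsewhere takes extra setup (a proxy, for example) that this sketch does not cover.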
I had some fun with puppeteer and webpack,
- playground-react-puppeteer
- playground-electron-react-puppeteer-example
See these answers for full understanding of creating the server and more,
- Official link to puppeteer-web
- Puppeteer with docker
- Puppeteer with chrome extension
- Puppeteer with local wsEndpoint
Puppeteer runs exclusively on the server in Node.js. Allowing browser JS to run Node and an automated browser on a client machine would pose a major security risk: your browser code would be able to manipulate files on the client's file system and could do serious damage. Even if it were safe, installing Chromium on each client machine is heavy (hundreds of megabytes to download).
For the common case, rather than using puppeteer-web to allow the client to write Puppeteer code to control the browser, it's better to create an HTTP or websocket API that lets clients indirectly trigger Puppeteer code.
Reasons to prefer a REST API over puppeteer-web:
- better support for arbitrary client codebases--clients that aren't written in JS (desktop, command line and mobile apps, for example) can use the API just as easily as the browser can
- no dependency on puppeteer-web, which is now archived
- lower client-side complexity
- better control of client behavior--exposing Puppeteer's powerful capabilities fully to clients is less desirable than providing a subset of pre-approved functionality
- easier to integrate with other backend code and resources like the file system
- provides easy integration with an existing API as just another set of routes
- hiding Puppeteer as an implementation detail lets you switch to, say, Playwright in the future without the client code being affected.
Similarly, rather than exposing a mock fs object to read and write files on the server, the normal approach is to expose REST API endpoints to accomplish these tasks with safe guardrails on the operations.
Since there are many use cases for Puppeteer in the context of an API (usually Express), it's hard to offer a general example, but here are a few case studies you can use as starting points:
- Puppeteer unable to run on Heroku
- Puppeteer doesn't close browser
- Parallelism of Puppeteer with Express Router Node JS. How to pass page between routes while maintaining concurrency
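While a general example is hard to give, one narrow case can be sketched: a single read-only endpoint that runs a fixed Puppeteer script and returns the result as JSON. This is a minimal sketch, not a production setup. It uses Node's built-in `http` module rather than Express to stay dependency-free, and `TARGET_URL`, `SELECTOR`, `scrapeText`, `makeHandler` and the `/price` route are all hypothetical names chosen for illustration.

```javascript
// Hypothetical target and selector; replace with the page you actually scrape.
const TARGET_URL = 'http://example.com';
const SELECTOR = 'h1';

// Runs Puppeteer on the server and returns the text of the first match.
// Puppeteer is required lazily so the handler can be exercised without it.
async function scrapeText(url, selector) {
  const puppeteer = require('puppeteer'); // assumes `npm install puppeteer`
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForSelector(selector);
    return await page.$eval(selector, (el) => el.innerText);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}

// Builds a request handler exposing one pre-approved operation: GET /price.
// The client never sends Puppeteer code; it can only trigger this script.
function makeHandler(scrape = scrapeText) {
  return async (req, res) => {
    if (req.method !== 'GET' || req.url !== '/price') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    try {
      const price = await scrape(TARGET_URL, SELECTOR);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ price }));
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: err.message }));
    }
  };
}

// To start the server (port 3000 is an arbitrary choice):
// require('http').createServer(makeHandler()).listen(3000);
```

Launching a new browser per request keeps the sketch simple but is slow; the case studies above discuss reusing a browser and closing it reliably.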
If you're trying to create an automation product for programmers, consider publishing an npm package. If you're trying to create a product for non-programmers, or which requires a GUI, perhaps publishing an Electron desktop app would be a good alternative to server-side automation.
If your main goal is to scrape data and show it in the frontend, use Puppeteer in the backend instead and build an API that connects your frontend to it.
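On the frontend side, the page then only needs an ordinary HTTP request. Here is a minimal sketch, assuming your backend exposes a hypothetical `/price` endpoint that responds with JSON shaped like `{ "price": "..." }` (`loadPrice` is an illustrative name):

```javascript
// Front-end sketch. `/price` is a hypothetical endpoint served by your backend.
// `fetchImpl` defaults to the browser's fetch and is a parameter only so a
// stand-in can be passed for testing.
async function loadPrice(apiUrl, fetchImpl = fetch) {
  const response = await fetchImpl(apiUrl);
  if (!response.ok) {
    throw new Error(`API responded with status ${response.status}`);
  }
  const { price } = await response.json();
  return price;
}

// In the page:
// loadPrice('/price').then((price) => console.log(price)).catch(console.error);
```

No `require` call and no Puppeteer code reaches the browser this way; the page only sees the scraped value.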
waitForNavigation - how do you expect it to work? You navigate to another page. There will be another page that will be unaware of this script. The reason why Node packages like Puppeteer exist is that some things cannot be achieved with a browser alone. – Estus Flask, commented Feb 12, 2019 at 10:37