
I want to start a Chromium browser instance headless, do some automated operations, and then make it visible before doing the rest of the work.

Is this possible to do using Puppeteer, and if it is, can you tell me how? And if it is not, is there any other framework or library for browser automation that can do this?

So far I've tried the following but it didn't work.

const browser = await puppeteer.launch({'headless': false});
browser.headless = true;
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});

asked Apr 23, 2019 at 10:31 by Sumit Ghosh; edited Apr 23, 2019 at 16:07 by chazsolo
  • You can't; headless is a command-line argument passed when you launch Chromium. – hardkoded, Apr 23, 2019 at 12:07
  • Worth checking out is a useful answer from another thread that offers code to do this. – ggorlen, Sep 24, 2022 at 2:31

2 Answers


Short answer: It's not possible

Chrome only lets you start the browser in either headless or non-headless mode. You have to choose the mode when you launch the browser, and it cannot be switched at runtime.

What is possible is to launch a second browser and reuse the cookies (and any other data) from the first one.

Long answer

You would assume that you could just reuse the data directory when calling puppeteer.launch, but this is currently not possible due to multiple bugs (#1268, #1270 in the puppeteer repo).

So the best approach is to save any cookies or local storage data that you need to share between the browser instances, and restore that data when you launch the second browser. You then visit the website a second time. Be aware that any in-page JavaScript state (variables and the like) will be lost when you revisit the page.

Process

Summing up, the whole process should look like this (or vice versa for headless to headful):

  • Crawl in non-headless mode until you want to switch mode
  • Serialize cookies
  • Launch or reuse second browser (in headless mode)
  • Restore cookies
  • Revisit page
  • Continue crawling

As mentioned, this isn't currently possible since the headless switch occurs via Chromium launch flags.

I usually work around this with userDataDir, which the Chromium docs describe as follows:

The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.

Here's a simple example. This launches a browser headlessly, sets a local storage value on an arbitrary page, closes the browser, re-opens it headfully, retrieves the local storage value and prints it.

const puppeteer = require("puppeteer"); // ^18.0.4

const url = "https://www.example.com";
const opts = {userDataDir: "./data"};

let browser;
(async () => {
  {
    browser = await puppeteer.launch({...opts, headless: true});
    const [page] = await browser.pages();
    await page.goto(url, {waitUntil: "domcontentloaded"});
    await page.evaluate(() => localStorage.setItem("hello", "world"));
    await browser.close();
  }
  {
    browser = await puppeteer.launch({...opts, headless: false});
    const [page] = await browser.pages();
    await page.goto(url, {waitUntil: "domcontentloaded"});
    const result = await page.evaluate(() => localStorage.getItem("hello"));
    console.log(result); // => world
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

Change const opts = {userDataDir: "./data"}; to const opts = {}; and you'll see null printed instead of world. Without userDataDir, each launch gets a fresh temporary profile, so the data doesn't persist.

The answer from a few years ago mentions issues with userDataDir and suggests a cookies-based solution. That's fine, but I haven't had any problems with userDataDir, so either the bugs have been fixed on the Puppeteer end or my use cases haven't triggered them.

There's a useful-looking answer from a reputable source in "How to turn headless on after launch?", but I haven't had a chance to try it yet.
