Is it possible to open a local HTML file with headless Chrome using Puppeteer (without a web server)? I could only get it to work against a local server.

I found setContent() and goto() in the Puppeteer API documentation, but:

  1. page.goto: did not work with a local file or file://.
  2. page.setContent: is for an HTML string

asked Dec 1, 2017 at 5:48 by Anil Namde; edited Nov 25, 2021 at 17:54 by ted

8 Answers

I just ran a test locally (on Windows, as you can see), and Puppeteer happily opened my local HTML file using page.goto with a full file URL and saved it as a PDF:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Full file URL: three slashes, because the host part is empty
  await page.goto('file:///C:/Users/compoundeye/test.html');
  await page.pdf({
    path: 'test.pdf',
    format: 'A4',
    margin: {
      top: '20px',
      left: '20px',
      right: '20px',
      bottom: '20px'
    }
  });
  await browser.close();
})();

If you need to use a relative path, you might want to look at this question about relative file paths: File Uri Scheme and Relative Files
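
As a rough sketch of the relative-path case: Node's built-in url.pathToFileURL() can build the file:// URL once the path has been resolved to an absolute one. The helper name toFileUrl and the pages/test.html path are made up for illustration; they are not part of Puppeteer.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Hypothetical helper: resolve a relative path against a base directory
// and return the matching file:// URL as a string.
function toFileUrl(relativePath, baseDir = process.cwd()) {
    const absolutePath = path.resolve(baseDir, relativePath);
    return pathToFileURL(absolutePath).href;
}

// The result could then be passed to page.goto(), for example:
//   await page.goto(toFileUrl('pages/test.html', __dirname));
console.log(toFileUrl('pages/test.html'));
```

Resolving against an explicit base directory avoids depending on whichever directory the script happens to be started from.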

If the file is local, using setContent can be a better choice than goto:

const fs = require('fs');
const contentHtml = fs.readFileSync('C:/Users/compoundeye/test.html', 'utf8');
await page.setContent(contentHtml);

You can compare the performance of setContent and goto here.

Let's take a screenshot of an element from a local HTML file as an example.

import puppeteer from 'puppeteer';

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // __dirname holds the absolute path of the folder containing the
    // currently executing file. It is defined in CommonJS modules (or when
    // this file is transpiled to CommonJS), not in native ES modules.
    await page.goto(`file://${__dirname}/pages/test.html`);

    const element = await page.$('.myElement');

    if (element) {
        await element.screenshot({
            path: `./out/screenshot.png`,
            omitBackground: true,
        });
    }

    await browser.close();
})();

Navigation to local files only works if you also pass a referer of file://; otherwise, security restrictions prevent it from succeeding.

Why not open the HTML file, read its content, and then call setContent?

You can use file-url to prepare the URL to pass to page.goto:

const fileUrl = require('file-url');
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // fileUrl() converts a file path into a file:// URL
    await page.goto(fileUrl('file.html'));

    await browser.close();
})();

I opened the file I wanted to load in the browser and copied the URL to make sure all the slashes were correct.

await page.goto(`file:///C:/pup_scrapper/testpage/TM.html`);
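
To show the shape the browser produces, here is a small sketch that converts a Windows path by hand: backslashes become forward slashes, each segment after the drive letter is percent-encoded, and the result is prefixed with file:///. The function name windowsPathToFileUrl is hypothetical, and it only handles drive-letter paths like the one above (no UNC paths).

```javascript
// Hypothetical helper: turn a Windows drive-letter path into a file:// URL.
function windowsPathToFileUrl(windowsPath) {
    // Normalize backslashes to forward slashes and split into segments
    const [drive, ...rest] = windowsPath.replace(/\\/g, '/').split('/');
    // Keep the drive ("C:") as-is; encode spaces, '#', '?' and the like elsewhere
    const encoded = rest.map(segment => encodeURIComponent(segment));
    return 'file:///' + [drive, ...encoded].join('/');
}

console.log(windowsPathToFileUrl('C:\\pup_scrapper\\testpage\\TM.html'));
console.log(windowsPathToFileUrl('C:\\my docs\\a#b.html'));
```

When the script itself runs on Windows, Node's built-in url.pathToFileURL() performs this conversion, so a hand-written version is mostly useful for checking what the URL should look like.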

tl;dr: there are caveats to using page.setContent() on a blank page.

As other answers note, you can read the file with a Node API and then call page.setContent(), which gives more flexibility than page.goto(). However, there are limitations while the default about:blank page is displayed, such as relative resources not being loaded (more info here).

A workaround is to create an empty empty.html file, navigate to it and then call page.setContent():

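import { pathToFileURL } from 'url';

// would typically load from a file
const html = '<!DOCTYPE html><title>Hello</title><p>World</p>';
// A file:// URL needs an absolute path; this assumes empty.html is in the current working directory
await page.goto(pathToFileURL('empty.html').href, { waitUntil: 'load' });
await page.setContent(html, { waitUntil: 'networkidle0' });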

If you want to load other resources locally which are not available using file://, you can take advantage of page.setRequestInterception():

import path from 'path';

// Map of file names to the content served for them
const resources = {
    'style.css': {
        content: Buffer.from('p {color: navy;}'),
        mimetype: 'text/css'
    }
};

// Interception must be enabled before requests can be answered or aborted
await page.setRequestInterception(true);

page.on('request', interceptedRequest => {
    const url = new URL(interceptedRequest.url());
    const resourceName = path.basename(url.pathname); // Extract the file name

    // Let the empty.html page itself load; serve everything else from `resources`
    if (url.protocol === 'file:' && resourceName !== 'empty.html') {
        const resource = resources[resourceName];
        if (resource) {
            interceptedRequest.respond({
                status: 200,
                contentType: resource.mimetype,
                body: resource.content,
            });
        } else {
            interceptedRequest.abort();
        }
    } else {
        interceptedRequest.continue();
    }
});
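
The routing decision in that handler can also be pulled out into a plain function, which makes it possible to check without launching a browser. This is only a sketch: decideInterception and its return shape are hypothetical names of my own, not part of the Puppeteer API.

```javascript
const path = require('node:path');

// Hypothetical pure helper: decide what to do with a request URL.
// `resources` maps file names to { content, mimetype }.
function decideInterception(requestUrl, resources, pageName = 'empty.html') {
    const url = new URL(requestUrl);
    if (url.protocol !== 'file:') {
        return { action: 'continue' }; // not a local request, leave it alone
    }
    const resourceName = path.posix.basename(url.pathname);
    if (resourceName === pageName) {
        return { action: 'continue' }; // the placeholder page itself
    }
    if (!Object.prototype.hasOwnProperty.call(resources, resourceName)) {
        return { action: 'abort' }; // unknown local resource
    }
    const resource = resources[resourceName];
    return {
        action: 'respond',
        response: { status: 200, contentType: resource.mimetype, body: resource.content },
    };
}

const sampleResources = {
    'style.css': { content: Buffer.from('p {color: navy;}'), mimetype: 'text/css' },
};
console.log(decideInterception('file:///C:/site/style.css', sampleResources).action);
console.log(decideInterception('file:///C:/site/missing.js', sampleResources).action);
console.log(decideInterception('https://example.com/', sampleResources).action);
```

With a helper like this, the page.on('request', ...) callback only has to call respond(), abort(), or continue() on the intercepted request according to the returned action.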

Tags: javascript, Opening local HTML file using Puppeteer, Stack Overflow