I have some JavaScript code running in Node.js which controls Puppeteer to automate tasks in a web browser.

This code gets a list of links on the page and outputs them to the console:

const links = await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

If I remove the map() like this:

const links = await page.evaluate(() => { return [...document.querySelectorAll('a')]; });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

Then I get this error:

TypeError: Cannot read properties of undefined (reading 'trim')

Is there any way to work directly on the original array, without having to make a copy of the array using map()?

There are a couple of hundred properties on each <a> link, which I'd have to type out one at a time in the map() if I wanted to use many of them.


As an aside, is there any way to combine the two lines of code into one?

If I change it to this:

await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); })
    .forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

Then I get this error:

TypeError: page.evaluate(...).forEach is not a function

I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach onto a second line.

Asked Nov 19, 2024 at 20:02 by Danny Beckett (edited Nov 20, 2024 at 2:01)
  • I can't think of a reason why you need the map(). – Barmar Commented Nov 19, 2024 at 20:05
  • You also don't need to convert the result to an array with [...], since the collection returned by querySelectorAll() has a forEach() method. – Barmar Commented Nov 19, 2024 at 20:06
  • You can combine the two lines using .then(): page.evaluate(...).then(links => links.forEach(...)) – Barmar Commented Nov 19, 2024 at 20:07
  • @Barmar If I remove the [...] spread operator then I get TypeError: document.querySelectorAll(...).map is not a function – Danny Beckett Commented Nov 19, 2024 at 20:08
  • I said it supports forEach(), not map(). I was talking about the version without map(). – Barmar Commented Nov 19, 2024 at 20:08

3 Answers

IT goldman has correctly identified why trying to return an array of Nodes won't work--HTML elements aren't serializable.

It's possible to remove the map and operate on the original objects, but it will result in worse code. Mutating the original array of nodes to make them serializable isn't a good idea, since it's risky to modify objects you don't own.

Avoid premature optimization. It's OK to copy by default and only switch to in-place modification once you've hit a bottleneck, profiled, and determined that the copying really does account for the performance issue (which is highly unlikely).

As far as the Puppeteer API goes, you can immediately simplify

await page.evaluate(() =>
  [...document.querySelectorAll("a")].map(...)
);

to

await page.$$eval("a", els => els.map(...));

The parameter els passed to the callback is a regular array, so .map is available without a spread.
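On the question's concern about typing out a couple of hundred property names: one option is to pass the names as an extra argument to $$eval (Puppeteer forwards any arguments after the callback to it) and build the plain objects generically. This is a minimal sketch, assuming an already-open Puppeteer `page`; `pickProps`, `getLinks` and the property list are hypothetical names chosen for illustration. It still makes a copy, but the copy is generated from a list rather than typed out by hand.

```javascript
// Runs inside the page (via $$eval): copies the listed properties of each
// element into a plain object. Plain objects holding strings survive the trip
// back to Node; the elements themselves do not.
// It must not reference outer variables, because Puppeteer sends the
// function's source text to the browser and runs it there.
function pickProps(els, props) {
  return els.map(el => Object.fromEntries(props.map(p => [p, el[p]])));
}

// Hypothetical helper: `page` is assumed to be an open Puppeteer page.
// The third argument is forwarded to pickProps as `props`.
async function getLinks(page) {
  return page.$$eval("a[href]", pickProps, ["href", "textContent", "id"]);
}
```

Stick to properties with primitive values (strings, numbers, booleans); object-valued ones such as parentElement would run into the same serialization problem as the elements themselves.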

You wrote: "I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach on to a 2nd line."

By default, console output from the evaluate callback goes to the browser's console, not to Node's, because the browser is the environment the callback runs in.

You can forward the browser console to Node, but whether that's appropriate or not is unclear. You haven't provided much context for what you're doing here, or why you're mapping links back to formatted links with stripped attributes (you might want to use .outerHTML instead, depending on what you're actually trying to achieve).
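If forwarding does turn out to be appropriate, Puppeteer's Page emits a "console" event carrying a ConsoleMessage whose text() method returns the message text. A minimal sketch; `forwardBrowserConsole` and its optional `log` parameter are hypothetical, and `page` is assumed to be an open Puppeteer page:

```javascript
// Hypothetical helper: relay every console message from the page context
// to Node, prefixed so browser output is easy to tell apart.
function forwardBrowserConsole(page, log = console.log) {
  page.on("console", msg => log(`[browser] ${msg.text()}`));
}

// Usage (inside an async function, before evaluating):
//   forwardBrowserConsole(page);
//   await page.evaluate(() => console.log("hello from the page"));
//   // Node should then print: [browser] hello from the page
```

Registering the listener once, right after creating the page, keeps it in effect for every later evaluate call on that page.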

I'd avoid smushing multiple lines onto one. Let two lines be two lines (or more)--just write clear code and use an autoformatter. await is not amenable to chaining or one-liners (by design!), so I'd avoid the (await foo()).property antipattern in favor of two lines.
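The "forEach is not a function" error in the question comes from precedence: member access and calls bind tighter than await, so `await page.evaluate(...).forEach(...)` looks up forEach on the Promise that evaluate returns, before anything has been awaited. A small self-contained illustration, using a hypothetical `fakeEvaluate` as a stand-in for page.evaluate (any async function behaves the same way):

```javascript
// Hypothetical stand-in for page.evaluate: an async function returns a Promise.
async function fakeEvaluate() {
  return ['<a href="/a">A</a>', '<a href="/b">B</a>'];
}

async function main() {
  // A Promise has no forEach method, hence "forEach is not a function".
  console.log(typeof fakeEvaluate().forEach); // prints: undefined

  // Two lines: await first, then work with the resolved array.
  const links = await fakeEvaluate();
  links.forEach(link => console.log(link));

  // The .then() chaining suggested in the comments also works.
  await fakeEvaluate().then(ls => ls.forEach(link => console.log(link)));
}

main();
```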

Consider

const links = await page.$$eval("a", els =>
  els.map(a => `<a href="${a.href}">${a.textContent.trim()}</a>`)
);
links.forEach(link => console.log(link));

or

const links = await page.$$eval("a", els => els.map(el => el.outerHTML));
links.forEach(link => console.log(link));
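One caveat with passing console.log straight to forEach: forEach calls its callback with three arguments (element, index, array), so console.log would print the index and the whole array after every element. Wrapping it in an arrow function prints only the element. A quick check with hypothetical sample strings:

```javascript
const links = ['<a href="/a">A</a>', '<a href="/b">B</a>'];

// Record how many arguments forEach passes to its callback on each call.
const argCounts = [];
links.forEach((...args) => argCounts.push(args.length));
console.log(argCounts); // prints: [ 3, 3 ]

// So links.forEach(console.log) would log "element index array" per call.
// Passing only the element avoids that:
links.forEach(link => console.log(link));
```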

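Generally, prefer .textContent to .innerText: .innerText depends on rendering (it respects CSS such as display: none and triggers a layout), whereas .textContent simply returns the element's text.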

Note also that <a> elements don't necessarily have an href attribute, so you might want to narrow your selector to a[href].

The map is necessary to convert each HTMLElement into a serializable object { href, innerText } that page.evaluate can pass from the browser (page) context back to the context of your Node app.
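To see why the elements arrive in Node without their properties: DOM properties such as href and innerText are accessor properties defined on the element's prototype, not own data properties of the element. A JSON round trip is only a rough approximation of how Puppeteer transfers return values, but it shows the same effect. `FakeAnchor` is a hypothetical stand-in for an HTMLAnchorElement:

```javascript
// Hypothetical stand-in for an <a> element: like real DOM properties,
// href and innerText are getters on the prototype, not own properties.
class FakeAnchor {
  get href() { return "https://example.com/"; }
  get innerText() { return " Example "; }
}

const elements = [new FakeAnchor()];

// Sending the elements themselves: the getters are not copied.
const raw = JSON.parse(JSON.stringify(elements));
console.log(raw);              // prints: [ {} ]
console.log(raw[0].innerText); // prints: undefined

// Mapping to plain objects first, as in the question's working version.
const mapped = JSON.parse(
  JSON.stringify(elements.map(({ href, innerText }) => ({ href, innerText })))
);
console.log(mapped[0].innerText.trim()); // prints: Example
```

The undefined innerText on the unmapped version is exactly what produces "Cannot read properties of undefined (reading 'trim')" in the question.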

If you want to work on the original array of elements, you can run that JavaScript in the context of the page, inside the page.evaluate callback.

Access the properties inside the evaluate function, in the code that's running in the page, and send only the results of your function back to the driver process. In your particular example, you'll want to simplify to

(await page.evaluate(() =>
   Array.from(document.querySelectorAll('a'), a => `<a href="${a.href}">${a.innerText.trim()}</a>`)
)).forEach(link => console.log(link));
