I'm trying to develop a small crawling/scraping project with Crawlee and Playwright in JavaScript/TypeScript. For each URL I feed the crawler, it tries to scrape some data like this:
productDescriptionContainer = await page.locator(
  'div[class="product-details__product-description"]'
),
region = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Región:" })
  .textContent(),
farm = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Finca:" })
  .textContent(),
The problem arises when one of the locators is not found on the page. The crawler retries 3 times and then completely stops the scraping process for that specific URL. I would like to set those variables to some default value if the locator is not found and continue with the next one.
I hope you can shed some light onto this because I've run out of ideas (catching the error, using ||, initialise the variables...). Thank you in advance.
Asked Oct 1, 2023 at 17:53 by Estratachuela
2 Answers
Here is the solution I found to this.
Add a .catch to the awaited call so that the rejection is handled instead of thrown, and the code continues.
In your sample, it would look like this:
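region = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Región:" })
  .textContent()
  // Log the error instead of throwing; region ends up undefined in that case.
  .catch((e) => console.log(e)),
farm = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Finca:" })
  .textContent()
  // Same handling: farm is undefined when the locator is not found.
  .catch((e) => console.log(e)),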
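Since the question asks for a default value rather than undefined, the same idea can be wrapped in a small helper. This is only a minimal sketch: `textOrDefault`, the `TextLocator` interface, the "N/A" fallback, and the 1000 ms timeout are hypothetical names and values chosen for illustration. The interface declares just the two Locator methods the helper relies on (`count()` and `textContent()`), so a Playwright Locator should be usable with it as-is. One assumption to keep in mind: `count()` does not wait for elements to appear, so on pages that render the description late, the helper may return the fallback too early.

```typescript
// Minimal structural type with only the two Locator methods used below.
// Hypothetical name; it is declared here so the sketch runs without Playwright.
interface TextLocator {
  count(): Promise<number>;
  textContent(options?: { timeout?: number }): Promise<string | null>;
}

// Returns the locator's trimmed text, or `fallback` when nothing matches,
// the text is null/empty, or textContent() rejects (for example on timeout).
async function textOrDefault(
  locator: TextLocator,
  fallback: string,
  timeoutMs: number = 1000 // illustrative value, not a recommendation
): Promise<string> {
  // Skip the wait entirely when no element currently matches.
  if ((await locator.count()) === 0) {
    return fallback;
  }
  try {
    const text = await locator.textContent({ timeout: timeoutMs });
    // Treat null and whitespace-only text as "not found".
    return text?.trim() || fallback;
  } catch {
    // Swallow the error so the request handler can continue.
    return fallback;
  }
}
```

In the sample from the question, it could be called as `region = await textOrDefault(productDescriptionContainer.locator("p").filter({ hasText: "Región:" }), "N/A")`, and likewise for `farm`. Because the helper never throws, the request handler no longer fails on a missing paragraph, so the crawler should have no reason to retry that URL for this cause.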
Just remove browser.close(), or set the timeout to 0 wherever you want to wait indefinitely:
const response = await page.waitForResponse('**/api/posts', { timeout: 0 });