I'm attempting to scrape a <script> tag from a set of webpages using Simple HTML DOM. At first, I was scraping it by providing the numerical order of the tag I needed:
$script = $html->find('script', 17); //The tag I need is typically the 18th <script> tag on the page
I've come to realize that the order differs depending on the page, and it isn't a scalable approach anyway, since the order could change at any time. How can I instead search for a keyword within the tag I need and then pull back the full tag? For example, the tag I need always contains the string "PRODUCT_METADATA".
Thanks in advance for any ideas!
asked Aug 3, 2015 at 18:46 by user994585
Comment: Use XPath with SimpleXML or DOMDocument. – splash58, Aug 3, 2015 at 18:49
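The DOMDocument route mentioned in the comment could look roughly like the sketch below. It uses PHP's bundled DOM extension rather than Simple HTML DOM. The function name findScriptByKeyword is hypothetical, and the keyword check is done in PHP with strpos() instead of inside the XPath expression, which avoids having to escape the keyword for XPath. Treat it as an illustration under those assumptions, not as the commenter's exact suggestion.

```php
<?php
// Hypothetical helper (not part of any library): returns the contents of the
// first <script> element whose text contains $keyword, or null if none does.
function findScriptByKeyword(string $htmlSource, string $keyword): ?string
{
    if ($htmlSource === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($htmlSource);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//script') as $node) {
        if (strpos($node->textContent, $keyword) !== false) {
            return $node->textContent;
        }
    }
    return null;
}
```

Called as findScriptByKeyword($pageSource, 'PRODUCT_METADATA'), where $pageSource is the raw HTML string of a page, it returns the script body regardless of where the tag sits in the document.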
2 Answers
I ended up using the code below to search all script tags for my keyword:
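$script = null; // stays null if no tag contains the keyword
$scripts = $html->find('script');
foreach ($scripts as $s) {
    // innertext holds the tag's contents as a string
    if (strpos($s->innertext, 'PRODUCT_METADATA') !== false) {
        $script = $s; // keep the matching element
        break;        // stop at the first match
    }
}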
It works, but in my case I was trying to find a CSRF token hidden in a script tag, and at first I couldn't get it to work: all I got back was NULL.
My solution was to use explode() on the script's contents. It is very important to remember ->innertext, otherwise you don't get a string to work with.
I was lucky that the token was in double quotes, so it was easy to extract.
My final code looks like this:
$scripts = $html->find('script');
foreach ($scripts as $s) {
    if (strpos($s->innertext, 'csrf_token') !== false) {
        $script_array = explode('"', $s->innertext);
        $token = $script_array[1];
        break;
    }
}
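One caveat with the explode() approach: index 1 is simply the first double-quoted string in the script, so it only yields the token when nothing else is quoted before it. A slightly more targeted variant, sketched below, anchors the match on the variable name with preg_match(). The helper name extractQuotedValue is hypothetical, and the pattern assumes the token appears in a form like csrf_token = "abc123" or "csrf_token": "abc123"; other layouts would need a different pattern.

```php
<?php
// Hypothetical helper: returns the first double-quoted value assigned to $name
// in a script body, or null if the assumed pattern is not found.
// Assumed shapes:  csrf_token = "abc123"   or   "csrf_token": "abc123"
function extractQuotedValue(string $scriptText, string $name): ?string
{
    // name, optional closing quote, then ':' or '=', then the quoted value
    $pattern = '/' . preg_quote($name, '/') . '["\']?\s*[:=]\s*"([^"]*)"/';
    if (preg_match($pattern, $scriptText, $matches) === 1) {
        return $matches[1];
    }
    return null;
}
```

Inside the loop above, this would replace the explode() lines with something like $token = extractQuotedValue($s->innertext, 'csrf_token');.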