I want to extract the JavaScript inside a script tag.
This is the script tag:
<script>
  $(document).ready(function(){
    $("#div1").click(function(){
      $("#divcontent").load("ajax.content.php?p=0&cat=1");
    });
    $("#div2").click(function(){
      $("#divcontent").load("ajax.content.php?p=1&cat=1");
    });
  });
</script>
I have an array of ids like ['div1', 'div2'], and I need to extract the URL that each one's click handler loads.
So if I call a function:
getUrlOf('div1');
it should return ajax.content.php?p=0&cat=1
- Whatever you're trying to do, this seems to be the wrong way of going about it. However the PHP file generates this inline code, you should be getting the links the same way the PHP source does, not by parsing inline JavaScript source to obtain hard-coded string values within event handlers. – Patrick Roberts Commented Dec 18, 2018 at 19:09
2 Answers
If you're using a newer version of cheerio (1.0.0-rc.2), you'll need to use .html() instead of .text():
const cheerio = require('cheerio');
const $ = cheerio.load('<script>script one</script> <script> script two</script>');
// For the first script tag
console.log($('script').html());
// For all script tags
console.log($('script').map((idx, el) => $(el).html()).toArray());
https://github.com/cheeriojs/cheerio/issues/1050
With Cheerio, it is very easy to get the text of the script tag:
const cheerio = require('cheerio');
const $ = cheerio.load("the HTML of the webpage you are scraping");
// If there's only one <script>
console.log($('script').text());
// If there are multiple scripts: elem is a raw DOM node, so wrap it in $() to call .text()
$('script').each((idx, elem) => console.log($(elem).text()));
From here, you're really just asking "how do I parse a generic block of JavaScript and extract a list of links?" I agree with Patrick above in the comments: you probably shouldn't. Can you craft a regex that will find each link in the script and deduce the page it links to? Yes. But very likely, if anything about this page changes, your script will immediately break: the author of the page might switch to inline <a> tags, refactor the code, use live events, etc.
Just be aware that relying on the exact contents of this script tag will make your application very brittle -- even more brittle than page scraping generally is.
Here's an example of a loose but effective regex:
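// html: the text to search, e.g. the script source extracted with cheerio above (placeholder)
let html = "...";
// The g flag is required: without it, exec() always searches from the start, so any match repeats forever
let regex = /\$\("(#.+?)"\)\.click(?:.|\n)+?\.load\("(.+?)"/g;
let match;
while ((match = regex.exec(html)) !== null) {
  console.log(match[1] + ': ' + match[2]);
}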
In case you are new to regex: this expression contains two capture groups, in parens (the first is the div id, including the leading #; the second is the URL passed to .load()), as well as a non-capturing group in the middle, which exists only so the match can continue across line breaks (a plain . does not match a newline). I say it's "loose" because the match it is looking for looks like this:
- $("***").click***ignored chars***.load("***"
So, depending on how much javascript there is and how similar it is, you might have to tighten it up to avoid false positives.
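To tie this back to the getUrlOf('div1') call in the question, here is a minimal sketch. It assumes the script text has already been extracted (for example with cheerio's .html() as in the first answer) and that every handler follows the $("#id").click(...).load("url") shape. buildUrlMap and the two-argument getUrlOf(scriptText, id) are hypothetical names for illustration, not part of cheerio or jQuery. The regex is tightened slightly: [\w-]+ for the id (without the #), [^"]+ for the URL, and [\s\S]*? to cross line breaks. It uses String.prototype.matchAll, which requires the g flag.

```javascript
// Hypothetical helper: map each clicked element id to the URL passed to .load().
// Assumes handlers look exactly like $("#id").click(... .load("url") ...).
function buildUrlMap(scriptText) {
  // [\w-]+   : the id (word characters and hyphens), without the leading #
  // [\s\S]*? : lazily skip anything, including newlines, between .click and .load
  // [^"]+    : the URL, stopping at the closing double quote
  const regex = /\$\("#([\w-]+)"\)\.click[\s\S]*?\.load\("([^"]+)"/g;
  const urls = {};
  for (const match of scriptText.matchAll(regex)) {
    urls[match[1]] = match[2];
  }
  return urls;
}

// Hypothetical two-argument version of the question's getUrlOf(); returns null if the id is not found.
function getUrlOf(scriptText, id) {
  const urls = buildUrlMap(scriptText);
  return Object.prototype.hasOwnProperty.call(urls, id) ? urls[id] : null;
}

// The script body from the question, as a string
const script = `
$(document).ready(function(){
  $("#div1").click(function(){
    $("#divcontent").load("ajax.content.php?p=0&cat=1");
  });
  $("#div2").click(function(){
    $("#divcontent").load("ajax.content.php?p=1&cat=1");
  });
});`;

for (const id of ['div1', 'div2']) {
  console.log(id + ': ' + getUrlOf(script, id));
}
```

Because [\s\S]*? skips everything up to the next .load(", a click handler that never calls .load() would be paired with the following handler's URL; that is exactly the kind of false positive mentioned above, so check the output against the page.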
Original title: javascript - cheerio find a text in a script tag - Stack Overflow