Scraping website source files: CSS, HTML, JS (JavaScript)
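I'm trying to download CSS/JS files using WWW::Mechanize::Chrome. Yes, there are other ways to fetch these files, but my requirement is to do it with WWW::Mechanize::Chrome. I'd like to know whether that is possible.
I can call $mech->get($url) on a CSS or JS file. The file is then displayed in the browser window, and I can retrieve it with $mech->content. The problem is that HTML entities get encoded and decoded along the way, so the resulting file differs from the original (I tested this). This matters for JS files: they no longer run correctly afterwards.
You can run this test script to see the encoded file.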
use strict;
use warnings;
use WWW::Mechanize::Chrome;
my $mech = WWW::Mechanize::Chrome->new();
$mech->get('.js');
my $content = $mech->content;
use Data::Dumper qw(Dumper);
print Dumper $content;
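I'd like to know whether there is a workaround for getting these files directly from the server. Again, it has to use WWW::Mechanize::Chrome.
Solution:
If nothing else works, you can inject a script that downloads the file for you.
The following demonstrates this approach using Selenium::Chrome, but it can be adapted to WWW::Mechanize::Chrome.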
use strict;
use warnings qw( all );
use FindBin qw( $RealBin );
use MIME::Base64 qw( decode_base64 );
use Selenium::Chrome qw( );
use Time::HiRes qw( sleep );
use Sub::ScopeFinalizer qw( scope_finalizer );
# nf = Non-fatal.
sub nf_find_element {
   my $web_driver = shift;
   my $node;
   if (!eval {
      $node = $web_driver->find_element(@_);
      return 1;  # No exception.
   }) {
      return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;
      die($@);
   }
   return $node;
}
sub nf_find_elements {
   my $web_driver = shift;
   my $nodes;
   if (!eval {
      $nodes = $web_driver->find_elements(@_);
      return 1;  # No exception.
   }) {
      return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;
      die($@);
   }
   return wantarray ? @$nodes : $nodes;
}
sub nf_find_child_element {
   my $web_driver = shift;
   my $node;
   if (!eval {
      $node = $web_driver->find_child_element(@_);
      return 1;  # No exception.
   }) {
      return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;
      die($@);
   }
   return $node;
}
sub nf_find_child_elements {
   my $web_driver = shift;
   my $nodes;
   if (!eval {
      $nodes = $web_driver->find_child_elements(@_);
      return 1;  # No exception.
   }) {
      return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;
      die($@);
   }
   return wantarray ? @$nodes : $nodes;
}
# Warning: This clears the log.
sub has_js_failed {
   my ($web_driver) = @_;
   my $log = $web_driver->get_log('browser');
   return 0+grep { no warnings qw( uninitialized ); $_->{level} eq 'SEVERE' && $_->{source} eq 'javascript' } @$log;
}
{
   # JavaScript injected into the page. It fetches the URL passed as its first
   # argument and reports the outcome through a hidden form with id "exit".
   my $js = <<'__EOS__';
      var array_buffer_to_base64 = function(buf) {
         let binary = '';
         let bytes = new Uint8Array(buf);
         for (let byte of bytes) {
            binary += String.fromCharCode(byte);
         }
         return btoa(binary);
      };
      var set_response = function(code, msg) {
         let code_node = document.createElement('input');
         code_node.setAttribute('type', 'hidden');
         code_node.setAttribute('name', 'code');
         code_node.setAttribute('value', code);
         let msg_node = document.createElement('input');
         msg_node.setAttribute('type', 'hidden');
         msg_node.setAttribute('name', 'msg');
         msg_node.setAttribute('value', msg);
         let form_node = document.createElement('form');
         form_node.setAttribute('id', 'exit');
         form_node.appendChild(code_node);
         form_node.appendChild(msg_node);
         document.body.appendChild(form_node);
      };
      var request = function(url) {
         fetch(url)
            .then(
               response => {
                  if (!response.ok)
                     throw new Error("HTTP error: " + response.status);
                  return response.arrayBuffer();
               }
            )
            .then(
               buffer => set_response("success", array_buffer_to_base64(buffer)),
               reason => set_response("error", reason),
            );
      };
      request(...arguments);
__EOS__
   my $web_driver;
   # Make sure the chromedriver process is shut down when this block exits.
   my $guard = scope_finalizer {
      if ($web_driver) {
         $web_driver->shutdown_binary();
         $web_driver = undef;
      }
   };
   $web_driver = Selenium::Chrome->new(
      binary => "$RealBin/chromedriver.exe",
   );
   $web_driver->get('/');
   $web_driver->execute_script($js, '.js');
   # Poll until the injected script has added the "exit" form to the page.
   my $exit_form_node;
   while (1) {
      if (has_js_failed($web_driver)) {
         die("JavaScript error detected.\n");
      }
      $exit_form_node = nf_find_element($web_driver, '/html/body/form[@id="exit"]')
         and last;
      sleep(0.250);
   }
   my $code = nf_find_child_element($web_driver, $exit_form_node, 'input[@name="code"]')->get_value();
   my $msg = nf_find_child_element($web_driver, $exit_form_node, 'input[@name="msg"]')->get_value();
   if (!defined($code) || $code ne 'success') {
      $msg ||= "Unknown error";
      die("$msg\n");
   }
   # The file arrives base64-encoded, so decode it and write the raw bytes.
   my $doc = decode_base64($msg);
   binmode STDOUT;
   print $doc;
}
You may want to add a timeout to the polling loop so that it doesn't wait forever if something goes wrong.
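One possible way to add such a timeout is sketched below. This is a minimal illustration, not part of the original answer: `poll_until` is a hypothetical helper name, and the 30-second limit in the usage comment is an arbitrary assumption. It repeatedly calls a check callback until the callback returns a true value or a deadline passes.

```perl
use strict;
use warnings;
use Time::HiRes qw( sleep time );

# Hypothetical helper (not from the original answer).
# Calls $check repeatedly until it returns a true value or until
# $timeout seconds have elapsed. Returns the true value on success,
# or undef if the deadline passes first.
sub poll_until {
   my ($check, $timeout, $interval) = @_;
   $interval //= 0.250;
   my $deadline = time() + $timeout;
   while (1) {
      my $result = $check->();
      return $result if $result;
      return undef if time() >= $deadline;
      sleep($interval);
   }
}

# Possible use in place of the while (1) loop, assuming the helper
# subs from the script (has_js_failed, nf_find_element) are in scope:
#
#    my $exit_form_node = poll_until(sub {
#       die("JavaScript error detected.\n") if has_js_failed($web_driver);
#       return nf_find_element($web_driver, '/html/body/form[@id="exit"]');
#    }, 30)
#       or die("Timed out waiting for the download to finish.\n");
```

The check always runs at least once, even with a timeout of zero, and an exception thrown inside the callback propagates to the caller, so the JavaScript-error check behaves as it did in the original loop.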
Tags: css, javascript, perl, www-mechanize-chrome