admin管理员组

文章数量:1122861

网站源码抓取css.html.jss,javascript

我正在尝试使用WWW :: Mechanize :: Chrome浏览器下载css / js文件.是的,还有其他获取文件的方法.但是我的要求是使用WWW :: Mechanize :: Chrome来完成.我想知道是否有可能.

我可以对CSS或JS文件执行$mech-> get($url).然后,它显示在浏览器窗口中,然后我可以使用$mech-> content获得该内容.问题在于HTML实体进行了编码和解码,导致生成的文件与原始文件不同(我对此进行了测试).这是js文件的问题.之后它们无法正常运行.

您可以运行此测试脚本来查看已编码的文件.

use strict;

use warnings;

use WWW::Mechanize::Chrome;

my $mech = WWW::Mechanize::Chrome->new();

$mech->get('.js');

my $content = $mech->content;

use Data::Dumper qw(Dumper);

print Dumper $content;

我想知道是否有某种解决方法可以直接从服务器获取这些文件.同样,必须使用WWW :: Mechanize :: Chrome.

解决方法:

如果没有其他问题,您可以注入一个脚本来为您下载文件.

以下内容使用Selenium :: Chrome演示了此方法,但是该方法可以适用于WWW :: Mechanize :: Chrome.

use strict;

use warnings qw( all );

use FindBin qw( $RealBin );

use MIME::Base64 qw( decode_base64 );

use Selenium::Chrome qw( );

use Time::HiRes qw( sleep );

use Sub::ScopeFinalizer qw( scope_finalizer );

# nf = Non-fatal.

sub nf_find_element {

my $web_driver = shift;

my $node;

if (!eval {

$node = $web_driver->find_element(@_);

return 1; # No exception.

}) {

return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;

die($@);

}

return $node;

}

sub nf_find_elements {

my $web_driver = shift;

my $nodes;

if (!eval {

$nodes = $web_driver->find_elements(@_);

return 1; # No exception.

}) {

return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;

die($@);

}

return wantarray ? @$nodes : $nodes;

}

sub nf_find_child_element {

my $web_driver = shift;

my $node;

if (!eval {

$node = $web_driver->find_child_element(@_);

return 1; # No exception.

}) {

return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;

die($@);

}

return $node;

}

sub nf_find_child_elements {

my $web_driver = shift;

my $nodes;

if (!eval {

$nodes = $web_driver->find_child_elements(@_);

return 1; # No exception.

}) {

return undef if $@ =~ /Unable to locate element|An element could not be located on the page using the given search parameters/;

die($@);

}

return wantarray ? @$nodes : $nodes;

}

# Warning: This clears the log.

sub has_js_failed {

my ($web_driver) = @_;

my $log = $web_driver->get_log('browser');

return 0+grep { no warnings qw( uninitialized ); $_->{level} eq 'SEVERE' && $_->{source} eq 'javascript' } @$log;

}

{

my $js = <

var array_buffer_to_base64 = function(buf) {

let binary = '';

let bytes = new Uint8Array(buf);

for (let byte of bytes) {

binary += String.fromCharCode(byte);

}

return btoa(binary);

};

var set_response = function(code, msg) {

let code_node = document.createElement('input');

code_node.setAttribute('type', 'hidden');

code_node.setAttribute('name', 'code');

code_node.setAttribute('value', code);

let msg_node = document.createElement('input');

msg_node.setAttribute('type', 'hidden');

msg_node.setAttribute('name', 'msg');

msg_node.setAttribute('value', msg);

let form_node = document.createElement('form');

form_node.setAttribute('id', 'exit');

form_node.appendChild(code_node);

form_node.appendChild(msg_node);

document.body.appendChild(form_node);

};

var request = function(url) {

fetch(url)

.then(

response => {

if (!response.ok)

throw new Error("HTTP error: " + response.status);

return response.arrayBuffer();

}

)

.then(

buffer => set_response("success", array_buffer_to_base64(buffer)),

reason => set_response("error", reason),

);

};

request(...arguments);

__EOS__

my $web_driver;

my $guard = scope_finalizer {

if ($web_driver) {

$web_driver->shutdown_binary();

$web_driver = undef;

}

};

$web_driver = Selenium::Chrome->new(

binary => "$RealBin/chromedriver.exe",

);

$web_driver->get('/');

$web_driver->execute_script($js, '.js');

my $exit_form_node;

while (1) {

if (has_js_failed($web_driver)) {

die("JavaScript error detected.\n");

}

$exit_form_node = nf_find_element($web_driver, '/html/body/form[@id="exit"]')

and last;

sleep(0.250);

}

my $code = nf_find_child_element($web_driver, $exit_form_node, 'input[@name="code"]')->get_value();

my $msg = nf_find_child_element($web_driver, $exit_form_node, 'input[@name="msg"]')->get_value();

if (!defined($code) || $code ne 'success') {

$msg ||= "Unknown error";

die("$msg\n");

}

my $doc = decode_base64($msg);

binmode STDOUT;

print $doc;

}

可能希望在轮询循环上添加一个超时,因此如果出现问题,它不会永远等待.

标签:css,javascript,perl,www-mechanize-chrome

来源: .html

本文标签: 网站源码抓取csshtmljssjavascript