We host a set of "resource" pages: a collection of useful links for our users. For years we've had a script that runs daily, looping through each link and sending one HEAD request with PHP's Guzzle client to make sure each page on the resource sites is still active.
But over the past few years more and more sites have been returning 403 to the HEAD request (I suspect because they have adopted Cloudflare), and it has reached the point where the check is pretty much useless.
Is there a way to do this that won't get the traffic treated as malicious? I don't need the content from the other sites. I just need to know whether the pages are in good working order.
Here's the PHP code I'm using:
$client = new Client();
$request = $client->head($encoded_link);
$request->setOptions(['userAgent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36']);
$response = $request->send();
asked Mar 8 at 16:20
Charlie Shehadi
- I haven't tried this myself, but you could read the headers with a normal get request, and don't read the body. See, for instance: chriswhite.blog/coding/… You still need to work out what headers you get in which situation. – KIKO Software Commented Mar 8 at 16:41
- Agreed, just do a GET instead of a HEAD. It's not going to tell you that the site is "working" per se (i.e., a "down for maintenance" page is still going to return a success), but you're not getting that now and this will at least tell you if the server is actively responding. – Alex Howansky Commented Mar 8 at 16:53
- Yes, using GET instead of HEAD does a much better job. – Charlie Shehadi Commented Mar 10 at 15:51
1 Answer
There are several things that could help, and different ways to proceed depending on your needs.
If the number of resources you want to check is not too high, you could use a monitoring service such as UptimeRobot or Pingdom.
For the most realistic approach, consider driving a headless browser through a PHP library such as chrome-php, php-webdriver or Symfony Panther, which interacts with sites much as a real browser does. It takes some setup work at first, but it tends to be effective.
Your script can be improved:
Use GET instead of HEAD requests
Many security systems are more suspicious of HEAD requests since they're commonly used by automated tools but rarely by real users. Switching to GET requests might help:
// In Guzzle 6 and later, get() sends the request and returns the response directly.
$response = $client->get($encoded_link);
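As the comments on the question suggest, a GET does not have to download the page body. Here is a minimal sketch of that idea, assuming Guzzle 6 or 7 installed through Composer. `fetchStatus()` is a hypothetical helper name and the timeout value is arbitrary. It relies on Guzzle's `stream` request option so the call returns once the headers arrive, and on `http_errors => false` so a 403 or 404 is reported as a code instead of being thrown as an exception.

```php
<?php
// Assumes Guzzle 6 or 7 is installed via Composer and that the calling script
// has already loaded the autoloader: require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

/**
 * Returns the HTTP status code for $url, or null when no response arrived
 * (DNS failure, timeout, TLS error, too many redirects).
 * fetchStatus() is a hypothetical helper, not part of Guzzle.
 */
function fetchStatus(Client $client, string $url): ?int
{
    try {
        $response = $client->request('GET', $url, [
            'stream'          => true,  // return once headers arrive; body is not buffered
            'http_errors'     => false, // report 4xx/5xx as a status code, do not throw
            'allow_redirects' => true,  // follow redirects to the final page
            'timeout'         => 10,    // seconds; an arbitrary illustrative value
        ]);
    } catch (GuzzleException $e) {
        return null;
    }

    $status = $response->getStatusCode();
    $response->getBody()->close(); // release the connection without reading the body

    return $status;
}
```

It would be called as `fetchStatus(new Client(), $encoded_link)`, reusing one `Client` instance for the whole run.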
Improve your user agent string
Your current user agent is somewhat outdated (Chrome 61). Use a more recent browser signature:
$options = [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    ],
];
// get() returns the response directly in Guzzle 6 and later.
$response = $client->get($encoded_link, $options);
Add realistic headers
Include headers that typical browsers would send:
$options = [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language' => 'en-US,en;q=0.9',
        'Accept-Encoding' => 'gzip, deflate, br',
        'Connection' => 'keep-alive',
        'Upgrade-Insecure-Requests' => '1',
        'Sec-Fetch-Dest' => 'document',
        'Sec-Fetch-Mode' => 'navigate',
        'Sec-Fetch-Site' => 'none',
        'Sec-Fetch-User' => '?1',
    ],
];
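Even with better headers, some sites will probably keep answering 403, so it may help to stop treating every non-200 response as a dead link. The sketch below is one possible way to bucket results. `classifyStatus()` and its categories are my own illustrative assumption, not a standard. The idea is that 401, 403 and 429 mean the server is up but refused this client, so those links can be queued for a manual look rather than reported as broken.

```php
<?php
/**
 * Maps an HTTP status code (or null for "no response") to a coarse verdict.
 * classifyStatus() and its category names are illustrative, not a standard.
 */
function classifyStatus(?int $status): string
{
    if ($status === null) {
        return 'unreachable'; // no HTTP response at all
    }
    if ($status >= 200 && $status < 400) {
        return 'ok';          // page (or a redirect) was served
    }
    if (in_array($status, [401, 403, 429], true)) {
        return 'blocked';     // server is up but refused this client
    }
    if ($status === 404 || $status === 410) {
        return 'gone';        // the link itself is dead
    }
    return 'error';           // other 4xx, all 5xx, anything unexpected
}

// Example with made-up URLs and status codes collected by an earlier run.
$collected = [
    'https://example.com/a' => 200,
    'https://example.com/b' => 403,
    'https://example.com/c' => 404,
    'https://example.com/d' => null,
];

foreach ($collected as $url => $status) {
    printf("%-24s %s\n", $url, classifyStatus($status));
}
```

Only the `gone` and `unreachable` buckets would then need to trigger an alert, and a link that stays `blocked` for several days can be checked by hand.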