I'm using a Python script (Mechanize) to log in to a proxy portal. I can log in successfully; I can confirm that from the output of the read() function.
However, after a successful login I can't access the sites blocked by the proxy. So I checked the HTTP headers in Firefox (FF) and found Connection: keep-alive, but with mechanize I get Connection: close. I tried to imitate Firefox's HTTP headers exactly using browser.addheaders, but that didn't work either :(
After some digging, I found a couple of suggestions that the server closes the connection because mechanize can't fully emulate a browser: the web page contains JavaScript, which mechanize does not support.
So, is there a way to make the server believe that mechanize is a browser that supports JS, even though it isn't?
BTW, I don't actually need JS; as mentioned above, I can log in successfully. And please don't suggest PhantomJS. I need a Python package to do the job, not a headless browser.
Update:
Firefox headers:
GET xxx HTTP/1.1
Host: xxx
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: DSLastAccess=1454082611
Connection: keep-alive
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Set-Cookie: DSEPAgentInstalled=; path=/; expires=Tue, 31-Jan-2006 16:18:32 GMT; secure
Date: Fri, 29 Jan 2016 16:18:32 GMT
x-frame-options: SAMEORIGIN
Connection: Keep-Alive
Keep-Alive: timeout=15
Pragma: no-cache
Cache-Control: no-store
Expires: -1
Transfer-Encoding: chunked
Mechanize addheaders:
import mechanize

browser = mechanize.Browser()
browser.addheaders = [
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-US,en;q=0.5'),
    ('Accept-Encoding', 'gzip, deflate'),
    ('Host', 'xxx'),
    ('Connection', 'keep-alive'),
    ('Cookie', 'DSLastAccess=1454082611'),
    ('User-agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0'),
]
Mechanize Headers
send: 'CONNECT xxx:443 HTTP/1.0\r\n'
send: '\r\n'
send: 'GET xxx.cgi HTTP/1.1\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nHost: xxx\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0\r\nConnection: close\r\nCookie: DSLastAccess=1454082611\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/html; charset=utf-8
header: Set-Cookie: DSEPAgentInstalled=; path=/; expires=Tue, 31-Jan-2006 16:31:03 GMT; secure
header: Date: Fri, 29 Jan 2016 16:31:03 GMT
header: x-frame-options: SAMEORIGIN
header: Connection: close
header: Pragma: no-cache
header: Cache-Control: no-store
header: Expires: -1
Another thing that drives me crazy is that the Connection header mechanize actually sends is close, even though I've set it to keep-alive, as you can see in addheaders.
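The mismatch in the debug output above (addheaders says keep-alive, the wire says close) matches how Python's own urllib handler behaves: AbstractHTTPHandler.do_open overwrites any Connection header with close, because its response object is not built for persistent connections. mechanize is built on a fork of urllib2's handler code and, judging by the debug output, behaves the same way. The sketch below is a hedged reproduction under that assumption: it uses the standard-library urllib.request (not mechanize itself) against a throwaway local server, and EchoHandler and fetch_with_keep_alive are hypothetical names for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # Connection header values as seen by the server


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that records the client's Connection header."""

    def do_GET(self):
        received.append(self.headers.get("Connection"))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_with_keep_alive(port):
    """Ask for keep-alive, much like browser.addheaders does in the question."""
    # An empty ProxyHandler stops urllib from routing 127.0.0.1 through
    # any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/", headers={"Connection": "keep-alive"}
    )
    with opener.open(req) as resp:
        return resp.read()


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
worker = threading.Thread(target=server.handle_request)  # serve exactly one request
worker.start()
fetch_with_keep_alive(server.server_port)
worker.join()
server.server_close()

print(received)  # ['close']: the requested keep-alive never reached the wire
```

If this reproduces your situation, the Connection: close in the server's reply is most likely just the server answering in kind to the client's close, not a sign that it detected a missing JavaScript engine.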
- There is nothing in the HTTP headers about JS. Keep-alive is probably not relevant here. You should probably post the HTTP headers (both request and response) for both the working and the non-working version. Edit out the session cookie or whatever, but check whether it was there. – Sergey Salnikov, Jan 27, 2016 at 17:41
- @SergeySalnikov, thanks for the reply. I'm not saying that there is something in the HTTP headers about JS. I'm saying that from the HTTP headers I can tell that the server closes the connection, probably because the server can tell that mechanize is not a browser. It can tell because it doesn't see JS support, so it recognizes mechanize as NOT a browser. – user5174680, Jan 28, 2016 at 13:32
- Do you mean the server closes the connection without any reply? – Sergey Salnikov, Jan 28, 2016 at 15:27
- @SergeySalnikov, no, of course it replies. I mean that when I check the server's HTTP headers, they contain Connection: close. – user5174680, Jan 29, 2016 at 15:15
- As far as I know, there's no way for an HTTP server to detect client JavaScript support. The most common way to identify the client is the User-Agent header. It would be great if you posted the request/response headers, as suggested by @SergeySalnikov. – Miguel A. Baldi Hörlle, Jan 29, 2016 at 15:33
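To check the comments' point that keep-alive is a client-side choice rather than something the server decides based on JavaScript support, here is a minimal standard-library sketch. http.client.HTTPConnection does not force Connection: close, so against a local HTTP/1.1 test server (an assumed stand-in for the proxy portal, which it is not) two requests travel over the same TCP connection and the server answers keep-alive. KeepAliveHandler, seen and replies are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # (client port, Connection header) for every request handled


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical HTTP/1.1 test server that answers in kind."""

    protocol_version = "HTTP/1.1"  # needed for persistent connections

    def do_GET(self):
        seen.append((self.client_address[1], self.headers.get("Connection")))
        body = b"ok"
        self.send_response(200)
        # Echo the client's choice: keep-alive unless the client asked to close
        self.send_header("Connection", "close" if self.close_connection else "keep-alive")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

replies = []
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
for _ in range(2):
    conn.request("GET", "/", headers={"Connection": "keep-alive"})
    resp = conn.getresponse()
    resp.read()  # drain the body so the same socket can carry the next request
    replies.append(resp.getheader("Connection"))
conn.close()

server.shutdown()
server.server_close()

print(replies)  # ['keep-alive', 'keep-alive']
print(seen[0][0] == seen[1][0])  # True: both requests shared one TCP connection
```

http.client is low-level (no cookie jar, no form handling), so this only illustrates the protocol behaviour; it is not a drop-in replacement for mechanize.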
1 Answer
For Linux
Foremost, I know some people don't want a suggestion to switch to another option. However, I believe that if you want to access the page fully after logging in (which currently fails due to the lack of JavaScript support), you should look into using Selenium.
You can grab it with a quick sudo pip install selenium.
Accessing a webpage is as easy as declaring your browser and then telling it to go to the desired page. Here is a basic sample that makes your browser go to a webpage; the page I'm using relies heavily on JavaScript:
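from selenium import webdriver

# Create the browser before the try block so it always exists in the handler
browser = webdriver.Firefox()
try:
    browser.get('mikekus.')
except KeyboardInterrupt:
    # Close Firefox cleanly if the script is interrupted with Ctrl+C
    browser.quit()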
This works because Selenium actually opens a browser. However, you may wish to hide the browser so you don't have to see it or have it in your taskbar.
I recommend the following setup using pyvirtualdisplay, which hides the browser with visible=0. It is worth noting that pyvirtualdisplay is a wrapper for Xvfb and as such requires you to install Xvfb as well. You can get it with sudo apt-get install xvfb:
from selenium import webdriver
from pyvirtualdisplay import Display

# Start an invisible X display (backed by Xvfb) for Firefox to render into
display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
try:
    browser.get('mikekus.')
except KeyboardInterrupt:
    # Clean up both the browser and the virtual display on Ctrl+C
    browser.quit()
    display.stop()
I will leave filling in the login forms, etc., to you, as it's quite simple if you read the docs, as everyone should: Navigating With Selenium
Granted, in your situation you are trying to access the proxy and then another site through it. This method implies that you would direct the proxy to the target page from the proxy's own page, by filling in fields on that page. I'm sure that with a bit of time and research you could continue navigating to multiple pages and page elements.
I hope this helps. Good luck.
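Copyright notice: the article "python - How to emulate a browser with JavaScript support via Mechanize? - Stack Overflow" was contributed voluntarily by users, and its views represent only the author's own. To repost, please contact the author and credit the source: http://www.betaflare.com/web/1742173249a2427106.html. This site only provides information storage space, does not own the content, and bears no legal liability. If any content on this site is found to be plagiarized or infringing, it will be deleted immediately once verified.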