I am trying to use a loop to download a bunch of HTML pages and scrape the data inside them. Those pages run some JavaScript while loading, so I think WebClient may not be a good choice. But if I use WebBrowser as below, it returns an empty HTML string after the first call in the loop.
WebBrowser wb = new WebBrowser();
wb.ScrollBarsEnabled = false;
wb.ScriptErrorsSuppressed = true;
wb.Navigate(url);
while (wb.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); Thread.Sleep(1000); }
html = wb.Document.DomDocument.ToString();
asked Feb 12, 2016 at 14:29 by Mike Long
- What if you use the WebClient DownloadString method? Does it help? – User2012384, commented Feb 12, 2016 at 14:34
1 Answer
You are correct that WebClient and all of the other HTTP client interfaces will completely ignore JavaScript; none of them are browsers, after all.
You want:
var html = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
Note that if you load the page via a WebBrowser you don't need to scrape the raw markup; you can use DOM methods such as GetElementById, GetElementsByTagName and so on.
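As a minimal sketch of what those DOM methods look like in practice (the class and method names here are hypothetical, chosen only for illustration), the helpers below read values from an already-loaded HtmlDocument instead of parsing the markup string:

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helpers, not part of the original answer.
static class DomHelpers
{
    // Collects the href attribute of every <a> element in the document.
    public static List<string> CollectLinks(HtmlDocument doc)
    {
        var links = new List<string>();
        foreach (HtmlElement anchor in doc.GetElementsByTagName("a"))
        {
            string href = anchor.GetAttribute("href");
            if (!string.IsNullOrEmpty(href))
                links.Add(href);
        }
        return links;
    }

    // Returns the text of the element with the given id, or null if no such element exists.
    public static string TextOrNull(HtmlDocument doc, string id)
    {
        HtmlElement element = doc.GetElementById(id);
        return element == null ? null : element.InnerText;
    }
}
```

Both helpers assume they are called on the UI thread once the document has finished loading, for example as DomHelpers.CollectLinks(wb.Document).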
The while loop is very VBScript; there is a DocumentCompleted event that you should wire your code into instead.
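private void Whatever()
{
    WebBrowser wb = new WebBrowser();
    // Subscribe before navigating so the completion event is not missed.
    wb.DocumentCompleted += Wb_DocumentCompleted;
    wb.ScriptErrorsSuppressed = true;
    wb.Navigate("http://stackoverflow.com");
}

private void Wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var wb = (WebBrowser)sender;
    // Serialized markup of the document as it currently stands in the DOM.
    var html = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
    // GetElementById returns null when no element has that id.
    var copyright = wb.Document.GetElementById("copyright");
    var domd = copyright != null ? copyright.InnerText : null;
    /* ... */
}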
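Since the question loads many pages in a loop, one way to adapt the event-driven approach is to keep the URLs in a queue and start the next navigation from inside the DocumentCompleted handler. The following is only a sketch under stated assumptions: SequentialScraper, its members and the example.com URLs are hypothetical, and the ReadyState check is a common heuristic for skipping per-frame events rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper (not from the answer): loads URLs one at a time
// in a single WebBrowser and stores each page's rendered markup.
public sealed class SequentialScraper
{
    private readonly Queue<string> _pending;
    private readonly WebBrowser _wb = new WebBrowser();
    private string _current; // URL being loaded, or null when idle

    public Dictionary<string, string> Results { get; } =
        new Dictionary<string, string>();

    public SequentialScraper(IEnumerable<string> urls)
    {
        _pending = new Queue<string>(urls);
        _wb.ScriptErrorsSuppressed = true;
        _wb.DocumentCompleted += OnDocumentCompleted;
    }

    // Call from an STA thread, then pump messages with Application.Run().
    public void Start()
    {
        NavigateNext();
    }

    private void NavigateNext()
    {
        if (_pending.Count == 0)
        {
            _current = null;
            Application.ExitThread(); // ends the Application.Run() loop
            return;
        }
        _current = _pending.Dequeue();
        _wb.Navigate(_current);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire once per frame; wait until the whole page reports Complete.
        if (_current == null || _wb.ReadyState != WebBrowserReadyState.Complete)
            return;

        HtmlElementCollection roots = _wb.Document.GetElementsByTagName("HTML");
        Results[_current] = roots.Count > 0 ? roots[0].OuterHtml : string.Empty;
        NavigateNext();
    }
}

static class Program
{
    [STAThread]
    static void Main()
    {
        // Placeholder URLs; replace with the pages to scrape.
        var urls = new List<string> { "http://example.com/", "http://example.org/" };
        var scraper = new SequentialScraper(urls);
        scraper.Start();
        Application.Run(); // returns after the last page calls ExitThread
        foreach (var pair in scraper.Results)
            Console.WriteLine("{0}: {1} characters", pair.Key, pair.Value.Length);
    }
}
```

The sketch assumes at least one URL is supplied, because an empty queue would call Application.ExitThread before the message loop starts. Also note that DocumentCompleted signals that the document has loaded; scripts that change the page afterwards (timers, AJAX calls) may not have finished by then, so some pages may need an extra wait or a check for a specific element.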