I'm currently working with an automation framework that is pulling a webpage down for analysis, which is then presented as a string for processing. The Rhino Javascript engine is available to assist in parsing the returned web page.
It seems that if the string (which is a complete webpage) could be loaded into a DOM representation, it would provide a very nice interface for parsing and analyzing its content.
Using only Javascript, is this a possible and/or feasible concept?
Edit:
I'll decompose the question for clarity: say I have a string in JavaScript that contains HTML, like so:
var $mywebpage = '<!DOCTYPE HTML PUB ...//snipped//... </body></html>';
Is it possible/realistic to load it somehow into a DOM object?
Asked Feb 4, 2011 at 22:08 by xelco52; edited Feb 4, 2011 at 22:31.
If I understood correctly, you can assign an HTML string to the body of a document:
document.body.innerHTML="string"
– JCOC611 Commented Feb 4, 2011 at 22:10
3 Answers
I'm accepting JonDavidJohn's answer as it was useful in solving my problem, though I'm including this additional answer for others who may view this in the future.
It appears that while Javascript allows the loading of html strings into a DOM element, DOM is not part of core ECMAScript, and as such is not available to scripts running under Rhino.
As a side note worth mentioning, a good alternative that was implemented in Rhino 1.6 is E4X. While not a DOM implementation, it does provide for conceptually similar capabilities.
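Since the DOM is supplied by the host rather than by the language, a script can check at runtime which facilities it has been given. The following is only a minimal sketch: the function name detectParsingSupport and the labels it returns are hypothetical, and it assumes that Rhino exposes its Java bridge through a global named Packages.

```javascript
// Hypothetical helper: reports which parsing facilities a global object exposes.
// 'dom'  - a browser-style DOM is present (a browser, or Rhino with a DOM emulation loaded)
// 'java' - no DOM, but Rhino's Java bridge (Packages) is available
// 'none' - neither was found
function detectParsingSupport(globalObject) {
  if (typeof globalObject.DOMParser !== 'undefined' ||
      typeof globalObject.document !== 'undefined') {
    return 'dom';
  }
  if (typeof globalObject.Packages !== 'undefined') {
    return 'java';
  }
  return 'none';
}

// Plain objects stand in for real global objects here, purely for illustration.
var inBrowserLike = detectParsingSupport({ document: {} });
var inRhinoLike = detectParsingSupport({ Packages: {} });
var inBareEngine = detectParsingSupport({});
```

In a top-level script the real global object can usually be passed as this, for example detectParsingSupport(this).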
If the document is XHTML, you can parse it with any XML parser. E4X would probably do the job nicely, as would the built-in Java XML parsing interfaces.
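As a rough sketch of the Java XML route, a Rhino script can reach the standard JAXP classes through the Packages global. The function name parseXhtmlWithJava is hypothetical, and the sketch assumes the input is well-formed XML; ordinary tag-soup HTML will make the parser throw.

```javascript
// Hypothetical helper: parses a well-formed XHTML string into an org.w3c.dom.Document
// using the JDK's XML parser. Assumes it runs under Rhino, where Packages is defined.
function parseXhtmlWithJava(xhtml) {
  if (typeof Packages === 'undefined') {
    throw new Error('Java bridge not available; this sketch assumes Rhino');
  }
  var factory = Packages.javax.xml.parsers.DocumentBuilderFactory.newInstance();
  var builder = factory.newDocumentBuilder();
  var reader = new Packages.java.io.StringReader(xhtml);
  var source = new Packages.org.xml.sax.InputSource(reader);
  return builder.parse(source);
}

// Under Rhino, usage might look like this:
//   var doc = parseXhtmlWithJava('<html><body><p>Hi</p></body></html>');
//   var paragraphs = doc.getElementsByTagName('p');
```

One caveat: if the page carries a DOCTYPE that references an external DTD, the parser may try to fetch it, which can be slow or fail outright.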
The env.js library is designed to emulate the browser environment under Rhino, but I believe your document also needs to be compliant XHTML:
http://ejohn/blog/bringing-the-browser-to-the-server/
http://www.envjs./
If it's HTML, however, it's more difficult, as browsers are designed to be extremely lenient in how markup is parsed. See here for a list of HTML parsers in Java:
http://java-source/open-source/html-parsers
This is not an easy problem to solve. People have gone so far as to embed the Mozilla Gecko engine in Java via JNI in order to use its parsing capabilities.
I would recommend you look into the following pure-Java project:
http://lobobrowser/cobra.jsp
The goal of the Lobo project is to develop a pure-Java web browser. It's a pretty interesting project, and there's a lot there, but I believe you could use the parser standalone quite easily in your own application, as described in the following link:
http://lobobrowser/cobra/java-html-parser.jsp
If you have a variable that contains HTML, you can load it into a DOM element, for example one looked up by id:
var mywebpage = '<!DOCTYPE HTML PUB ...//snipped//... </body></html>';
var element = document.getElementById('dom-id'); // <-- the element you are loading it into
element.innerHTML = mywebpage;
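Note that assigning a whole page to innerHTML of an ordinary element discards the doctype and the html, head and body wrappers. Where a full document is wanted, current browsers also offer DOMParser. The sketch below assumes a browser-like host; parseHtmlString and samplePage are hypothetical names, and the function returns null where DOMParser does not exist (plain Rhino, for instance).

```javascript
// Hypothetical helper: parses a full HTML page held in a string into a Document.
// Returns null when the host provides no DOMParser.
function parseHtmlString(html) {
  if (typeof DOMParser === 'undefined') {
    return null;
  }
  return new DOMParser().parseFromString(html, 'text/html');
}

// Hypothetical sample page, standing in for the snipped string above.
var samplePage = '<!DOCTYPE html><html><head><title>Demo</title></head>' +
                 '<body><p id="greeting">Hello</p></body></html>';

var doc = parseHtmlString(samplePage);
var summary = doc === null
  ? 'no DOMParser in this environment'
  : doc.title + ': ' + doc.getElementById('greeting').textContent;
```

The resulting document is detached from the page being displayed, so it can be queried without altering what the user sees.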