Strict HTML parsing in JavaScript - Stack Overflow


On Google Chrome (Canary), it seems no string can make the DOM parser fail. I'm trying to parse some HTML, and if the HTML isn't completely, 100% valid, I want to display an error. I've tried the obvious:

var newElement = document.createElement('div');
newElement.innerHTML = someMarkup; // Might fail on IE, never on Chrome.

I've also tried the method in this question. It doesn't fail for invalid markup, not even the most invalid markup I can produce.

So, is there some way to parse HTML "strictly" in Google Chrome at least? I don't want to resort to tokenizing it myself or using an external validation utility. If there's no other alternative, a strict XML parser is fine, but certain elements don't require closing tags in HTML, and preferably those shouldn't fail.


Asked Feb 19, 2012 at 22:13 by Ry-; edited May 23, 2017 at 12:20 by Community bot.

"strict" in JavaScript has a specific meaning, so I've edited the title of your question – T.J. Crowder Commented Feb 19, 2012 at 22:20
"...certain elements don't require closing tags in HTML..." Some elements don't require opening tags, either. – T.J. Crowder Commented Feb 19, 2012 at 22:21
tried it with HTML doctype strict? – powtac Commented Feb 19, 2012 at 22:21
@powtac: I'm trying to parse HTML fragments - no DTD. – Ry- ♦ Commented Feb 19, 2012 at 22:23
@T.J.Crowder: Okay - but the question remains :) – Ry- ♦ Commented Feb 19, 2012 at 22:24


1 Answer (score: 7)

Use the DOMParser to check a document in two steps:

1. Validate whether the document is well-formed XML, by parsing it as XML.
2. Parse the string as HTML. At the time of writing this required patching DOMParser, because Chrome's DOMParser did not yet support the text/html type; the shim below adds that support only when it is missing. Then loop through each element and check whether it is an instance of HTMLUnknownElement. getElementsByTagName('*') fits this purpose well.

(If you want to parse the document strictly, you also have to loop recursively through each element and check whether the element is allowed at that location, e.g. <area> is only allowed inside <map>.)
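The recursive placement check mentioned above could look roughly like the sketch below. It is a hypothetical, deliberately partial example: `PLACEMENT_RULES`, `isPlacementAllowed` and `checkPlacement` are illustrative names, and the rules table covers only a handful of elements rather than the full HTML content model. It reads only `localName` and `parentElement`, so in a browser it can be fed the collection returned by `getElementsByTagName('*')` on the text/html document.

```javascript
// Hypothetical, partial rules table: NOT the full HTML content model.
// "parent":   the element's direct parent must be one of these tag names.
// "ancestor": at least one ancestor must be one of these tag names.
var PLACEMENT_RULES = {
    area: { ancestor: ['map'] },
    li:   { parent: ['ul', 'ol', 'menu'] },
    td:   { parent: ['tr'] },
    th:   { parent: ['tr'] },
    tr:   { parent: ['table', 'thead', 'tbody', 'tfoot'] }
};

// Returns true if `el` satisfies the rule for its tag name,
// or if the table has no rule for that tag name.
function isPlacementAllowed(el) {
    var rule = PLACEMENT_RULES[el.localName.toLowerCase()];
    if (!rule) return true;
    if (rule.parent) {
        var parent = el.parentElement;
        return !!parent && rule.parent.indexOf(parent.localName.toLowerCase()) !== -1;
    }
    // Walk up the ancestor chain looking for an allowed ancestor.
    for (var a = el.parentElement; a; a = a.parentElement) {
        if (rule.ancestor.indexOf(a.localName.toLowerCase()) !== -1) return true;
    }
    return false;
}

// Checks every element of an array-like list (e.g. an HTMLCollection).
function checkPlacement(elements) {
    for (var i = 0; i < elements.length; i++) {
        if (!isPlacementAllowed(elements[i])) return false;
    }
    return true;
}
```

The rules are split into direct-parent and any-ancestor checks because <area> may sit deeper inside <map>, whereas list items and table cells must be direct children of their container.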

Demo: http://jsfiddle.net/q66Ep/1/

/* DOMParser shim for text/html, see https://stackoverflow.com/a/9251106/938089 */
;(function(DOMParser) {
    "use strict";
    var DOMParser_proto = DOMParser.prototype,
        real_parseFromString = DOMParser_proto.parseFromString;

    // If the native parser already handles text/html, leave it unchanged.
    // (An unsupported type may throw or return null, depending on the engine.)
    try {
        if ((new DOMParser).parseFromString("", "text/html")) return;
    } catch (e) {}

    DOMParser_proto.parseFromString = function(markup, type) {
        if (/^\s*text\/html\s*(;|$)/i.test(type)) {
            var doc = document.implementation.createHTMLDocument(""),
                doc_elt = doc.documentElement,
                first_elt;
            doc_elt.innerHTML = markup;
            first_elt = doc_elt.firstElementChild;
            // If the markup supplied its own <html> element, make it the root.
            if (doc_elt.childElementCount === 1 &&
                first_elt.localName.toLowerCase() === "html") {
                doc.replaceChild(first_elt, doc_elt);
            }
            return doc;
        } else {
            // Any other type is handled by the native implementation.
            return real_parseFromString.apply(this, arguments);
        }
    };
}(DOMParser));

/*
 * @description              Validate an HTML string
 * @param       String html  The HTML string to be validated
 * @returns            null  If the string is not well-formed XML
 *                    false  If the string contains an unknown element
 *                     true  If the string satisfies both conditions
 */
function validateHTML(html) {
    var parser = new DOMParser()
      , d = parser.parseFromString('<?xml version="1.0"?>'+html,'text/xml')
      , allnodes;
    if (d.querySelector('parsererror')) {
        console.log('Not well-formed HTML (XML)!');
        return null;
    } else {
        /* To use text/html, see https://stackoverflow.com/a/9251106/938089 */
        d = parser.parseFromString(html, 'text/html');
        allnodes = d.getElementsByTagName('*');
        for (var i=allnodes.length-1; i>=0; i--) {
            if (allnodes[i] instanceof HTMLUnknownElement) return false;
        }
    }
    return true; /* Well-formed, and every element is a known HTML element */
}

console.log(validateHTML('<div>'));  //  null, because of the missing close tag
console.log(validateHTML('<x></x>'));// false, because it's not an HTML element
console.log(validateHTML('<a></a>'));//  true, because the tag is closed,
                                     //       and the element is an HTML element

See revision 1 of this answer for an alternative to XML validation without the DOMParser.

Considerations

The current method completely ignores the doctype.
This method returns null for <input type="text">, even though it is valid HTML5, because the tag is not closed.
Conformance to the HTML specification is not checked.
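The <input type="text"> limitation comes from the XML step: HTML void elements have no end tag, so they are not well-formed XML. One possible workaround, sketched below under the assumption that a regex pre-pass is acceptable, is to rewrite void start tags as XML empty-element tags before the XML check. The sketch also wraps the fragment in a single placeholder <root> element, because an XML document must have exactly one root element, so a fragment such as <a></a><b></b> would otherwise be rejected. `selfCloseVoidElements`, `toXmlCheckInput` and the `<root>` wrapper are hypothetical names, not part of the answer.

```javascript
// HTML void elements: they never have content or an end tag.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                     'input', 'link', 'meta', 'source', 'track', 'wbr'];

// Matches a void start tag, with or without a trailing "/".
// The lookahead stops e.g. <colgroup> from matching as <col>.
// Naive: an attribute value containing ">" will break it.
var VOID_TAG_RE = new RegExp(
    '<(' + VOID_ELEMENTS.join('|') + ')(?=[\\s/>])([^>]*?)\\s*/?>', 'gi');

// Rewrites void start tags as XML empty-element tags, e.g. <br> -> <br/>.
function selfCloseVoidElements(html) {
    return html.replace(VOID_TAG_RE, '<$1$2/>');
}

// Builds the string for the XML step: void elements self-closed and the
// fragment wrapped in one hypothetical <root> element.
function toXmlCheckInput(html) {
    return '<?xml version="1.0"?><root>' + selfCloseVoidElements(html) + '</root>';
}
```

In validateHTML, the XML step would then call parser.parseFromString(toXmlCheckInput(html), 'text/xml') instead of prepending the XML declaration directly. Named entities other than XML's five predefined ones (e.g. &nbsp;) will still make the XML step fail.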


