Say I have

var string = 
"<h1>Header</h1>
<p>this is a small paragraph</p>
<ul>
    <li>list element 1.</li>
    <li>list element 2.</li>
    <li>list element 3. With a small update.</li>
</ul>"
//newlines for clarity only

How can I split this string using JavaScript so that I get

var array = string.split(/*...something here*/)

array = [
"<h1>Header</h1>",
"<p>this is a small paragraph</p>",
"<ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>"
]

I only want to split at the top-level HTML elements, not their children.

Asked Apr 18, 2013 at 19:48 by Eoin Murray; edited Apr 18, 2013 at 20:09 by Alex Shesterov.

3 Answers

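You could do something like this with jQuery:

var string = '<div><p></p></div><h1></h1>';
var elements = $(string).map(function() {
    return $('<div>').append(this).html();  // Basically the element's `outerHTML`
}).get();  // .get() turns the jQuery object returned by .map() into a plain array

With the string from the question as input, the result is: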

["<h1>Header</h1>", "<p>this is a small paragraph</p>", "<ul>    <li>list element 1.</li>    <li>list element 2.</li>    <li>list element 3. With a small update.</li></ul>"]

A performant solution (http://jsperf.com/spliting-html):

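var splitter = document.createElement('div');
// Let the browser parse the markup; the trailing backslashes continue the string literal
splitter.innerHTML = "<h1>Header</h1>\
<p>this is a small paragraph</p>\
<ul>\
    <li>list element 1.</li>\
    <li>list element 2.</li>\
    <li>list element 3. With a small update.</li>\
</ul>";
// splitter.children contains only the top-level elements (text nodes are skipped)
var parts = Array.prototype.map.call(splitter.children, function (element) {
  return element.outerHTML;  // full markup of the element, including its children
});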

You can't do this with regular expressions. Your regular expression will fail if you have several nested elements of the same type, e.g.

<div>
  <div>
    <div>
    </div>
  </div>
</div>

This is because regular expressions can only process regular languages, while HTML is a genuinely context-free language (and context-free is "more complex" than regular).

See also: https://stackoverflow.com/a/1732454/2170192
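To illustrate why nesting needs more than pattern matching, here is a minimal sketch of a scanner that counts tag depth. The function name `splitTopLevel` is hypothetical and not taken from any of the answers. It assumes well-formed markup with no void or self-closing elements (such as `<br>` or `<img>`), no comments, and no `>` inside attribute values. For real-world HTML, a DOM-based approach is the safer choice.

```javascript
// Hypothetical helper: splits markup into its top-level elements by counting depth.
// Assumes well-formed markup without void elements, comments, or ">" in attributes.
function splitTopLevel(html) {
  var tagPattern = /<(\/?)\w+[^>]*>/g;  // finds every start tag and end tag
  var parts = [];
  var depth = 0;   // number of currently open elements
  var start = 0;   // index where the current top-level element begins
  var match;

  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === '/') {
      // End tag: one level up; at depth 0 a top-level element is complete
      depth -= 1;
      if (depth === 0) {
        parts.push(html.slice(start, tagPattern.lastIndex));
      }
    } else {
      // Start tag: remember the position if it opens a top-level element
      if (depth === 0) {
        start = match.index;
      }
      depth += 1;
    }
  }
  return parts;
}

var nestedExample = '<div><div><div></div></div></div><p>text</p>';
var nestedParts = splitTopLevel(nestedExample);
// nestedParts[0] is the complete outer <div>, nestedParts[1] is '<p>text</p>'
```

The regular expression here only locates individual tags. The `depth` counter is what tracks the nesting, and that counting is the part a regular language cannot express.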

But if you don't have nested elements of the same type, you can split your HTML string by taking all matches of the following regular expression (which uses a backreference):

/<(\w+).*?<\/\1\s*>/igsm
  • <(\w+) matches a less-than sign followed by one or more word characters (letters, digits, underscores), capturing the word characters in the first capturing group.
  • .*? matches the contents of the element; the quantifier is lazy so that the match ends at the first matching end tag instead of running on to the last one.
  • <\/ matches the opening of the end tag.
  • \1 is the backreference, which matches exactly the character sequence captured by the first capturing group.
  • \s*> matches optional whitespace and the greater-than sign.
  • igsm are flags: case-insensitive, global, dot-matches-all-characters (s, available in JavaScript since ES2018) and multi-line (m has no effect here, because the pattern contains no ^ or $).
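As a rough usage sketch, the pattern can be passed to `String.prototype.match` to collect the top-level elements. This version uses a lazy quantifier (`.*?`) so that repeated sibling elements are not merged into a single match, and it drops the `m` flag because the pattern has no anchors. The variable names are illustrative only. The second half shows the limitation described above for nested elements of the same type.

```javascript
// Requires the "s" (dotAll) flag, available in JavaScript since ES2018.
var topLevelPattern = /<(\w+).*?<\/\1\s*>/gis;

var html =
  '<h1>Header</h1>\n' +
  '<p>this is a small paragraph</p>\n' +
  '<ul>\n' +
  '    <li>list element 1.</li>\n' +
  '    <li>list element 2.</li>\n' +
  '</ul>';

// match() with a global pattern returns all matches, or null if there are none
var parts = html.match(topLevelPattern) || [];
// parts[0] is '<h1>Header</h1>', parts[1] is the <p> element,
// parts[2] is the whole <ul> element including its <li> children

// Limitation: nested elements of the same type are cut off too early
var nested = '<div><div></div></div>';
var broken = nested.match(topLevelPattern) || [];
// broken[0] is '<div><div></div>': the match stops at the first closing </div>
```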
