I need to parse an HTML string and remove all the elements which contain only empty children.

Example:

<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>

contains no information and must be replaced with <br/>.

I wrote a regex like this:

<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>

but the problem is that it matches only two of the three levels. In the example above, the outermost element, <P>, is not matched.

Can you help me fix this regex?
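
To make the behaviour described above concrete, the pattern can be run against the sample markup. This is only a small sketch: the names `html`, `twoLevels`, and `match` are made up for illustration, and the only change to the pattern is escaping `/` so that it fits in a JavaScript regex literal.

```javascript
// The sample markup from the question.
const html = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';

// The pattern from the question: one outer tag, any number of empty
// direct children, then one closing tag.
const twoLevels = /<\w+\b[^>]*>(<\w+\b[^>]*>\s*<\/\w*\s*>)*<\/\w*\s*>/;

const match = html.match(twoLevels);

// Logs the <FONT ...><B></B></FONT> part only; the outer <P> is skipped.
console.log(match[0]);
// The match does not start at index 0, where <P> begins.
console.log(match.index);
```

The pattern allows exactly one layer of children inside the outer tag, so a match can never start at `<P>`, whose child `<FONT>` has a child of its own.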

asked Nov 13, 2013 at 10:26 by Cristian Holdunu
  • 1 brace yourself for downvotes on regex+HTML question – hjpotter92 Commented Nov 13, 2013 at 10:29
  • 3 The font element has been deprecated since HTML3 so why are you still using it? – user2417483 Commented Nov 13, 2013 at 10:30
  • stackoverflow.com/q/3129738/612202 You should prefer the answer with more votes. – dan-lee Commented Nov 13, 2013 at 10:30
  • That is the point: I want to get rid of it. I have an older database that I take this data from. Some notes were saved as text with their formatting, and I want to get rid of the useless elements and of the font elements. I replaced them with spans. – Cristian Holdunu Commented Nov 13, 2013 at 10:50

3 Answers

This regex seems to work:

/(<(?!\/)[^>]+>)+(<\/[^>]+>)+/

See a live demo with your example.
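
As a rough sketch of how this pattern could be applied, the snippet below adds the `g` flag and swaps every match for a line break. The helper name `replaceEmptyRuns`, the `sample` constant, and the choice of `<br>` as the default replacement are assumptions made for illustration, not part of the answer.

```javascript
// The regex from the answer, with the g flag added so every run is replaced.
// It matches one or more opening tags followed directly by one or more closing tags.
const emptyRun = /(<(?!\/)[^>]+>)+(<\/[^>]+>)+/g;

// Hypothetical helper: replaces each run of "openers then closers" with `replacement`.
function replaceEmptyRuns(html, replacement = '<br>') {
  return html.replace(emptyRun, replacement);
}

const sample = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';

console.log(replaceEmptyRuns(sample));                  // <br>
console.log(replaceEmptyRuns('<p>Hello</p>' + sample)); // <p>Hello</p><br>
```

Note that the pattern does not check that opening and closing tags pair up, and it does not allow whitespace between tags. For input such as `<div><p></p>text</div>` it matches `<div><p></p>` and leaves an unbalanced `</div>` behind, so it is probably only safe when the empty markup is known to be well nested.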

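Use jQuery and go through all the children. For each child, check whether .html() is empty. If it is, delete the current element (or its parent, if you prefer) with .remove().

Do this for each string:

var appended = $('.yourparent').append('YOUR HTML STRING');

appended.children().each(function ()
{
    // Inside .each(), `this` is a plain DOM element, so wrap it in $()
    // before calling jQuery methods on it.
    if ($(this).html() === '')
    {
        // Removes the empty child; use $(this).parent().remove() to drop the parent instead.
        $(this).remove();
    }
});

This appends the markup first and then deletes any children that are empty.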

Please try this:

function removeEmtpyElements(str, iterations){
    // [a-z] with the i flag replaces [A-z], which also matched [ \ ] ^ _ and `.
    var re = /<([a-z]+)([^>\/]*)>\s*<\/\1>/gim;
    var subst = '';
    
    for(var i = 0; i < iterations; i++){
        str = str.replace(re, subst);
    }
    
    return str;
}
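
The function above needs the caller to guess how deep the nesting goes. One possible variation, sketched below, keeps replacing until the string stops changing. The name `removeEmptyElementsUntilStable` and the slightly stricter tag pattern are assumptions made for illustration, not part of the answer.

```javascript
// Hypothetical variant: strip empty elements repeatedly until nothing changes,
// so no iteration count has to be supplied.
function removeEmptyElementsUntilStable(str) {
  // An opening tag (with optional attributes), optional whitespace, and the
  // closing tag of the same name. \b stops "b" from matching the start of "br".
  const emptyPair = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;
  let previous;
  do {
    previous = str;
    str = str.replace(emptyPair, '');
  } while (str !== previous);
  return str;
}

const nested = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';

console.log(removeEmptyElementsUntilStable(nested) === '');         // true
console.log(removeEmptyElementsUntilStable('<p>keep <b></b>me</p>')); // <p>keep me</p>
```

Each pass removes the innermost empty pairs, which exposes their parents as empty on the next pass. The caller can then substitute a line break when the result is an empty string, if that is the desired output.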

Tags: javascript · Regex to remove empty html tags, that contains only empty children · Stack Overflow