I still can't write regular expressions by heart, so I could not find a final solution to strip all styles from <p style="">...</p> using RegEx with JavaScript, while leaving color and background-color in place if they exist.
What I found:
1. Remove the complete style="..." attribute with RegEx:
htmlString = (htmlString).replace(/(<[^>]+) style=".*?"/i, '$1');
2. Remove certain styles with RegEx:
htmlString = (htmlString).replace(/font-family\:[^;]+;?|font-size\:[^;]+;?|line-height\:[^;]+;?/g, '');
Challenge: if all assigned styles are removed (no color exists) and the style attribute ends up empty (style="" or style=" "), the style attribute should be removed as well.
I guess we need two lines of code?
Any help appreciated!
Example 1 (whitelisted "color" survives):
<p style="font-family:Garamond;font-size:8px;line-height:14px;color:#FF0000;">example</p>
should become:
<p style="color:#FF0000;">example</p>
Example 2 (all styles die):
<p style="font-family:Garamond;font-size:8px;line-height:14px;">example</p>
should become:
<p>example</p>
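One way the two snippets above could be combined into the "two lines" guessed at, as a rough sketch only. `stripFontStyles` is a hypothetical name, and the value pattern is tightened from `[^;]+` to `[^;"]+` (an assumption about the intent) so that a last declaration without a trailing semicolon does not run past the closing quote. This is a blacklist, so any property that is not listed survives, and the first pattern is applied to the whole string rather than only to style attributes.

```javascript
// Step 1: drop the unwanted declarations (pattern adapted from snippet 2 above).
// Step 2: drop any style attribute that is now empty or whitespace-only.
function stripFontStyles(htmlString) {
  return htmlString
    .replace(/font-family:[^;"]+;?|font-size:[^;"]+;?|line-height:[^;"]+;?/g, '')
    .replace(/\s+style="\s*"/gi, '');
}

const example1 = '<p style="font-family:Garamond;font-size:8px;line-height:14px;color:#FF0000;">example</p>';
const example2 = '<p style="font-family:Garamond;font-size:8px;line-height:14px;">example</p>';

console.log(stripFontStyles(example1)); // prints <p style="color:#FF0000;">example</p>
console.log(stripFontStyles(example2)); // prints <p>example</p>
```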
asked Sep 13, 2012 at 18:18 by Avatar, edited Sep 13, 2012 at 18:37
- Don't parse or modify HTML with Regex. It's not going to end well. – g.d.d.c Commented Sep 13, 2012 at 18:20
- I know about this discussion thank you :) For my case it is fine to use RegEx. – Avatar Commented Sep 13, 2012 at 18:23
- It isn't, it's never fine, it is doable at best. unless you are a devil worshipping, virgin-blood drinking wonderpony, mend your ways... please – Elias Van Ootegem Commented Sep 13, 2012 at 18:59
- "No matter how many times we say it, they won't stop coming every day..." +1 --- However, my example above can be used for other scenarios (XML), and need not be bound to HTML ;) – Avatar Commented Sep 13, 2012 at 19:06
- I'm assuming that you don't know the order in which the style attributes will appear beforehand, right? Because really, in that case you won't get a satisfactory solution with regex. – Tim Pietzcker Commented Sep 13, 2012 at 19:49
2 Answers
First, the proof of concept. Check out the Rubular demo.
The regex goes like this:
/(<[^>]+\s+)(?:style\s*=\s*"(?!(?:|[^"]*[;\s])color\s*:[^";]*)(?!(?:|[^"]*[;\s])background-color\s*:[^";]*)[^"]*"|(style\s*=\s*")(?=(?:|[^"]*[;\s])(color\s*:[^";]*))?(?=(?:|[^"]*)(;))?(?=(?:|[^"]*[;\s])(background-color\s*:[^";]*))?[^"]*("))/i
Broken down, it means:
(<[^>]+\s+)                         Capture start tag to style attr ($1).
(?:                                 CASE 1:
  style\s*=\s*"                     Match style attribute.
  (?!                               Negative lookahead assertion, meaning:
    (?:|[^"]*[;\s])                 If color found, go to CASE 2.
    color\s*:[^";]*
  )
  (?!                               Negative lookahead assertion, meaning:
    (?:|[^"]*[;\s])                 If background-color found, go to CASE 2.
    background-color\s*:[^";]*
  )
  [^"]*"                            Match the rest of the attribute.
|                                   CASE 2:
  (style\s*=\s*")                   Capture style attribute ($2).
  (?=                               Positive lookahead.
    (?:|[^"]*[;\s])
    (color\s*:[^";]*)               Capture color style ($3),
  )?                                if it exists.
  (?=                               Positive lookahead.
    (?:|[^"]*)
    (;)                             Capture semicolon ($4),
  )?                                if it exists.
  (?=                               Positive lookahead.
    (?:|[^"]*[;\s])
    (background-color\s*:[^";]*)    Capture background-color style ($5),
  )?                                if it exists.
  [^"]*(")                          Match the rest of the attribute,
                                    capturing the end-quote ($6).
)
Now, the replacement,
\1\2\3\4\5\6
should always construct what you expect to have left!
The trick, in case it's not clear, is to put the "negative" case first. Only if the negative case fails does the alternate case run and populate the captures (such as the style attribute itself). Otherwise, the captures default to nothing, so not even the style attribute will show up.
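A reduced illustration of that trick, as a sketch only: it uses a made-up `drop:`/`keep:` syntax (not the style attribute) and the hypothetical names `pattern` and `rewrite`. It shows the one JavaScript-specific detail that matters here. In a replacement string, written `$1` in JavaScript rather than `\1`, a group from the alternative that did not match expands to nothing, whereas in a replacement callback the same group arrives as `undefined` and has to be defaulted by hand.

```javascript
// First alternative: matches "drop:<word>" and captures nothing.
// Second alternative: matches "keep:<word>" and captures the word.
const pattern = /drop:\w+|keep:(\w+)/g;

function rewrite(input) {
  // In a callback, a group that did not take part in the match is `undefined`,
  // so it is defaulted to '' explicitly.
  return input.replace(pattern, (match, kept) => (kept === undefined ? '' : kept));
}

console.log(rewrite('drop:a keep:b'));                 // prints " b"
// In a replacement string, an unmatched group expands to the empty string.
console.log('drop:a keep:b'.replace(pattern, '[$1]')); // prints "[] [b]"
```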
To do this in JavaScript, do this:
htmlString = htmlString.replace(
    /(<[^>]+\s+)(?:style\s*=\s*"(?!(?:|[^"]*[;\s])color\s*:[^";]*)(?!(?:|[^"]*[;\s])background-color\s*:[^";]*)[^"]*"|(style\s*=\s*")(?=(?:|[^"]*[;\s])(color\s*:[^";]*))?(?=(?:|[^"]*)(;))?(?=(?:|[^"]*[;\s])(background-color\s*:[^";]*))?[^"]*("))/gi,
    function (match, $1, $2, $3, $4, $5, $6, offset, string) {
        return $1 + ($2 ? $2 : '') + ($3 ? $3 + ';' : '')
            + ($5 ? $5 + ';' : '') + ($2 ? $6 : '');
    }
);
Note that I'm doing this for fun, not because this is how this problem should be solved. Also, I'm aware that the semicolon-capture is hacky, but it's one way of doing it. And one can infer how to extend the whitelist of styles, looking at the breakdown above.
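Extending the whitelist inside the single regex means adding another lookahead pair per property. A different technique is sketched here, not the regex above: match each double-quoted style attribute, split its value into declarations in a callback, and keep only whitelisted property names. `ALLOWED` and `filterStyleAttributes` are hypothetical names. The sketch assumes double-quoted attributes and values without embedded semicolons, and it will also rewrite text that merely looks like a style attribute. It does not rely on quantified lookaheads such as `(?=...)?`, which JavaScript accepts only as legacy syntax in non-Unicode mode.

```javascript
// Hypothetical whitelist of property names to keep.
const ALLOWED = ['color', 'background-color'];

function filterStyleAttributes(htmlString, allowed = ALLOWED) {
  // Match leading whitespace plus style="..." (double-quoted values only).
  return htmlString.replace(/\s+style\s*=\s*"([^"]*)"/gi, (match, css) => {
    const kept = css
      .split(';')
      .map((decl) => decl.trim())
      .filter((decl) => {
        // The property name is everything before the first colon.
        const name = decl.split(':')[0].trim().toLowerCase();
        return decl.includes(':') && allowed.includes(name);
      });
    // Drop the whole attribute when nothing whitelisted is left.
    return kept.length ? ` style="${kept.join(';')};"` : '';
  });
}

const withColor = '<p style="font-family:Garamond;font-size:8px;line-height:14px;color:#FF0000;">example</p>';
const withoutColor = '<p style="font-family:Garamond;font-size:8px;line-height:14px;">example</p>';

console.log(filterStyleAttributes(withColor));    // prints <p style="color:#FF0000;">example</p>
console.log(filterStyleAttributes(withoutColor)); // prints <p>example</p>
```

Adding another property then only means adding its name to the array, for example `filterStyleAttributes(html, ['color', 'background-color', 'font-weight'])`.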
You can accomplish this without regex by using this function:
function filter_inline_style(text) {
    var temp_el = document.createElement("DIV");
    temp_el.innerHTML = text;
    var el = temp_el.firstChild;
    // Only an element node (nodeType 1) can carry a style attribute
    if (el && el.nodeType === 1) {
        var background = el.style.backgroundColor;
        var color = el.style.color;
        el.removeAttribute('style');
        // Restore only the values that were actually set
        if (background) el.style.backgroundColor = background;
        if (color) el.style.color = color;
        return el.outerHTML;
    }
    return temp_el.innerHTML;
}
To use it:
var text = '<p style="font-size:8px;line-height:14px;color:#FF0000;background-color: red">example</p>';
var clean_text = filter_inline_style(text);
console.log(clean_text);
// output: <p style="background-color: red; color: rgb(255, 0, 0);">example</p>