I'm using HTML Tidy in PHP and it's producing unexpected results because of a <script> tag in a JavaScript string literal. Here's a sample input:
<html>
<script>
var t='<script><'+'/script>';
</script>
</html>
HTML Tidy's output:
<html>
<script>
//<![CDATA[
var t='<script><'+'/script>';
<\/script>
<\/html>
//]]>
</script>
</html>
It's interpreting </script></html> as part of the script. Then it adds another </script></html> to close the open tags. I tried this on an online version of HTML Tidy (http://www.dirtymarkup./) and it produces the same error.
How do I prevent this error from occurring in PHP?
asked Feb 26, 2014 at 0:31 by Leo Jiang; edited Mar 7, 2014 at 9:52 by user2428118
Comments:
- I would say "open a bug ticket", but they do not have any means to do so on their web site... – akonsu Commented Feb 26, 2014 at 0:35
- It's an interesting bug, but it seems very specific to the closing </script> tag; I would just use your current solution. Also, the use case for outputting the < and the /script> separately confuses me. – clancer Commented Feb 26, 2014 at 0:40
- Can you specify why you want to add a script tag to a variable? – Viswanath Polaki Commented Mar 1, 2014 at 4:53
- @ViswanathPolaki I'm parsing webpages and the authors of those webpages may want to do so. – Leo Jiang Commented Mar 1, 2014 at 5:01
- Bug reports go here: sourceforge/p/tidy/bugs. But it does not seem like they want to solve any of them. Sad :( – func0der Commented Mar 7, 2014 at 15:52
6 Answers
After playing around with it a bit, I discovered that one can use the comment //'<\/script>' to confuse the algorithm and prevent this bug from occurring:
<html>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
</html>
After clean-up:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
<title></title>
</head>
<body>
</body>
</html>
My guess is that when the clean-up algorithm scans the code and sees the string <script> twice, it immediately looks for a matching </script> for each one. Separating < from /script> lets the second </script> go undetected, which is why it adds another </script> at the end of the code and, for some reason, also closes it with another </html>. (Poor design indeed!)
So I made a second assumption: the algorithm has no check for whether a </script> sits inside a comment. I was right! Adding the string <\/script> in a JavaScript comment does make the algorithm think there are two </script> tags in total.
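The guess above can be illustrated with a toy model. This is only a sketch of the counting behaviour described in this answer, not HTML Tidy's actual code; the function name countScriptTags and the rule that both </script and <\/script count as closers are assumptions made for illustration.

```javascript
// Toy model only: count tag-like sequences in the raw text, ignoring
// whether they sit inside a string literal or a comment.
function countScriptTags(text) {
  var openers = (text.match(/<script\b/gi) || []).length;
  // Assumption: both "</script" and "<\/script" are treated as closers.
  var closers = (text.match(/<\\?\/script\b/gi) || []).length;
  return { openers: openers, closers: closers, balanced: openers === closers };
}

// The question's input: the string literal adds an opener with no closer.
var original = "<script>\nvar t='<script><'+'/script>';\n</script>";

// The same input with the trailing comment from this answer.
var withComment =
  "<script>\nvar t='<script><'+'/script>'; //'<\\/script>'\n</script>";

console.log(countScriptTags(original));    // two openers, one closer
console.log(countScriptTags(withComment)); // two openers, two closers
```

Under this model the original input is unbalanced, and the extra comment restores the balance, which matches the clean-up results shown above.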
There's no need for string concatenation to avoid the closing </script>. Simply escaping the / character is enough to "fool" the parsers in browsers and, it seems, HTML Tidy's parser as well:
<html>
<script>
var t='<script><\/script>';
</script>
</html>
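If the script text is generated by a program, the same escaping can be applied automatically. Below is a minimal sketch, assuming the script source is available as a string; escapeClosingScriptTags is a hypothetical helper name, not part of HTML Tidy or any library. It rewrites the text blindly, so it is only suitable where every </script occurrence sits inside an ordinary string literal or a comment.

```javascript
// Hypothetical helper: rewrite "</script" as "<\/script" in JavaScript
// source text that will be placed inside an inline <script> element.
function escapeClosingScriptTags(jsSource) {
  // $1 keeps the original letter case of "script".
  return jsSource.replace(/<\/(script)/gi, '<\\/$1');
}

var source = "var t='<script></script>';";
var escaped = escapeClosingScriptTags(source);

// The escaped source no longer contains the sequence "</script".
console.log(escaped);

// Evaluating the escaped source still yields the unescaped string value.
var value = new Function(escaped + ' return t;')();
console.log(value === '<script></script>'); // true
```

Inside a string literal, \/ is simply /, so the value of t does not change; only the source text does.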
Try writing the script tag as a string concatenation rather than as one complete word:
<html>
<script>
var t='<scr'+'ipt><'+'/script>';
</script>
</html>
Resulting cleaned code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<script>
var t='<scr'+'ipt><'+'/script>';
</script>
<title></title>
</head>
<body>
</body>
</html>
It is probably better practice to create the script tag like this, which should also solve your Tidy issue:
<script>
var script = document.createElement('script'); // declare with var to avoid an implicit global
script.type = 'text/javascript';
script.src = 'http://myserver./file.js';
document.getElementsByTagName('head')[0].appendChild(script);
</script>
One way is to make sure Tidy doesn't detect the script tag. The "cleanest" way I could come up with is to escape a character in the tag:
<html>
<script>
var t='<\script><'+'/script>';
</script>
</html>
So you could even do this, without having to break the string up as above:
var t='<\script></\script>';
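A quick way to check that this escape leaves the string value unchanged is sketched below. It relies on the fact that \s has no special meaning in a JavaScript string literal, so it evaluates to a plain s; the variable names are only illustrative.

```javascript
// String.raw keeps the backslashes exactly as typed, so this holds the
// source text of the literal rather than its value.
var literalSource = String.raw`'<\script></\script>'`;

// The source text contains no "<script" or "</script" sequence...
var looksLikeTag = /<\/?script/i.test(literalSource);
console.log(looksLikeTag); // false

// ...but evaluating it yields the ordinary tag text, because "\s" inside
// a string literal is just "s".
var value = new Function('return ' + literalSource + ';')();
console.log(value === '<script></script>'); // true
```

The trick depends on the escaped letter: \s is harmless in a string literal, whereas an escape such as \n or \t would change the value, and in a regular expression literal \s means whitespace.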
This just works as expected:
<html>
<script>
var t='<'+'script><'+'/script>';
</script>
</html>
By the way, string concatenation is not the best way to dynamically create HTML to insert into a page. Look at document.createElement or even template engines (handlebars.js is my favourite).