I have been developing a parser that takes JavaScript as input and creates a compressed version of that JavaScript as output.
Initially, the parser failed when attempting to read the input JavaScript. I believe this is because Visual Studio 2008 saves its files as UTF-8 by default and, when doing so, includes a couple of hidden characters at the start of the file.
As a workaround, I used Visual Studio to save the file as code page 1252. After doing so, my parser was able to read the input JavaScript.
Note that I need to use special European characters that include accents.
So, here are my questions:
- Should I use code page 1252 or UTF-8?
- Why does Visual Studio save files as UTF-8 by default?
- If I choose to save files as 1252 will that lead to problems?
- It appears to me that Eclipse saves files as code page 1252 by default. Does that sound right?
5 Answers
UTF-8 is the better option, as it can represent every Unicode character, while with 1252 you might find that characters you need are missing from it (even in European languages).
Apparently, VS2008 saves UTF-8 with a byte order mark (BOM). It should be possible to switch that off, have the parser recognize it, or strip the BOM somewhere in between.
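To illustrate the "strip the BOM somewhere in between" option, here is a minimal Python sketch. The question doesn't say what language the parser is written in, so the language, the helper name `strip_utf8_bom`, and the sample script are all assumptions made for illustration only.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical input: the same tiny script saved without and with a BOM.
plain = "var café = 1;".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

cleaned = strip_utf8_bom(with_bom)   # BOM removed
unchanged = strip_utf8_bom(plain)    # nothing to remove, returned as-is
```

Working on the raw bytes before decoding means the rest of the parser never has to know whether the editor wrote a BOM.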
A UTF-8 file can start with a byte order mark (BOM) signature, which some editors, and obviously some libraries, don't understand... http://en.wikipedia/wiki/Byte-order_mark
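For anyone wondering what "don't understand" looks like in practice, this small Python sketch decodes the same bytes three ways. The sample script is made up; `utf-8-sig` is Python's standard codec that consumes a leading BOM, and other languages may need a manual check instead.

```python
import codecs

# Hypothetical file contents: UTF-8 BOM followed by a one-line script.
raw = codecs.BOM_UTF8 + "var x = 'é';".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as the invisible character U+FEFF.
as_utf8 = raw.decode("utf-8")

# A BOM-aware decoder drops it.
as_utf8_sig = raw.decode("utf-8-sig")

# A reader that assumes code page 1252 sees three junk characters up front,
# and the accented letter turns into two characters as well.
as_cp1252 = raw.decode("cp1252")
```

Here `as_utf8` starts with U+FEFF, `as_utf8_sig` is the clean source text, and `as_cp1252` starts with `ï»¿`, which is probably the "hidden characters" a tokenizer would choke on.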
If you can get around it, UTF-8 is by all means preferred today. Try stripping the first bytes (the BOM) before giving the JS code to the parser, or look for an option in your IDE to save the file without it.
1252 doesn't cause this issue and you won't have problems with it, but you'll be serving your web pages in an outdated format. I wouldn't do it today; there was a lot of encoding mess on the web in the past with ISO vs. Windows code pages for different languages...
Use UTF-8. 1252 does not cover the whole of Europe, so in some countries (Central Europe) you would have to use 1250 or, more correctly, ISO 8859-2. So the only real option is UTF-8.
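A quick way to see the coverage gap is to try encoding some sample text. This is only a sketch in Python; the helper `fits_in` and the sample strings are invented for the example.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample strings, one Western European, one Central European.
western = "café, Zürich, señor"
central = "Dvořák, Łódź, Győr"

western_in_1252 = fits_in(western, "cp1252")   # True
central_in_1252 = fits_in(central, "cp1252")   # False: ř, Ł, ź, ő are missing
central_in_1250 = fits_in(central, "cp1250")   # True
western_in_1250 = fits_in(western, "cp1250")   # False: ñ is missing
both_in_utf8 = fits_in(western + central, "utf-8")  # True
```

Neither single-byte code page handles both strings, while UTF-8 handles them together in one file.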
Will using 1252 cause issues?
That depends on the countries your app needs to work in.
Off the top of my head, 1252 (or ISO 8859-1) will work in:
- UK
- Germany
- Switzerland
- Austria
- Italy
- France
- Netherlands
- Iceland
- Spain
Oh, Wikipedia has a more comprehensive list: http://en.wikipedia/wiki/ISO/IEC_8859-1
So you can use CP 1252 if your app is only used in the mentioned countries/languages.
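One caveat when treating 1252 and ISO 8859-1 as the same thing: they agree on the accented Latin letters, but 1252 puts extra printable characters (such as the euro sign) in the 0x80 to 0x9F range, where ISO 8859-1 has control codes. A small Python sketch of the difference, with variable names chosen only for this example:

```python
accented = "é"
euro = "€"

# Accented Western European letters map to the same byte in both encodings.
same_for_accents = accented.encode("cp1252") == accented.encode("latin-1")

# The euro sign exists in code page 1252 (byte 0x80)...
euro_cp1252 = euro.encode("cp1252")

# ...but has no representation in ISO 8859-1 (Python's "latin-1").
try:
    euro.encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
```

So a file saved as 1252 but declared as ISO 8859-1 can still go wrong on characters like the euro sign or curly quotes, even within the countries listed above.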
The BOM was at the start of the file. IMHO you should use UTF-8; it's very current nowadays.