I'm pretty new, so don't be too harsh :)
Question (tl;dr)
I'm having trouble passing a Unicode string from a javax.swing.JApplet embedded in a web page to the JavaScript part. I'm not sure whether this is a bug or a misunderstanding of the technologies involved:
Problem
I want to pass a Unicode string from a Java applet to JavaScript, but the string gets messed up. Strangely, the problem does not occur in Internet Explorer 10, but it does in Chrome (v26) and Firefox (v20). I haven't tested other browsers.
The returned string seems to be okay, except for the last Unicode character. The results in the JavaScript debugger and on the web page are:
- abc → abc
- 表示 → 表��
- ま → ま
- ウォッチリスト → ウォッチリス��
- アップロード → アップロー��
- ホ → ��
- ホ → ホ (Not deterministic)
- アップロードabc → アップロードabc
The string seems to get corrupted in its last bytes. If it ends with an ASCII character, the string is okay. Additionally, the problem doesn't occur with every combination of characters, and not every time (I'm not sure about this). Therefore I suspect a bug, and I'm afraid I might be posting an invalid question.
Test Setup
A minimal setup consists of an applet that returns some Unicode (UTF-8) strings:
/* TestApplet.java */
import javax.swing.*;

public class TestApplet extends JApplet {
    private String[] testStrings = {
        "abc", // OK (ASCII only)
        "表示", // Error on last character
        "表示", // Error on last character
        "ホーム ", // OK (because of the *space* after ム)
        "アップロード", ... };

    public TestApplet() {...} // Applet-specific stuff
    ...

    // Number of test strings, so the page can iterate over them.
    public int getLength() { return testStrings.length; }

    // Stands in for built-in array access, because of IE.
    // Declared public so that JavaScript can call it.
    public String getTestString(int i) {
        return testStrings[i];
    }
}
The corresponding web page with JavaScript could look like this:
/* test.html */
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div id="output"></div>
<applet id="applet" archive="test.jar" code="TestApplet"></applet>
<script type="text/javascript" charset="utf-8">
// Look up the applet and the container that receives the output.
var applet = document.getElementById('applet');
var node = document.getElementById('output');
for (var i = 0; i < applet.getLength(); i++) {
var text = applet.getTestString(i);
var paragraphNode = document.createElement("p");
paragraphNode.innerHTML = text;
node.appendChild(paragraphNode);
}
</script>
</body>
</html>
Environment
I'm working on Windows 7 32-bit with the current Java version 1.7.0_21, using the "Next Generation Java Plug-in 10.21.2 for Mozilla browsers". I had some problems with my operating system locale, but I tried several regional settings (English, Japanese, Chinese).
When a string is corrupted, Chrome shows invalid characters (e.g. ��). Firefox, on the other hand, drops the string completely if it would end with ��.
Internet Explorer manages to display the strings correctly.
Solutions?
I can imagine several workarounds, including escaping/unescaping, or appending a "final char" that is then removed via JavaScript. Actually I'm planning to write against Android's WebKit, and I haven't tested it there.
Since I would like to continue testing in Chrome (because of the WebKit technology and for comfort), I hope there is a trivial solution to the problem that I might have overlooked.
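The escaping workaround mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the class AsciiSafeTransfer and its methods are hypothetical and not part of the original applet, and whether an ASCII-only return value actually avoids the corruption has not been verified here (it is merely suggested by the observation that ASCII strings arrive intact). The idea is to percent-encode the UTF-8 bytes on the Java side so that only ASCII characters cross the applet boundary.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper; not part of the original applet. */
public class AsciiSafeTransfer {

    /** Percent-encodes the UTF-8 bytes of s, so the result is ASCII only. */
    public static String encode(String s) {
        try {
            // URLEncoder writes a space as '+'; use "%20" so that
            // decodeURIComponent in the browser restores the space.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encode(); only used here to check the round trip in Java. */
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first non-ASCII test string, written as Unicode escapes.
        String original = "\u8868\u793a";
        String wire = encode(original);
        System.out.println(wire); // prints "%E8%A1%A8%E7%A4%BA"
        System.out.println(decode(wire).equals(original)); // prints "true"
    }
}
```

Under this assumption, getTestString would return encode(testStrings[i]), and the page would call decodeURIComponent(applet.getTestString(i)) before assigning the text to the paragraph node.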
asked May 3, 2013 at 13:22 by Inuniku, edited May 3, 2013 at 13:24 by Ian
Comments
- I'm interested in what the real problem is. One idea I found is: make sure javac and/or jar uses UTF-8 encoding. If you don't specify it, it uses the machine default (which could be a problem). – Ian, May 3, 2013 at 15:21
- Thanks! I'll try this later on. I want to point out that the data flow from JavaScript to the applet (calling parameter) works as expected. Only the return value gets messed up. – Inuniku, May 3, 2013 at 15:31
- Absolutely. You showed/explained that it all works fine, except for the string returned in special cases (when the last character of the returned string is a Unicode character). I think you explained the situation very well and laid out everything in a very organized way :) – Ian, May 3, 2013 at 15:44
- Please can you show the code that actually writes the string to send it to the browser? – Danack, May 4, 2013 at 16:29
- It's possibly a duplicate of "Java not defaulting to UTF8 encoding for strings": stackoverflow.com/questions/81323/… – Danack, May 4, 2013 at 16:32
4 Answers
If you are testing in Chrome/Firefox, please replace the first line with this and then test it:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
The doctype is significant for how the browser identifies the page.
Transitional/loose is the type you can use with Unicode. Please test and reply.
I suggest setting a breakpoint on
paragraphNode.innerHTML = text;
and inspecting text in the JavaScript console, e.g. with
console.log(escape(text));
or
console.log(encodeURIComponent(text));
or
// Print every UTF-16 code unit of the returned string.
for (var i = 0; i < text.length; i++) {
console.log("i = " + i);
console.log("text.charAt(i) = " + text.charAt(i)
+ ", text.charCodeAt(i) = " + text.charCodeAt(i));
}
See also
http://www.fileformat.info/info/unicode/char/30a6/index.htm
https://developer.mozilla.org/en-US/docs/DOM/window.escape (which is not part of any standard)
and
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/encodeURIComponent
or similar resources.
Your source files may not be in the encoding you assume (UTF-8).
JavaScript assumes UTF-16 strings:
http://www.ecma-international.org/ecma-262/5.1/#sec-4.3.16
Java also assumes UTF-16:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
The Linux or Cygwin file command can show you the encoding of your files.
See
http://linux.die.net/man/1/file (haven't found a kernel man reference)
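To check on the Java side whether the compiled strings already differ from what was typed, the UTF-16 code units can be dumped and compared with the charCodeAt output in the browser (text.charCodeAt(i).toString(16) gives hex there). The following is a minimal sketch; the class CodeUnitDump is hypothetical and not part of the original applet. It writes the test string with Unicode escapes, which javac interprets the same way regardless of the source file's encoding.

```java
/** Hypothetical diagnostic; not part of the original applet. */
public class CodeUnitDump {

    /** Lists the UTF-16 code units of s as four-digit hex values. */
    public static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escapes keep this literal independent of the file encoding.
        String escaped = "\u8868\u793a";
        System.out.println(dump(escaped)); // prints "8868 793a"
    }
}
```

If a literal typed directly into the source produces different values than its escaped form, the file was probably compiled with an encoding other than the one it was saved in. If both match and the browser still reports different code units, the damage happens after compilation.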
You need to make sure to add the following Java argument to your applet/embed tag:
-Dfile.encoding=utf-8
i.e. java_arguments="-Dfile.encoding=utf-8"
Otherwise the JVM falls back to the platform default encoding, which depends on the system locale and is not necessarily UTF-8.
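Whether the argument takes effect can be checked from inside the JVM. A minimal sketch, assuming a hypothetical class DefaultEncodingReport that is not part of the original applet: it reports the default charset and locale the JVM actually runs with, which makes it easy to compare runs under different Windows regional settings. Whether the plug-in honors java_arguments for this property is not confirmed here.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical diagnostic; not part of the original applet. */
public class DefaultEncodingReport {

    /** Describes the charset and locale the running JVM defaults to. */
    public static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", locale=" + Locale.getDefault();
    }

    public static void main(String[] args) {
        // The output depends on the machine and JVM arguments.
        System.out.println(describe());
    }
}
```

An applet could expose describe() through a public method and log the result from JavaScript, once with and once without the java_arguments setting.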
Okay, I'm a little bit embarrassed, because I thought I had tested enough: I was actually using a non-Latin locale (e.g. Chinese (PRC) or Japanese (Japan)) in the Windows system locale settings. When I changed back to English (USA) or German (Germany), everything worked as expected.
I'm still wondering why this would affect Chrome and Firefox in such a strange way, because Java and modern browsers should be Unicode-based. So I won't accept this as an answer! The problem recurs when I switch back to Japanese, and I'm going to test it on different systems.
I want to thank all the posters for the enlightening input, and I will keep putting some effort into solving this question.