admin管理员组文章数量:1357273
I have pretty standard javascript/XHR drag-and-drop file upload code, and just came across an unfortunate real-world snag. I have a file on my (Win7) desktop called "TEST-é-TEST.txt". In Chrome (30.0.1599.69), it arrives at the server with filename in UTF-8, which works out fine. In Firefox (24.0), the filename seems mangled when it arrives at the server.
I didn't trust what Firebug/Chrome might be telling me about the encoding, so I examined the hex of the request packet. Everything else is the same except the non-ASCII character is indeed being encoded differently in the two browsers:
Chrome: C3 A9 (this is the expected UTF-8 for that character)
Firefox: EF BF BD (UTF-8 "replacement character"?!)
Is this a Firefox bug? I tried renaming the file, replacing the é with ó, and the Firefox hex was the same... so such a mangle really seems like a browser bug. (If Firefox were confusedly sending along ISO-8859-1, for example, without touching it, I'd see an E9 byte, and I could handle that on the server side, but it shouldn't mangle it!)
Regardless of the reason, is there something I can do on either the client or server sides to correct for this? If a replacement character is indeed being sent to the server, then it would seem unrecoverable there, so I almost certainly need to do it on the client side.
And yes, the page on which this code exists has charset=utf-8
, and Firefox confirms that it perceives the page as UTF-8 under View>Character Encoding.
Furthermore, if I dump the filename to console.log, it appears fine there--I guess it's just getting mangled in/after setRequestHeader("X-File-Name",file.name)
.
Finally, it would seem that the value passed to setRequestHeader() should be able to have code points up to U+00FF, so U+00E9 (é) and U+00F3 (ó) shouldn't cause a problem, though higher codes could trigger a SyntaxError:
I have pretty standard javascript/XHR drag-and-drop file upload code, and just came across an unfortunate real-world snag. I have a file on my (Win7) desktop called "TEST-é-TEST.txt". In Chrome (30.0.1599.69), it arrives at the server with filename in UTF-8, which works out fine. In Firefox (24.0), the filename seems mangled when it arrives at the server.
I didn't trust what Firebug/Chrome might be telling me about the encoding, so I examined the hex of the request packet. Everything else is the same except the non-ASCII character is indeed being encoded differently in the two browsers:
Chrome: C3 A9 (this is the expected UTF-8 for that character)
Firefox: EF BF BD (UTF-8 "replacement character"?!)
Is this a Firefox bug? I tried renaming the file, replacing the é with ó, and the Firefox hex was the same... so such a mangle really seems like a browser bug. (If Firefox were confusedly sending along ISO-8859-1, for example, without touching it, I'd see an E9 byte, and I could handle that on the server side, but it shouldn't mangle it!)
Regardless of the reason, is there something I can do on either the client or server sides to correct for this? If a replacement character is indeed being sent to the server, then it would seem unrecoverable there, so I almost certainly need to do it on the client side.
And yes, the page on which this code exists has charset=utf-8
, and Firefox confirms that it perceives the page as UTF-8 under View>Character Encoding.
Furthermore, if I dump the filename to console.log, it appears fine there--I guess it's just getting mangled in/after setRequestHeader("X-File-Name",file.name)
.
Finally, it would seem that the value passed to setRequestHeader() should be able to have code points up to U+00FF, so U+00E9 (é) and U+00F3 (ó) shouldn't cause a problem, though higher codes could trigger a SyntaxError: http://www.w3/TR/XMLHttpRequest2/#the-setrequestheader-method
Share Improve this question edited Feb 21, 2014 at 13:37 Darin Kolev 3,40913 gold badges33 silver badges47 bronze badges asked Oct 8, 2013 at 16:54 dlodlo 1,58315 silver badges26 bronze badges 13- It's hard to say anything useful here without having an idea of exactly where XHR is supposed to be getting the filename in your case. Where is that ing from? – Boris Zbarsky Commented Oct 8, 2013 at 20:21
- On drop, e.dataTransfer.files ... I haven't tried hard-coding a filename, to eliminate that variable. – dlo Commented Oct 8, 2013 at 20:49
- I just hardcoded the filename; same behavior. xhr.setRequestHeader("X-File-Name", "TEST-é-TEST.txt"); – dlo Commented Oct 8, 2013 at 21:01
- I think this really narrows down where the issue is: Firefox's implementation of setRequestHeader(), when you pass it a value that contains non-ASCII values. How to overe? – dlo Commented Oct 8, 2013 at 21:03
- Hmm. So looking at setRequestHeader in Firefox, it's doing what the spec says: dropping the high byte of each 16-bit unit of that JSString and putting the low byte in the header. So I'm not sure why you're seeing EF BF BD there. You should be seeing an E9 instead. What Firefox version are you using? – Boris Zbarsky Commented Oct 9, 2013 at 1:07
1 Answer
Reset to default 10Thanks so much for Boris's help. Here's a summary of what I discovered through our interactions in ments:
1) The core issue is that HTTP Request headers are supposed to be ISO-8859-1. Prior versions of Chrome and Firefox both passed along UTF-8 strings unchanged in setRequestHeader() calls. This changed in FF24.0 (and apparently will be changing in Chrome soon too), such that FF drops high bytes and passes along only the low byte for each character. In the example I gave in the question, this was recoverable, but characters with higher codes could be mangled irretrievably.
2) One workaround would be to encode on the client side, e.g.:
setRequestHeader('X-File-Name',encodeURIComponent(filename))
and then decode on the server side, e.g. in PHP:
$filename=rawurldecode($_SERVER['HTTP_X_FILE_NAME'])
3) Note that this is only problematic because my ajax file upload approach is to send the raw file data in the request body, so I need to send the filename via a custom request header (as shown in many tutorials online). If I used FormData instead, I wouldn't have to worry about this. I believe if you want solid, standards-based unicode filename support, you should use FormData and not the request header approach.
本文标签:
版权声明:本文标题:javascript - Handle non-ASCII filenames in XHR uploading - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1743958936a2568656.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论