INFO: Internet Explorer Always POSTs Unicode Data as UTF-8

Summary

For Unicode DHTML pages, Internet Explorer always POSTs Unicode data in UTF-8 format, regardless of the specific Unicode encoding used.

More information

When you work with globalization and character sets, it is important to distinguish between character sets and character set encodings. A character set is a mapping of numeric values to the characters in a character repertoire. A character set encoding is a specific bit representation of the integer values in the character set. Unicode is a character set, originally designed around 16-bit values, that has several different encodings, including UCS-2, UTF-16, UTF-7, and UTF-8. Web developers typically specify the encoding of the page (and thus, by implication, the character set) by using the META charset value, as follows:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=utf-16">
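
The distinction between a character set and its encodings can be illustrated with a short Python sketch. It is only an illustration, not part of Internet Explorer; the sample character (U+3042, HIRAGANA LETTER A) and the variable names are arbitrary choices. The character set assigns one integer to the character, and each encoding turns that integer into a different sequence of bytes.

```python
# One character from the Unicode character set (sample chosen for illustration).
ch = "\u3042"  # HIRAGANA LETTER A

# The character set maps the character to a single integer value.
code_point = ord(ch)
print(f"code point: U+{code_point:04X}")

# Each character set encoding represents that same integer as different bytes.
encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-7"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
```

Every byte sequence decodes back to the same code point, which is why the encoding must be known before the bytes can be interpreted.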

Typically, Internet Explorer encodes POST data in accordance with the page encoding. For example, if you write a Japanese page that uses the Shift-JIS character encoding, Internet Explorer submits the POST data in Shift-JIS. If the page uses the Unicode character set, however, Internet Explorer encodes the submission as UTF-8, even if the page itself is encoded as, for example, UTF-16. This is because many Web servers (including IIS) cannot process UTF-16 surrogates, which are pairs of 16-bit values that together use 32 bits to represent a single character that does not fit in the usual 16.
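
To make the surrogate mechanism concrete, the following Python sketch encodes one character that lies outside the 16-bit range. The sample character (U+20BB7) is an arbitrary choice for illustration. In UTF-16 the character occupies two 16-bit code units (a high and a low surrogate), whereas in UTF-8 it is simply a four-byte sequence with no surrogates involved.

```python
# A character outside the 16-bit range (sample chosen for illustration).
ch = "\U00020BB7"

# UTF-16 needs two 16-bit code units, a surrogate pair, for this character.
utf16 = ch.encode("utf-16-be")
high = int.from_bytes(utf16[0:2], "big")  # high (leading) surrogate
low = int.from_bytes(utf16[2:4], "big")   # low (trailing) surrogate
print(f"UTF-16: {high:04X} {low:04X}")

# UTF-8 encodes the same character directly, without surrogates.
utf8 = ch.encode("utf-8")
print(f"UTF-8:  {utf8.hex(' ')}")
```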

Note that this rule applies regardless of whether the form uses an ENCTYPE of application/x-www-form-urlencoded or multipart/form-data.
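
The effect on the wire can be approximated with Python's standard urllib.parse module. This sketch does not reproduce Internet Explorer's code; it only shows what an application/x-www-form-urlencoded body looks like when the same value is encoded as Shift-JIS and as UTF-8. The field name "name" and its value are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical form field holding HIRAGANA LETTER A (U+3042).
fields = {"name": "\u3042"}

# Body as it would look when submitted from a Shift-JIS page.
sjis_body = urlencode(fields, encoding="shift_jis")

# Body as it would look when submitted from a Unicode page (always UTF-8).
utf8_body = urlencode(fields, encoding="utf-8")

print(sjis_body)  # name=%82%A0
print(utf8_body)  # name=%E3%81%82
```

The percent-escapes carry the raw bytes of the chosen encoding, so a server that assumes the wrong encoding decodes the value incorrectly.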

At the time of writing, the format of Unicode POST transmissions is not dictated by any standard; however, working drafts by the World Wide Web Consortium (W3C) indicate a move toward UTF-8 as the standard Unicode encoding for the Web. Developers should use UTF-8 for all Unicode data that they send to and receive from the browser.
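
On the server side, this recommendation amounts to decoding incoming form data as UTF-8 and labeling outgoing pages as UTF-8. The following Python sketch shows one way to do that with the standard library. The helper names decode_form and build_response are hypothetical, and a real application would normally rely on its Web framework for both steps.

```python
from html import escape
from urllib.parse import parse_qs


def decode_form(body: bytes) -> dict:
    """Decode a URL-encoded POST body, treating the escaped bytes as UTF-8."""
    # The body itself is ASCII; the percent-escapes carry the UTF-8 bytes.
    return parse_qs(body.decode("ascii"), encoding="utf-8", errors="strict")


def build_response(text: str):
    """Return a Content-Type header value and a UTF-8 encoded HTML payload."""
    markup = "<p>" + escape(text) + "</p>"
    return "text/html; charset=utf-8", markup.encode("utf-8")


# Usage: decode a UTF-8 submission and echo the value back as UTF-8.
fields = decode_form(b"name=%E3%81%82")
content_type, payload = build_response(fields["name"][0])
```

Using errors="strict" makes a submission that is not valid UTF-8 fail loudly instead of being silently replaced with substitute characters.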

Developers who are also using SQL Server need to use one of the remedies that are suggested in the following Knowledge Base article for storing UTF-8 data in SQL Server's UCS-2 Unicode fields:

232580 INF: Storing UTF-8 in SQL Server
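
The remedies in that article are not reproduced here. As a generic illustration only, one possible approach is to transcode the UTF-8 bytes into 16-bit little-endian code units before storing them, and to reject characters that a UCS-2 field cannot hold. The function name utf8_to_ucs2_le is hypothetical, and the sketch assumes that the target field stores little-endian 16-bit values.

```python
def utf8_to_ucs2_le(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UCS-2 (16-bit little-endian code units).

    Raises ValueError for characters above U+FFFF, which UCS-2 cannot
    represent without surrogate pairs.
    """
    text = data.decode("utf-8")
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} does not fit in UCS-2")
    # For characters up to U+FFFF, UTF-16-LE and UCS-2 LE are byte-identical.
    return text.encode("utf-16-le")


# Usage: HIRAGANA LETTER A arrives as UTF-8 and is stored as one 16-bit unit.
stored = utf8_to_ucs2_le(b"\xe3\x81\x82")
```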

References

For more information about the W3C, see the following Web site:

World Wide Web Consortium
http://www.w3.org

Unicode character set encodings are defined in detail in the Unicode Standard, which is available from the following Web site:

Unicode
http://www.unicode.org/

For information about character sets in Internet Explorer, see the following Microsoft Developer Network (MSDN) Web site:

Character Set Recognition
http://msdn2.microsoft.com/en-us/library/Aa752010.aspx

Support WebCast: Globalization in Internet Explorer
http://support.microsoft.com/servicedesks/webcasts/wc050400/wcblurb050400.asp

For more information about developing Web-based solutions for Microsoft Internet Explorer, visit the following Microsoft Web sites:

http://msdn.microsoft.com/ie/

http://support.microsoft.com/iep

Properties

Retired KB Content Disclaimer

This article was written about products for which Microsoft no longer offers support. Therefore, this article is offered "as is" and will no longer be updated.

Keywords: KB303612, kbhowto

Article Info

Article ID	:	303612
Revision	:	4
Created on	:	5/18/2007
Published on	:	5/18/2007
Exists online	:	False
Views	:	567
