
PRB: XML Parser Cannot Parse UTF-7 Documents



Symptoms

When attempting to load an XML file saved as UTF-7 (a transfer encoding format for Unicode), the XML parser in Internet Explorer generates the following error message:
Invalid at the top level of the document.
The same error also occurs when using the MSXML parser from server-side or client-side script.



Cause

Versions of the MSXML parser prior to MSXML 2.6 do not support UTF-7.



Resolution

To resolve this problem, save your XML documents as UTF-8, the preferred transfer encoding format for Unicode.
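The conversion can also be scripted. The following is a minimal sketch, assuming Python 3 is available (Python is not part of the original article). The function names utf7_to_utf8 and convert_file are hypothetical; the decoding itself relies on Python's built-in utf-7 codec.

```python
from pathlib import Path


def utf7_to_utf8(data: bytes) -> bytes:
    """Decode UTF-7 bytes and re-encode the resulting text as UTF-8."""
    text = data.decode("utf-7")
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-7 file and write a UTF-8 copy (both paths are caller-supplied)."""
    Path(dst).write_bytes(utf7_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    # A tiny UTF-7 sample: "+ADw-" is "<" and "+AD4-" is ">".
    sample = b"+ADw-MyTag/+AD4-"
    print(utf7_to_utf8(sample))
```

If the document carries an XML declaration that names an encoding, that attribute would also need to be updated to match the new encoding; the sketch does not do this.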

MSXML 2.6 or later supports UTF-7 encoding.



Status

This behavior is by design.



More Information

Although Unicode is a uniform character set representing nearly all the world's languages, there are many byte representations, or transformation formats, that a Unicode file can use. The most popular format is UTF-8, which represents Unicode characters as a sequence of one to four 8-bit bytes. UTF-7 is a 7-bit transformation format defined to allow Unicode text to pass through mail gateways that assume ASCII and strip out the high bit of text messages.
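To illustrate the difference between the two formats, the sketch below (Python 3, chosen here for illustration only) encodes a few sample characters both ways. The helper names utf8_length and is_seven_bit and the sample characters are assumptions made for this example.

```python
# Sample characters: ASCII letter, Latin-1 letter, euro sign, and an emoji
# outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return how many 8-bit bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


def is_seven_bit(data: bytes) -> bool:
    """Return True if no byte has its high bit set."""
    return all(b < 0x80 for b in data)


for ch in SAMPLES:
    utf7 = ch.encode("utf-7")
    print(
        f"U+{ord(ch):04X}: UTF-8 needs {utf8_length(ch)} byte(s); "
        f"UTF-7 form {utf7!r} is 7-bit clean: {is_seven_bit(utf7)}"
    )
```

The UTF-8 forms of the non-ASCII samples contain bytes with the high bit set, which is exactly what a 7-bit mail gateway would corrupt; the UTF-7 forms do not.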

According to Section 4.3.3 of the XML 1.0 standard, a valid XML file must be one of the following:

  • A Unicode file in UTF-8 format.
  • A Unicode file in UTF-16 format.
  • A file in some other character encoding (for example, ASCII) that has as its very first bytes the XML declaration with an encoding attribute naming that encoding.

UTF-7 does not use the Byte Order Mark. Also, UTF-7 converts the special XML character < to +ADw-, so a UTF-7 encoded XML document begins with the characters +ADw- rather than <. Because this does not comply with the XML standard, MSXML refuses to load such files.
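The reasoning above can be sketched as a simple check on the first bytes of a file. This is a simplified illustration in Python 3, not MSXML's actual detection logic; the function name sniff_xml_start and its return labels are hypothetical.

```python
import codecs


def sniff_xml_start(data: bytes) -> str:
    """Classify the first bytes of a would-be XML file (simplified sketch)."""
    # UTF-16 files announce themselves with a Byte Order Mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16 (byte order mark)"
    # UTF-8 files may optionally start with a Byte Order Mark.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (byte order mark)"
    # Otherwise the parser expects ASCII-compatible markup starting with "<".
    if data.lstrip(b" \t\r\n").startswith(b"<"):
        return "ascii-compatible markup"
    # A UTF-7 file starts with "+ADw-", which matches none of the above.
    return "unrecognized"


if __name__ == "__main__":
    print(sniff_xml_start(b'<?xml version="1.0"?><MyTag/>'))
    print(sniff_xml_start(b"+ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-"))
```

A UTF-7 file falls through every branch because it has no Byte Order Mark and its first byte is + rather than <.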

Many text editors and word processors allow you to save Unicode text files, known as encoded text in Microsoft Word, in many different transfer encodings, including UTF-7. So if you save a document in Word as "encoded text UTF-7," MSXML will refuse to load it for the above reasons.

Steps to Reproduce Behavior

  1. Create a simple XML file in Word 2000:
    <?xml version="1.0"?>

    <MyTag>
    <EmbeddedTag name1="value"/>
    </MyTag>
  2. Save the file as encoded text. When Word asks whether you want to lose formatting, click Yes. Word then prompts you for an encoding format to use. Select UTF-7, and then save the document as TestUTF7.xml.
  3. Load TestUTF7.xml in Internet Explorer 5. You will receive the following error message:
    Invalid at the top level of the document. Line 1, Position 1

    +ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-.
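A comparable failure can be observed without Word or Internet Explorer. The sketch below uses Python 3 and its standard xml.etree.ElementTree parser (based on Expat, not MSXML), so the error text differs from the message above. The UTF7_DOC bytes are a hand-written UTF-7 rendering of the sample document, and try_parse is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written UTF-7 form of the sample document from step 1.
UTF7_DOC = (
    b"+ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-\n"
    b"+ADw-MyTag+AD4-\n"
    b"+ADw-EmbeddedTag name1+AD0AIg-value+ACI-/+AD4-\n"
    b"+ADw-/MyTag+AD4-\n"
)


def try_parse(data) -> str:
    """Return the root tag name, or a message describing the parse failure."""
    try:
        return ET.fromstring(data).tag
    except ET.ParseError as exc:
        return f"parse error: {exc}"


if __name__ == "__main__":
    # Raw UTF-7 bytes: the parser sees "+" where it expects "<".
    print(try_parse(UTF7_DOC))
    # Decoding the UTF-7 first yields ordinary XML text that parses.
    print(try_parse(UTF7_DOC.decode("utf-7")))
```

The first call reports a parse error at the start of the document; the second succeeds because the text has been decoded before it reaches the parser.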



References

For the latest Microsoft Global Software Development

http://www.unicode.org/ for the latest Unicode Standard.

For more information about developing Web-based solutions for Microsoft Internet Explorer, visit the following Microsoft Web sites:

(c) Microsoft Corporation 2000, All Rights Reserved. Contributions by Jay Andrew Allen, Microsoft Corporation.



Keywords: kbbug, kbdsupport, kbintl, kbintldev, kbnofix, kbprb, kbunicode, kb


Article Info
Article ID : 251134
Revision : 2
Created on : 8/1/2019
Published on : 8/1/2019