The description of the CONTINUE record of the Excel 8 BIFF contained in the Microsoft Excel 97 Developer's Kit is incomplete.
When an SST record contains a string that is continued in a CONTINUE record, the description of the CONTINUE record for BIFF 8 (Excel 97, Excel 2000, and Excel 2002 workbooks) states that the record data continues at offset 4. The description does not mention that offset 4 holds a grbit field whose flag describes the Unicode state (compressed or uncompressed) of the continued portion of the string, which begins at offset 5.
The SST record is the Shared String Table record; it contains the text strings from the cells of the worksheet. For Excel 97, Excel 2000, and Excel 2002, the size of this record is limited to 8224 bytes of data, including the formatting runs and string-length information. Shared string data that exceeds this limit is stored in a CONTINUE record. When the last string in the SST must be broken into two segments, with the last segment stored as the first data in the CONTINUE record, that segment may be stored as either compressed or uncompressed Unicode. Consequently, the first byte of the data contains 00h or 01h. This is a one-byte field called a grbit field. It is not part of the string segment.
A grbit flag value of 00h means that no character in the data needs a Unicode high-order byte, so all characters are stored as compressed Unicode: the high-order bytes of the Unicode representation have been stripped. They all contained 00h, so Excel restores that high-order information when it loads the record.
Where any character in the data segment requires Unicode high-order byte information, the grbit flag will be 01h, and all characters in the string segment will be two-byte, uncompressed Unicode.
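The two cases above can be sketched in Python. This is a minimal illustration and not code from the Developer's Kit: the function name and the `char_count` parameter are hypothetical, the input is assumed to be the CONTINUE record data with the 4-byte record header already removed, and any formatting runs that may follow the string are ignored.

```python
def decode_continued_segment(data: bytes, char_count: int) -> str:
    """Decode the string segment at the start of a CONTINUE record's data.

    `data` is the record data without the 4-byte record header, so data[0]
    is the grbit byte (offset 4 of the full record).
    `char_count` is the number of characters still owed to the string; it is
    a hypothetical parameter that the caller tracks from the SST record.
    """
    if not data:
        raise ValueError("CONTINUE record has no data")

    grbit = data[0]  # flag byte; not part of the string itself
    if grbit & 0x01:
        # 01h: uncompressed Unicode, two bytes per character, little-endian.
        raw = data[1:1 + 2 * char_count]
        return raw.decode("utf-16-le")

    # 00h: compressed Unicode, one byte per character. Every stripped
    # high-order byte was 00h, so each byte maps to U+0000..U+00FF,
    # which is what a Latin-1 decode produces.
    raw = data[1:1 + char_count]
    return raw.decode("latin-1")
```

For example, `decode_continued_segment(b"\x00abc", 3)` returns `"abc"`, while the same call on data that begins with 01h reads two bytes for each character.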
Steps to Reproduce the Problem
Fill a cell with 9000 characters of text. This causes the string to be continued in a CONTINUE record. If you have the Japanese Language Kit, you can use the Osaka font to enter one or two characters at the end; those characters, and consequently the portion of the string in the CONTINUE record, are then stored as uncompressed Unicode.
At offset 4 of the CONTINUE record, you will see the value "01", which is the grbit value for uncompressed Unicode. Otherwise, you will see "00". Neither flag is part of the original string.
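To check the flag programmatically rather than in a hex dump, a short Python sketch along these lines may help. It assumes you have already extracted the raw bytes of one BIFF record (a 2-byte record number, a 2-byte data length, then the data) and that the record begins with a continued string segment. The function name is hypothetical; 003Ch is the record number of CONTINUE.

```python
import struct

CONTINUE_RECORD_TYPE = 0x003C  # BIFF record number of CONTINUE


def continue_string_flag(record: bytes) -> int:
    """Return the grbit byte at offset 4 of a raw CONTINUE record.

    `record` is assumed to hold the whole record: a 4-byte header
    (record number and data length, both little-endian 16-bit values)
    followed by the record data.
    """
    record_type, length = struct.unpack_from("<HH", record, 0)
    if record_type != CONTINUE_RECORD_TYPE:
        raise ValueError(f"not a CONTINUE record: {record_type:#06x}")
    if length < 1 or len(record) < 5:
        raise ValueError("CONTINUE record has no data")
    # Offset 4 is the first data byte: 00h = compressed, 01h = uncompressed.
    return record[4]
```

The string segment itself starts at offset 5, immediately after the returned byte.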
For more information, see the Microsoft Excel 97 Developer's Kit - ISBN 1-57231-498-2