Next: Conclusion on Design Up: Overall Design Previous: Keyboard Input


Text File Export and Exchange

The requirements for the file out format are twofold. First, it should be understandable by other, non-Squeak software. Second, a round-trip conversion that starts from Squeak (file out, then file in again) must reproduce the original exactly. We do not require the round trip in the other direction (an external file read into Squeak and written back out) to be identical.

Because Squeak's original file out format already uses MacRoman, and therefore all 8 bits of each octet in a file out, we need a mechanism that lets the characters in the extended character set coexist with the original MacRoman characters in the same file out. There are three feasible encoding schemes for this external file format.

The first option is to adapt the Compound Text format of the X Window System, or ``ctext''[5]. The upside of ctext is compatibility with existing file outs: the 8-bit characters in a file out retain the same semantics. In addition, much existing software can read and write this format, at least partially. In fact, Japanese in ctext is essentially identical to the standard Internet email format for Japanese[6]. The downside of ctext is that not all scripts in Unicode have a defined sequencer character.
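To illustrate the kind of escape-sequence switching that ctext relies on, here is a minimal sketch using ISO-2022-JP, which Python's standard codecs support. It is not Compound Text itself and not Squeak's implementation; the helper name `split_segments` is hypothetical, and the parser assumes the three-byte escape sequences used by the base `iso2022_jp` codec.

```python
ESC = 0x1B  # escape byte that introduces a character-set designation


def split_segments(data: bytes) -> list:
    """Split ISO-2022-JP bytes into (designator, payload) pairs.

    Assumption: every escape sequence is ESC plus two bytes, which holds
    for the character sets in the base ISO-2022-JP encoding.
    """
    segments = []
    designator = b"(B"  # the stream starts out in ASCII
    payload = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            if payload:
                segments.append((designator, bytes(payload)))
                payload.clear()
            designator = data[i + 1:i + 3]
            i += 3
        else:
            payload.append(data[i])
            i += 1
    if payload:
        segments.append((designator, bytes(payload)))
    return segments


# ASCII text followed by three kanji.
encoded = "abc\u65e5\u672c\u8a9e".encode("iso2022_jp")
for designator, payload in split_segments(encoded):
    print(designator, payload)
```

The ASCII bytes pass through unchanged, and only the Japanese run is wrapped in a designation. This is the property that keeps existing 8-bit file outs readable under such a scheme.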

The second option is to use UTF-8 for the file out. The upside is that every Unicode character is representable in UTF-8. One downside is that CJKV characters need an extra language tag for the round-trip conversion, and there is no standard encoding scheme for this language tag. Another downside is that the upper half of ISO-8859-1 (0x80 to 0xFF) is then encoded differently from the existing file out, because UTF-8 represents each of those characters as a multi-byte sequence.
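The incompatibility in the upper half can be seen by encoding one accented character in several ways. The following is a small sketch using Python's standard codecs; the sample string and the helper name `encode_variants` are illustrative assumptions, not part of Squeak.

```python
SAMPLE = "caf\u00e9"  # three ASCII letters plus one accented letter


def encode_variants(text: str) -> dict:
    """Encode the same text as MacRoman, ISO-8859-1, and UTF-8."""
    codecs = ("mac_roman", "latin_1", "utf_8")
    return {codec: text.encode(codec) for codec in codecs}


variants = encode_variants(SAMPLE)
for codec, data in variants.items():
    # The ASCII letters give the same bytes everywhere; the accented one does not.
    print(f"{codec:10} {data.hex(' ')}")

# A reader that still assumes the legacy 8-bit format misinterprets UTF-8 bytes.
misread = variants["utf_8"].decode("mac_roman")
print(misread == SAMPLE)
```

The ASCII bytes are identical in all three encodings, while the accented letter takes one byte in either 8-bit encoding and two bytes in UTF-8. An existing file out that contains such characters would therefore not be read back correctly if it were reinterpreted wholesale as UTF-8.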

The third option is to mix the two above. A file out actually consists of individual ``chunks'', and each chunk can be in a different format. If a chunk should be represented in UTF-8, the program writes a special prefix (``<utf-8>''), and the string up to the terminator (``!'') is interpreted as UTF-8.
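The mixed scheme can be sketched as a pair of functions. This is a minimal illustration, not Squeak's actual code. It assumes that a literal ``!'' inside a chunk is escaped by doubling it, that each terminator is followed by a newline, and that a chunk is written in the legacy MacRoman form whenever all of its characters fit. The names `encode_chunk` and `decode_chunks` are hypothetical.

```python
UTF8_PREFIX = b"<utf-8>"  # prefix named in the text
TERMINATOR = b"!"         # chunk terminator named in the text


def _fits_legacy(text: str) -> bool:
    """True if the chunk can be written as an unmarked MacRoman chunk."""
    if text.startswith(UTF8_PREFIX.decode("ascii")):
        return False  # would be mistaken for a marked chunk when read back
    try:
        text.encode("mac_roman")
    except UnicodeEncodeError:
        return False
    return True


def encode_chunk(text: str) -> bytes:
    escaped = text.replace("!", "!!")  # assumption: "!" is escaped by doubling
    if _fits_legacy(text):
        body = escaped.encode("mac_roman")
    else:
        body = UTF8_PREFIX + escaped.encode("utf-8")
    return body + TERMINATOR + b"\n"


def _decode_body(body: bytes) -> str:
    if body.startswith(UTF8_PREFIX):
        return body[len(UTF8_PREFIX):].decode("utf-8")
    return body.decode("mac_roman")


def decode_chunks(data: bytes) -> list:
    chunks, current, i = [], bytearray(), 0
    while i < len(data):
        if data[i:i + 1] != TERMINATOR:
            current.append(data[i])
            i += 1
        elif data[i + 1:i + 2] == TERMINATOR:
            current += TERMINATOR  # doubled "!" is a literal "!"
            i += 2
        else:
            chunks.append(_decode_body(bytes(current)))
            current.clear()
            i += 1
            if data[i:i + 1] == b"\n":
                i += 1  # skip the newline written after the terminator
    return chunks


texts = ["Object subclass: #Foo", "caf\u00e9!", "!\u65e5\u672c", "plain"]
data = b"".join(encode_chunk(t) for t in texts)
print(decode_chunks(data) == texts)
```

Scanning for the terminator byte by byte is safe inside a UTF-8 chunk, because every byte of a multi-byte UTF-8 sequence is 0x80 or above and so can never be confused with ``!''. Chunks that contain only MacRoman characters are written exactly as before, which preserves compatibility with existing file outs.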


Owner 2003-02-08