Because the file out format of the original Squeak already uses the MacRoman encoding, and therefore all 8 bits of each octet, we need a mechanism that lets characters from the extended character set co-exist with the original MacRoman characters in a file out. We considered three feasible encoding schemes for this external file format.
The first option is to adopt the X Compound Text format of the X Window System, or ``ctext''[5]. The upside of ctext is compatibility with existing file outs: the 8-bit characters in a file out keep their current semantics. Also, many existing programs can read and write this format, at least partially. In fact, Japanese text in ctext is essentially identical to the standard Internet email format for Japanese[6]. The downside of ctext is that not every script in Unicode has a defined escape sequence to designate it.
The second option is to use UTF-8 for file outs. The upside of this format is that every Unicode character is representable in UTF-8. One downside is that, because Unicode unifies Han characters across Chinese, Japanese, Korean, and Vietnamese, CJKV characters need an extra language tag for round-trip conversion, but there is no standard encoding scheme for such a tag. Another downside is that characters in the upper half of ISO-8859-1 are encoded differently from the way they are encoded in existing file outs.
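The incompatibility with existing file outs can be checked directly with Python's standard `mac_roman` and `utf-8` codecs. The sketch below assumes the legacy file out is plain MacRoman; the helper names `compare_encodings` and `decodes_as_utf8` are hypothetical.

```python
def compare_encodings(text: str) -> tuple[bytes, bytes]:
    """Return the MacRoman (legacy file out) and UTF-8 byte sequences for text."""
    return text.encode("mac_roman"), text.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# ASCII is byte-for-byte identical in both encodings.
ascii_legacy, ascii_utf8 = compare_encodings("Object subclass: #Foo")
assert ascii_legacy == ascii_utf8

# An accented letter differs: one byte in MacRoman, two bytes in UTF-8.
legacy, utf8 = compare_encodings("caf\u00e9")  # "café"
print(legacy, utf8)  # b'caf\x8e' b'caf\xc3\xa9'

# So an existing MacRoman file out is not valid UTF-8.
print(decodes_as_utf8(legacy))  # False

# A unified Han character has the same UTF-8 bytes whether the text is
# Japanese or Chinese, so the language is not recoverable from the bytes.
han = "\u76f4".encode("utf-8")
```

The last line illustrates why a language tag would be needed on top of UTF-8 for a lossless round trip of CJKV text.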
The third option is to mix the above two. A file out consists of individual ``chunks'', and each chunk can be in a different format. If a chunk should be represented in UTF-8, the program writes a special prefix (``<utf-8>'') at its start, and the string up to the terminator (``!'') is interpreted as UTF-8.
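A minimal sketch of such a mixed chunk writer and reader follows. Two points are assumptions not stated in the text: a chunk is written in UTF-8 only when MacRoman cannot represent it, and an embedded ``!'' is doubled, following the usual Smalltalk chunk convention. The function names are hypothetical.

```python
UTF8_PREFIX = b"<utf-8>"  # prefix named in the text
TERMINATOR = b"!"         # chunk terminator


def encode_chunk(text: str) -> bytes:
    """Write one chunk: MacRoman if possible, otherwise prefixed UTF-8 (assumed rule)."""
    try:
        body = text.encode("mac_roman")
    except UnicodeEncodeError:
        body = UTF8_PREFIX + text.encode("utf-8")
    # Double any embedded "!" so that it is not mistaken for the terminator.
    # 0x21 never occurs inside a UTF-8 multi-byte sequence, so this is safe.
    return body.replace(TERMINATOR, TERMINATOR * 2) + TERMINATOR


def decode_body(body: bytes) -> str:
    """Interpret one unescaped chunk body according to its prefix."""
    if body.startswith(UTF8_PREFIX):
        return body[len(UTF8_PREFIX):].decode("utf-8")
    return body.decode("mac_roman")


def decode_chunks(data: bytes) -> list[str]:
    """Split a file out into chunks; bytes after the last terminator are ignored."""
    chunks: list[str] = []
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if byte == TERMINATOR:
            if data[i + 1:i + 2] == TERMINATOR:
                current += TERMINATOR  # "!!" is an escaped "!"
                i += 2
                continue
            chunks.append(decode_body(bytes(current)))
            current = bytearray()
        else:
            current += byte
        i += 1
    return chunks


file_out = encode_chunk("caf\u00e9") + encode_chunk("\u65e5\u672c!")  # "café", "日本!"
print(decode_chunks(file_out))  # ['café', '日本!']
```

With this rule, every chunk that the original MacRoman format could express is written exactly as before, so old readers still understand such chunks. One limitation of the sketch is that a MacRoman chunk whose text happens to begin with ``<utf-8>'' would be misread.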