
Memory Usage

To represent text in any extended character set, there must be a character entity that can hold more than an 8-bit quantity and a string type that can store such characters. One way to adopt this new representation is to change Character and String uniformly, so that every instance of Character or String represents a wide character or wide string (the uniform approach). One of the most advanced multilingualized systems, Emacs after version 20, uses this approach [4]. Another way is to add new representations and let them coexist with the existing default ones (the mixed approach).

The uniform wide character representation is cleaner, but it takes much more space. In the original version 3.2 image, the instances of String and its subclasses occupy about 1.5MB in total. If we used a uniform 16-bit representation, which is itself unsatisfactory, or a uniform 32-bit representation, the image would grow by a few megabytes.
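The growth can be checked with a back-of-the-envelope calculation. The following Python sketch is illustrative only: it assumes that the whole 1.5MB quoted above is 8-bit character payload and ignores object headers and alignment, so the real figures would differ somewhat. The function names are hypothetical.

```python
# Rough estimate only. Assumption: all 1.5 MB is 8-bit character payload
# (object headers and alignment are ignored).
BYTE_STRING_MB = 1.5  # figure quoted in the text for the version 3.2 image


def uniform_size_mb(bytes_per_char: int) -> float:
    """Size of the same strings if every character took bytes_per_char bytes."""
    return BYTE_STRING_MB * bytes_per_char


def growth_mb(bytes_per_char: int) -> float:
    """Extra space needed relative to the existing 8-bit representation."""
    return uniform_size_mb(bytes_per_char) - BYTE_STRING_MB


for width in (2, 4):
    print(f"{width * 8}-bit: {uniform_size_mb(width):.1f} MB total, "
          f"+{growth_mb(width):.1f} MB")
```

Under these assumptions the 16-bit representation adds about 1.5MB and the 32-bit representation about 4.5MB, which is consistent with the estimate of a few megabytes.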

We decided to use the mixed approach. The most suitable representation is selected for each string, and it is implicitly converted to another representation when necessary. In Smalltalk, this kind of implicit conversion is easy to do. This approach also makes migrating from the original Squeak to m17n Squeak easier.
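The idea of the mixed approach can be sketched as follows. This is a minimal illustration in Python, not the Squeak implementation: the class MixedString and its methods are hypothetical, and the choice of a byte buffer for narrow strings and an unsigned-integer array for wide strings is an assumption made for the example.

```python
from array import array


class MixedString:
    """Hypothetical sketch: compact 8-bit storage, widened only on demand."""

    def __init__(self, text: str = ""):
        code_points = [ord(ch) for ch in text]
        if all(cp < 256 for cp in code_points):
            # Narrow representation: one byte per character.
            self._data = bytearray(code_points)
        else:
            # Wide representation: one unsigned int per character
            # (4 bytes on common platforms).
            self._data = array("I", code_points)

    @property
    def is_wide(self) -> bool:
        return isinstance(self._data, array)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> str:
        return chr(self._data[index])

    def __setitem__(self, index: int, ch: str) -> None:
        cp = ord(ch)
        if cp >= 256 and not self.is_wide:
            # Implicit conversion: widen the storage before storing a
            # character that does not fit in 8 bits.
            self._data = array("I", list(self._data))
        self._data[index] = cp

    def payload_bytes(self) -> int:
        """Bytes used by the character data alone."""
        item_size = self._data.itemsize if self.is_wide else 1
        return len(self._data) * item_size


s = MixedString("abc")
narrow_before = s.is_wide       # still the compact representation
s[1] = "\u3042"                 # storing a wide character triggers widening
wide_after = s.is_wide
```

Strings that contain only 8-bit characters keep the compact form, so the cost of wide storage is paid only by the strings that actually need it.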

We discuss the details of the representation in Section 3.


Owner 2003-02-08