Next: Implicit Conversion Up: The Design and Implementation Previous: Character Representation


String and Symbol Representation

As with characters in the original version of Squeak, strings are limited to 8-bit values: a string of characters, represented as an instance of the String class, is essentially an array of 8-bit values. We have added a new class called MultiString, which is a variable word class.

We changed the class hierarchy of String to avoid duplicating methods. Most of the methods in String should also work on an instance of MultiString, but we cannot simply make MultiString a subclass of String, because a variable word class cannot be a subclass of a variable byte class. Instead, we inserted an abstract class called AbstractString above String and moved most of the String methods into it. In the m17n Squeak image, both String and MultiString are subclasses of AbstractString, and each has specialized methods that depend on the actual character size.
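The arrangement described above can be illustrated with a minimal Python sketch. It is only a model of the idea, not the Squeak code: the method names (`reversed`, `as_text`, `_check`) are hypothetical, `ByteString` stands in for String, and Python's `array` type codes "B" (8-bit) and "L" (at least 32-bit) are assumed as stand-ins for variable byte and variable word storage.

```python
from array import array


class AbstractString:
    """Stand-in for the abstract superclass: holds size-independent methods."""

    typecode = None  # each concrete subclass picks its element width

    def __init__(self, text=""):
        if self.typecode is None:
            raise TypeError("AbstractString is abstract")
        self._check(text)
        self._units = array(self.typecode, (ord(ch) for ch in text))

    def _check(self, text):
        """Size-dependent hook; subclasses reject values that do not fit."""

    # Shared behaviour, written once for both element widths.
    def __len__(self):
        return len(self._units)

    def reversed(self):
        return type(self)("".join(chr(u) for u in reversed(self._units)))

    def as_text(self):
        return "".join(chr(u) for u in self._units)


class ByteString(AbstractString):
    """Models String: every element must fit in 8 bits."""

    typecode = "B"

    def _check(self, text):
        if any(ord(ch) > 0xFF for ch in text):
            raise ValueError("character does not fit in 8 bits")


class MultiString(AbstractString):
    """Models MultiString: word-sized elements (at least 32 bits with 'L')."""

    typecode = "L"
```

Both concrete classes inherit `reversed` and `__len__` unchanged; only the storage width and the range check differ, which mirrors the split between shared and size-specific methods.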

To make these changes, we modified SystemTracer2 [7] to produce an image with the modified String class hierarchy. Normally, such a change to the class hierarchy can be made simply by editing the class definitions in a browser. However, this does not work for String because the virtual machine (VM) holds a pointer to the String class object.

To avoid changing the size of the subclass array in ArrayedCollection, we first added an empty class called AbstractString under ArrayedCollection and then added a placeholder class called DummyString under AbstractString. In the "post-process" phase of the modified SystemTracer2, the oop for DummyString in the subclass array of AbstractString and the oop for String in the subclass array of ArrayedCollection are swapped.
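The swap can be pictured with a small Python model of class objects and their subclass arrays. This is a sketch under stated assumptions, not the SystemTracer2 code: `ClassObject` and `swap_subclass_slots` are hypothetical names, and the update of the superclass links is an assumption added so the model stays consistent (the text only describes swapping the entries in the subclass arrays).

```python
class ClassObject:
    """Toy class object with a superclass link and a subclass array."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.subclasses = []
        if superclass is not None:
            superclass.subclasses.append(self)


def swap_subclass_slots(parent_a, child_a, parent_b, child_b):
    """Exchange two entries in place so neither subclass array changes size."""
    i = parent_a.subclasses.index(child_a)
    j = parent_b.subclasses.index(child_b)
    parent_a.subclasses[i], parent_b.subclasses[j] = child_b, child_a
    # Assumption: superclass links are updated to match the new arrays.
    child_a.superclass, child_b.superclass = parent_b, parent_a


# State before the post-process phase, as described in the text.
arrayed = ClassObject("ArrayedCollection")
string = ClassObject("String", arrayed)
abstract = ClassObject("AbstractString", arrayed)
dummy = ClassObject("DummyString", abstract)

sizes_before = (len(arrayed.subclasses), len(abstract.subclasses))
swap_subclass_slots(arrayed, string, abstract, dummy)
sizes_after = (len(arrayed.subclasses), len(abstract.subclasses))
```

After the swap, String sits under AbstractString and DummyString sits under ArrayedCollection, while both subclass arrays keep their original lengths. The String class object itself is never replaced in this model, only the references to it move.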

Thanks to the late-bound and generic nature of Squeak and the Squeak VM, the modified image runs on the unmodified VM. Once the system tracer had changed the hierarchy, the methods and class variables could be moved from String to AbstractString in the live image.

We also added a subclass of MultiString called MultiSymbol, which is essentially a copy of Symbol. This class holds symbol tables similar to the ones in Symbol. MultiSymbol implements the same protocol as Symbol, and since the VM uses only the oop of a symbol as the lookup key in MethodDictionaries, an image with MultiSymbol class names and method names runs on the unmodified VM.
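The reason identity-based lookup makes the symbol's class irrelevant can be sketched in Python. Everything here is a hypothetical model: the `_Interned` helper exists only to keep the sketch short (in the text, MultiSymbol is a subclass of MultiString, not related to Symbol), and `id()` stands in for the oop.

```python
class _Interned:
    """Helper for the sketch: gives each subclass its own symbol table."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._table = {}

    def __init__(self, text):
        self.text = text

    @classmethod
    def intern(cls, text):
        # Return the unique symbol object for this text, creating it once.
        if text not in cls._table:
            cls._table[text] = cls(text)
        return cls._table[text]


class Symbol(_Interned):
    """Models the original byte-based symbol class."""


class MultiSymbol(_Interned):
    """Models the added word-based symbol class with its own table."""


class MethodDictionary:
    """Looks up methods by selector identity, never by the characters."""

    def __init__(self):
        self._slots = {}

    def install(self, selector, method):
        # Keep the selector in the entry so the object stays alive.
        self._slots[id(selector)] = (selector, method)

    def lookup(self, selector):
        entry = self._slots.get(id(selector))
        return entry[1] if entry is not None else None


methods = MethodDictionary()
methods.install(Symbol.intern("size"), lambda: "byte selector")
methods.install(MultiSymbol.intern("\u9577\u3055"), lambda: "wide selector")
```

Because `lookup` compares only object identity, it behaves the same for a `Symbol` and a `MultiSymbol` key, as long as each symbol is obtained through its table so that equal names map to the same object.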


Owner 2003-02-08