Next: Fonts Up: The Design and Implementation Previous: Implicit Conversion


Text Composition

The basic concept of text composition stays the same for other scripts. A loop traverses an instance of String or MultiString, decides where to break lines, and decides where to place the graphical representation of each character.
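The loop described above can be illustrated with a minimal sketch. It is written in Python rather than Smalltalk, it is not the Squeak implementation, and the names compose_lines and width_of are hypothetical. It assumes a greedy policy that breaks after the last space that fits on the line, which the text does not specify.

```python
from typing import Callable, List, Tuple


def compose_lines(
    text: str,
    line_width: int,
    width_of: Callable[[str], int] = lambda ch: 1,  # hypothetical glyph-width lookup
) -> List[Tuple[int, int]]:
    """Return one (start, stop) index pair per line; stop is exclusive."""
    lines: List[Tuple[int, int]] = []
    start = 0         # index of the first character on the current line
    x = 0             # horizontal position reached on the current line
    last_space = -1   # index of the most recent space on the current line
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            # An explicit newline always ends the current line.
            lines.append((start, i))
            start, x, last_space = i + 1, 0, -1
            i += 1
            continue
        w = width_of(ch)
        if x + w > line_width and i > start:
            # The character does not fit: break after the last space if the
            # line has one, otherwise break right before this character.
            stop = last_space + 1 if last_space >= start else i
            lines.append((start, stop))
            start, x, last_space = stop, 0, -1
            i = start  # rescan the characters moved to the next line
            continue
        if ch == " ":
            last_space = i
        x += w
        i += 1
    lines.append((start, len(text)))
    return lines


text = "the quick brown fox"
for start, stop in compose_lines(text, 8):
    print(repr(text[start:stop]))
```

The sketch records only where each line starts and stops; a real scanner would also record the position of every glyph on the line.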

However, the text composition routines of the original Squeak must be extended for a multilingualized system. In many scripts, more than one conceptual character constitutes a single graphical representation.

Figure 3: The presentation text for an internal text is created by a CompositionScanner. Then the result, not the original text, is displayed by DisplayScanner.

To address this problem, we separate the conceptual text from its composed result, which we call the "presentation". Specifically, MultiNewParagraph, a subclass of NewParagraph, adds instance variables that represent the lines of the presentation. In the original Squeak, the task of CompositionScanner is to decide the line breaks for a given text and line width and to store the result in the lines instance variable. In m17n Squeak, the scanner creates another Text and sets up the line breaks for this Text. Figure 3 depicts a simple example that combines the character "a" with an apostrophe (acute accent) character.

To represent the presentation text, we use the Unicode code points of the presentation characters. This approach simplifies the handling of complex composition, but it accepts only combinations that have defined code points in Unicode. In the future, a more powerful rendering engine should be provided to allow arbitrary glyph combinations.
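The limitation described above can be illustrated with a short Python sketch. It uses Unicode NFC normalization from the standard unicodedata module as a stand-in for the composition step; this is an assumption made for illustration, not the method used in m17n Squeak, and the name make_presentation is hypothetical. NFC also reorders combining marks, which goes beyond what the text describes.

```python
import unicodedata


def make_presentation(internal_text: str) -> str:
    """Build a presentation string, leaving the internal text untouched.

    Assumption: NFC normalization stands in for the composition step.
    A sequence is merged only when Unicode defines a precomposed code point.
    """
    return unicodedata.normalize("NFC", internal_text)


# "a" followed by COMBINING ACUTE ACCENT (U+0301) has a precomposed form.
internal = "a\u0301"
presentation = make_presentation(internal)
print(len(internal), len(presentation))

# "q" followed by the same accent has no precomposed form in Unicode,
# so the presentation still holds two code points.
unsupported = make_presentation("q\u0301")
print(len(unsupported))
```

In the first case the two conceptual characters become the single code point U+00E1; in the second nothing can be merged, which is the case a more powerful rendering engine would have to handle.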

Many scripts and languages have very distinct text composition and line-breaking rules. Hebrew, Arabic, and certain other languages are written from right to left, and Japanese users often want to customize the line-ending ("kinsoku") rules. Because it is hard to write one universal text scanner for all possible scripts, we have implemented separate scanning methods in MultiCharacterScanner and switch among them according to the language tag bits of the character. As long as the scanned characters share the same encoding tag, the inner loop of the current scanning method keeps scanning the text. When the loop encounters a different encoding tag, it returns as if an imaginary stop condition had been met. The scanner method appropriate for the new tag is then called.
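The dispatch scheme described above might be sketched as follows, again in Python rather than Smalltalk. Everything concrete here is a hypothetical choice for illustration: the bit layout (TAG_SHIFT), the tag numbers, the method names in SCANNERS, and the functions tag_of, scan_run and scan_text. None of them is taken from the m17n Squeak sources.

```python
from typing import List, Tuple

TAG_SHIFT = 22  # hypothetical: tag stored above the low 22 code bits


def tag_of(value: int) -> int:
    """Extract the (hypothetical) tag bits from a character value."""
    return value >> TAG_SHIFT


# Hypothetical mapping from tag to the name of a script-specific method.
SCANNERS = {0: "scan_latin", 5: "scan_japanese"}


def scan_run(values: List[int], start: int, tag: int) -> int:
    """Inner loop: keep scanning while the tag stays the same.

    Returns the index where a different tag (or the end of the text)
    was met, which plays the role of the imaginary stop condition.
    """
    i = start
    while i < len(values) and tag_of(values[i]) == tag:
        i += 1
    return i


def scan_text(values: List[int]) -> List[Tuple[str, int, int]]:
    """Outer loop: choose a scanning method for each run of equal tags."""
    runs = []
    i = 0
    while i < len(values):
        tag = tag_of(values[i])
        method = SCANNERS.get(tag, "scan_default")
        stop = scan_run(values, i, tag)
        runs.append((method, i, stop))
        i = stop
    return runs


JA = 5 << TAG_SHIFT  # hypothetical tag value for Japanese
sample = [ord("a"), ord("b"), JA | 0x3042, JA | 0x3044, ord("c")]
print(scan_text(sample))
```

Each tuple names the method that would handle the run and the range of characters it covers. In a real scanner each method would apply its own composition and line-breaking rules inside the inner loop instead of only advancing the index.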


Owner 2003-02-08