Archive for April 2010

Unicode and custom input methods in Wunderkammer (#4: Custom input method)

Step 3: Custom input method.

First of all, you will need to add your custom input method to the code of the MenulessTextField class in the Wunderkammer source code. This can be done in any text editor, but as you will need to compile the whole thing later anyway, it’s probably better to do everything in NetBeans or another Java IDE right from the beginning. You can find the whole MenulessTextField class with the custom Tura-French input method here.

There are only a few things that you are likely to need to modify. First, you will probably want to customize the mappings of lowercase (see picture) and uppercase letters onto a typical 3×4 mobile phone keypad. The <\uXXXX> sequences are the Unicode escapes for the special characters (I added the characters themselves as comments, shown in green, to make things clear).

The lowercase mappings of the Tura custom input method

Furthermore, you might wish to modify the string “\ue003\ue004\ue005” in brackets after TextField.addInputMode and TextField.setDefaultInputModeOrder, which stands for <ɛ̀ɛ̂ɛ́>. This is just the label I chose to indicate the lowercase Tura custom input method at the right end of the search box in WK, similar to “Abc” or “123”.
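To give a rough idea of what such a mapping looks like, here is a minimal desktop Java sketch. It is not the actual MenulessTextField code: the class name, the helper methods and the key assignments are hypothetical, and only the mode label “\ue003\ue004\ue005” is taken from the description above. As far as I understand the LWUIT API, TextField.addInputMode expects a name, a Hashtable from key codes to candidate strings and a flag for first-letter capitalization, so the calls appear only as a comment.

```java
import java.util.Hashtable;

public class TuraInputModeSketch {
    // Label for the mode indicator: U+E003 U+E004 U+E005, the Private Use Area
    // code points that the Tura font draws as open e with grave, circumflex, acute.
    static final String MODE_NAME = "\ue003\ue004\ue005";

    // Builds a key code -> candidate characters table for multi-tap entry.
    // The key assignments below are hypothetical, not the actual Tura layout.
    static Hashtable<Integer, String> buildLowercaseMap() {
        Hashtable<Integer, String> keys = new Hashtable<>();
        keys.put((int) '2', "abc2\u00e0\u00e1\u00e2");       // a with grave, acute, circumflex
        keys.put((int) '3', "def3\u025b\ue003\ue004\ue005"); // open e and its accented forms
        keys.put((int) '6', "mno6\u0254");                   // open o
        return keys;
    }

    // Multi-tap lookup: the n-th press of a key selects the n-th candidate,
    // wrapping around at the end of the candidate string.
    static char charForTaps(Hashtable<Integer, String> map, int keyCode, int taps) {
        String candidates = map.get(keyCode);
        return candidates.charAt((taps - 1) % candidates.length());
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> map = buildLowercaseMap();
        // In the real class the table would presumably be registered along these lines
        // (assumed signatures, check the LWUIT javadoc):
        //   TextField.addInputMode(MODE_NAME, map, false);
        //   TextField.setDefaultInputModeOrder(new String[] {MODE_NAME, "Abc", "123"});
        System.out.println("Keys mapped: " + map.size());
        System.out.println("Key 3, one tap: " + charForTaps(map, '3', 1));
    }
}
```

The sketch uses generics and autoboxing for readability; code targeting Java ME would need a raw Hashtable and explicit Integer objects instead.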

When you are done adding your custom input method, build the WK project in NetBeans (there is a button for this). To update the WK binaries in wkimporting, all you need to do is replace the files in the directory wkimporting/bundle/build/preverified with the preverified binaries from the project. That is, after building, go to build/preverified and copy everything except the ‘META-INF’ and ‘res’ directories to wkimporting/bundle/build/preverified.
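Copying by hand in a file manager is perfectly fine. If you rebuild often, the copy step could also be scripted, for example with a small Java helper like the sketch below. The class and method names are hypothetical; only the two excluded directory names come from the description above. Note that the sketch overwrites files with the same name but does not delete stale files that are already in the target directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class CopyPreverified {
    // Copies everything under source into target, skipping the top-level
    // META-INF and res directories and overwriting existing files.
    static void copyPreverified(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path rel = source.relativize(p);
                if (rel.toString().isEmpty()) {
                    continue; // the source directory itself
                }
                String top = rel.getName(0).toString();
                if (top.equals("META-INF") || top.equals("res")) {
                    continue; // excluded, including everything below them
                }
                Path dest = target.resolve(rel);
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

It would be called with the two directories named above, e.g. copyPreverified(Paths.get("build/preverified"), Paths.get("wkimporting/bundle/build/preverified")), with the paths adjusted to wherever the two projects live on your machine.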

Basically, that’s it. Now, after you import the dictionary into WK, you should get a dictionary with the custom input method you defined.

Unicode and custom input methods in Wunderkammer (#3: Theme file)

Step 2: Dictionary theme file.

You can customize the dictionary theme file with the LWUIT resource editor in many ways. For instance, you can create a bitmap font and include it in the theme file. The procedure is rather straightforward: you select a font installed on your system, choose its size, style and anti-aliasing method, and define the character set to be included in the bitmap font. For example, the character set of TuraGSM Sans Condensed I used is reproduced below. Note that the last character in the charset is a space (U+0020).

TuraGSM Sans Condensed charset for the Tura dictionary theme file

Unicode and custom input methods in Wunderkammer (#2: Font)

Step 1: Font

You need to have a font with all the necessary characters as single glyphs. The reason is that the dictionary theme file in LWUIT (step 2) uses a bitmap font, which cannot use combining characters (LWUIT creates such a font for you from any font available on your system). Thus, the character <à>, latin small letter a with grave, must be represented by a single glyph with the Unicode code point U+00E0 and not as a combination of two glyphs, viz. one with U+0061 for <a> and one with U+0300 for the combining grave accent. This single glyph requirement was problematic for the Tura script because it contains a range of characters, viz. some IPA characters with tone diacritics, that have no code point of their own in Unicode and that Unicode fonts therefore render only by combining a glyph for the respective letter with a combining glyph for the diacritic, such as <ɔ̀> latin small letter open o with grave.
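If you are not sure which of your characters exist as single code points, one way to check is Unicode normalization: under NFC, a base letter plus a combining mark collapses into one code point only if a precomposed character is defined. Here is a minimal sketch using the standard java.text.Normalizer class (the class and method names of the sketch itself are hypothetical):

```java
import java.text.Normalizer;

public class SingleGlyphCheck {
    // Returns true if the sequence composes to exactly one code point under NFC,
    // i.e. if Unicode defines a precomposed character for it.
    static boolean hasPrecomposedForm(String sequence) {
        String nfc = Normalizer.normalize(sequence, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length()) == 1;
    }

    public static void main(String[] args) {
        // a + combining grave composes to U+00E0
        System.out.println(hasPrecomposedForm("a\u0300"));      // prints true
        // open o + combining grave has no precomposed form
        System.out.println(hasPrecomposedForm("\u0254\u0300")); // prints false
    }
}
```

Any sequence for which the check returns false needs a glyph of its own in the font, as described in the next paragraph.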

To get around this problem, I created a new Unicode TrueType font, TuraGSM Sans Condensed, that contains the necessary accented characters as single glyphs in the Private Use Area range (U+E000 – U+F8FF). I did this by modifying DejaVu Sans Condensed, a free Unicode font, using FontForge, a free font editor. DejaVu Sans Condensed is part of the DejaVu family. DejaVu fonts are TrueType fonts that contain glyphs for most Unicode characters, including all the IPA characters and diacritics used in the Tura script. I chose the Sans Condensed version of DejaVu as a basis for the Tura font because it produces particularly compact lines that normally remain legible even at a small font size, such as 10 points, the size that I used for the indexes and entries in the Tura mobile phone dictionary. This is an obvious advantage when rendering characters on mobile phone screens. Besides creating new single glyphs for some of the accented characters, I also had to modify a few glyphs used for the Tura script to enhance their visual distinctiveness on a mobile phone screen.
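With such a font, any text that is to be displayed with it has to use the Private Use Area code points instead of the combining sequences. The following sketch shows one way such a conversion could be done. It is an illustration under assumptions, not the procedure actually used for the Tura dictionary: the class name and the table are hypothetical, and the three entries follow the assignment of U+E003, U+E004 and U+E005 to <ɛ̀ɛ̂ɛ́> mentioned in the post on the custom input method; a real table would list every accented character added to the font.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public class PuaMapper {
    // Combining sequence -> Private Use Area code point (illustrative subset).
    static final Map<String, String> TO_PUA = new LinkedHashMap<>();
    static {
        TO_PUA.put("\u025b\u0300", "\ue003"); // open e + combining grave
        TO_PUA.put("\u025b\u0302", "\ue004"); // open e + combining circumflex
        TO_PUA.put("\u025b\u0301", "\ue005"); // open e + combining acute
    }

    // Composes whatever Unicode can compose by itself (NFC), then replaces
    // the remaining combining sequences with their single PUA code points.
    static String toSingleGlyphs(String text) {
        String result = Normalizer.normalize(text, Normalizer.Form.NFC);
        for (Map.Entry<String, String> entry : TO_PUA.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // "w" + open e + combining circumflex + "n" shrinks from 4 to 3 code units
        String converted = toSingleGlyphs("w\u025b\u0302n");
        System.out.println(converted.length());
    }
}
```

Keep in mind that Private Use Area code points only mean something in combination with the font that defines them, so text converted this way is not portable to other fonts.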

Step 1 will probably be the most time-consuming part of the process of tweaking WK for your language.

Unicode and custom input methods in Wunderkammer (#1)

Wunderkammer now supports Unicode and allows the use of custom input methods, which in practice means that you can use any IPA or other non-ASCII characters in your dictionary and let users search for words using all these characters.[1] To paraphrase, that’s one small step for Wunderkammer, one giant leap for its users, especially for those who, like myself, work on languages outside of Australia and Oceania.


For quite some time, I’ve been working (in Toolbox and my spare time) on a dictionary of Tura (toura in French, wɛɛn /wɛ̰̀ɛ̰̀/ in Tura), a small Mande language spoken in a mountainous region near the city of Man in the west of Ivory Coast. My Tura-French dictionary currently contains around 3000 quite well-elaborated entries, which, given the predominantly monosyllabic nature of the Tura lexicon, represents a rather complete coverage (you may have a look at it here). Eventually, I hope to make it nice and publishable as a book. In the meantime, a mobile phone version is likely to be a much more palpable outcome of the project for the community. And it will definitely be more exciting to use than a book.

Here, I’d like to share my experience in adapting Wunderkammer for Tura. If you’d like to tweak WK for another language, you will need to do the following three things:

  1. If you do not have a font that has all the characters you need as single glyphs (as was the case with the Tura script), you will need to create one yourself, e.g. using FontForge (this can be rather time-consuming, but is otherwise not so difficult)
  2. Customize the dictionary theme file with the LWUIT resource editor (also see wksite) by adding the desired character set (that’s an easy one)
  3. Add your custom input method into the code of the MenulessTextField class in the source code of Wunderkammer, build the modified version of Wunderkammer and update the wkimport binaries (this part sounds much worse than it is in reality)

To accomplish this, you will probably need to install some additional software and look up the Unicode code points for the characters you wish to use. The (free) software you will need includes an IDE (integrated development environment) for Java, such as NetBeans, the LWUIT resource editor and a font editor, such as FontForge. The software mentioned definitely works on Windows. I do not have experience with other systems.


As a result, you should be able to use wkimport to build a dictionary with all the necessary characters and the desired input method. To give you an idea of what the end result may look like, here is a demo version of the Tura-French dictionary and some screenshots (for the installation procedure see wksite). The theme image is an oil palm nut kernel, wɛ̂n in Tura, which according to folk etymology is the source of the word wɛɛn ‘Tura’.

[1] Small print: except for complex East Asian scripts, such as Chinese, and right-to-left scripts, such as Arabic. It must be possible to make Wunderkammer work with them as well, but I’m passing on that, as these scripts are somewhat beyond my current interests.