Unicode and custom input methods in Wunderkammer (#2: Font)

Step 1: Font

You need a font that contains all the necessary characters as single glyphs. The reason is that the dictionary theme file in LWUIT (step 2) uses a bitmap font, which cannot handle combining characters (LWUIT creates such a bitmap font for you from any font available on your system). Thus, the character <à>, latin small letter a with grave, must be represented by a single glyph at the Unicode code point U+00E0 and not as a combination of two glyphs, viz. one at U+0061 for <a> and one at U+0300 for the combining grave accent. This single-glyph requirement was problematic for the Tura script because it contains a range of characters, viz. some IPA characters with tone diacritics, that Unicode fonts render only by combining a glyph for the letter with a combining glyph for the diacritic, such as <ɔ̀>, latin small letter open o with grave.
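The difference between the two cases can be checked with Unicode normalization. The following Python sketch is only an illustration and not part of Wunderkammer or wkimport; the function name is made up for this example. It composes each sequence with NFC, using the standard `unicodedata` module, and reports whether a single precomposed code point exists.

```python
import unicodedata


def has_precomposed_form(sequence):
    """Return True if NFC composes the sequence into a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


a_grave = "a\u0300"            # <a> followed by the combining grave accent
open_o_grave = "\u0254\u0300"  # open o followed by the combining grave accent

for seq in (a_grave, open_o_grave):
    composed = unicodedata.normalize("NFC", seq)
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in composed)
    print(codepoints, "precomposed:", has_precomposed_form(seq))
```

Sequences for which the check fails are the ones that need a dedicated glyph in the font.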

To get around this problem, I created a new Unicode TrueType font, TuraGSM Sans Condensed, that contains the necessary accented characters as single glyphs in the Private Use Area range (U+E000 – U+F8FF). I did this by modifying DejaVu Sans Condensed, a free Unicode font, using FontForge, a free font editor. DejaVu Sans Condensed is part of the DejaVu family of TrueType fonts, which contain glyphs for a wide range of Unicode characters, including all the IPA characters and diacritics used in the Tura script. I chose the Sans Condensed version as the basis for the Tura font because it produces particularly compact lines that normally remain legible even at a small font size, such as 10 points, the size that I used for the indexes and entries in the Tura mobile phone dictionary. This is an obvious advantage when rendering characters on mobile phone screens. Besides creating new single glyphs for some of the accented characters, I also had to modify a few glyphs used for the Tura script to make them more visually distinct on a mobile phone screen.
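Once the font has single glyphs in the Private Use Area, the dictionary text presumably has to refer to those code points instead of the combining sequences. A minimal Python sketch of such a conversion follows. It is an assumption about how one might prepare the data, not a description of what wkimport does, and the code point assignments in `PUA_MAP` are hypothetical, since the actual assignments in TuraGSM Sans Condensed are not listed here.

```python
import unicodedata

# Hypothetical assignments for illustration only: the Private Use Area code
# points actually used in TuraGSM Sans Condensed are not listed in this post.
PUA_MAP = {
    "\u0254\u0300": "\ue000",  # open o + combining grave
    "\u0254\u0301": "\ue001",  # open o + combining acute
    "\u025b\u0300": "\ue002",  # open e + combining grave
}


def to_single_glyphs(text, pua_map=PUA_MAP):
    """Replace combining sequences with single code points."""
    # NFC composes sequences that have a standard precomposed form
    # (a + grave becomes U+00E0) and puts stacked diacritics in canonical order.
    text = unicodedata.normalize("NFC", text)
    # Try longer sequences first so that a letter with two diacritics is not
    # caught by a shorter entry.
    for seq in sorted(pua_map, key=len, reverse=True):
        text = text.replace(unicodedata.normalize("NFC", seq), pua_map[seq])
    return text


def leftover_combining_marks(text):
    """Return the combining marks still present, i.e. sequences the map misses."""
    return sorted({ch for ch in text if unicodedata.combining(ch)})


converted = to_single_glyphs("a\u0300 \u0254\u0300")
print([f"U+{ord(ch):04X}" for ch in converted])
print(leftover_combining_marks(converted))
```

Running `leftover_combining_marks` over the whole converted dictionary is a quick way to find accented characters that still lack a glyph of their own.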

Step 1 will probably be the most time-consuming part of the process of tweaking WK for your language.

Unicode and custom input methods in Wunderkammer (#1)

Wunderkammer now supports Unicode and allows the use of custom input methods, which in practice means that you can use any IPA or other non-ASCII characters in your dictionary and let users search for words using all these characters.[1] To paraphrase, that’s one small step for Wunderkammer, one giant leap for its users, especially for those who, like myself, work on languages outside Australia and Oceania.

 

For quite some time, I’ve been working (in Toolbox and my spare time) on a dictionary of Tura (toura in French, wɛɛn /wɛ̰̀ɛ̰̀/ in Tura), a small Mande language spoken in a mountainous region near the city of Man in the west of Ivory Coast. My Tura-French dictionary currently contains around 3000 quite well-elaborated entries, which, given the predominantly monosyllabic nature of the Tura lexicon, represents rather complete coverage (you may have a look at it here). Eventually, I hope to make it nice and publishable as a book. In the meantime, a mobile phone version is likely to be a much more tangible outcome of the project for the community. And it will definitely be more exciting to use than a book.

Here, I’d like to share my experience in adapting Wunderkammer for Tura. If you’d like to tweak WK for another language, you will need to do the following three things:

  1. If you do not have a font that has all the characters you need as single glyphs (as was the case with the Tura script), you will need to create one yourself, e.g. using FontForge (this can be rather time-consuming, but is otherwise not difficult)
  2. Customize the dictionary theme file with the LWUIT resource editor (also see wksite) by adding the desired character set (that’s an easy one)
  3. Add your custom input method into the code of the MenulessTextField class in the source code of Wunderkammer, build the modified version of Wunderkammer and update the wkimport binaries (this part sounds much worse than it is in reality)

To accomplish this, you will probably need to install some additional software and look up the Unicode code points for the characters you wish to use. The (free) software you will need includes an IDE (integrated development environment) for Java, such as NetBeans, the LWUIT resource editor and a font editor, such as FontForge. The software mentioned definitely works on Windows. I do not have experience with other systems.
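Looking up the code points does not have to be done by hand. As a rough sketch (a helper written for this illustration, not something shipped with Wunderkammer), the following Python function lists every distinct character in a piece of dictionary text together with its code point and Unicode name, using only the standard `unicodedata` module.

```python
import unicodedata


def describe_characters(text):
    """List each distinct character as a (code point, Unicode name) pair."""
    rows = []
    for ch in sorted(set(text)):
        name = unicodedata.name(ch, "<no name>")
        rows.append((f"U+{ord(ch):04X}", name))
    return rows


# The Tura word for the oil palm nut kernel: w, open e, combining circumflex, n
for codepoint, name in describe_characters("w\u025b\u0302n"):
    print(codepoint, name)
```

Fed with the full text of a dictionary, the same function gives the complete character inventory, which is also the character set that has to be added to the theme file in step 2.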

 

As a result, you should be able to use wkimport to build a dictionary with all the necessary characters and the desired input method. To give you an idea of what the end result may look like, here is a demo version of the Tura-French dictionary and some screenshots (for the installation procedure see wksite). The theme image is an oil palm nut kernel, wɛ̂n in Tura, which according to folk etymology is the source for the word wɛɛn ‘Tura’. 



[1] Small print: except for complex East Asian scripts, such as Chinese, and right-to-left scripts, such as Arabic. It should be possible to make Wunderkammer work with them as well, but I’m passing on that, as these scripts are somewhat beyond my current interests.

PFED discussion board

We have a new discussion board for bug reports, suggestions for improvement, general discussion and whatever else is on the minds of WK users and others interested in endangered languages and technology.

wkimport 1.3 beta

wkimport 1.3 beta and wkimporting 1.3 beta are now available for download. The importing guide has also been updated for the new releases.

These are the anti-ant (and anti-python) releases: all the code has been implemented in pure Java, which removes the ant and python dependencies. A new high-contrast theme with a white background is also included.

The latest releases are linked from the wksite.

Wunderkammer update

It’s been awfully quiet the past couple of months here at the PFED, but today we’re releasing new versions of Wunderkammer, wkimport and the wkimporting package. Wunderkammer has been updated to work with LWUIT 1.3 and numerous bugs have been spotted and squished. We’ve also added an ‘Importing tips’ section to the importing documentation that should help future users to avoid some common problems.

Have a look at the wksite: http://pfed.info/wksite

The Wagiman Electronic Dictionary

Last week, I undertook a brief fieldtrip to Pine Creek and Kybrook Farm, Northern Territory, to present the completed Wagiman Electronic Dictionary to the Wagiman community.

It has been a long time coming, as several of us have been working on this dictionary in our spare time for the last six months, and so it felt especially good to be able to see a finished product, and better yet, to give it back to the community. In those six months, we successfully integrated recent research into Wagiman plant and animal species by Glenn Wightman, as well as very recent work done by the CSIRO on fish species in the Daly River. The electronic dictionary now contains all that up-to-date information. We also managed to produce sound files for the majority of lexical entries in the dictionary. There are around 1250 sound files in the dictionary altogether, totalling some 15 minutes of high-quality audio.

Lardukkarl nganing-gin using the Wagiman mobile phone dictionary

The Wagiman community are very pleased with the dictionary, and all enjoyed listening to the marluga¹ who recorded each of the sounds. The Wagiman people were also excited to see the mobile phone version of the dictionary. It’s not quite as complete as the computer-based dictionary: it contains far fewer sound files (around 300), and doesn’t contain the sometimes lengthy dictionary comments that accompany many lexical entries. This is an unfortunate constraint of the size of a standard mobile phone screen, where too much information can be hard to navigate through.

I also met with representatives of the Northern Territory Department of Education, who were interested in supporting the dictionary and in possible future collaboration. The Wagiman have given the tick, and the Department are going to go ahead and install the dictionary on all the computers in the schools in Katherine as a first step. We’re hoping that we’ll also be able to get the Northern Territory Library on our side and install the dictionaries on library computers. That way, most computers accessed by children and young adults in the area will have the Wagiman dictionary installed.

In addition to the computer- and mobile phone-based dictionaries, we have also been looking to produce a printed version. Hopefully the Wagiman community will be able to take advantage of the increased interest in Indigenous languages recently, and sell copies of the dictionary to tourists through various shops in Katherine, Pine Creek and Darwin.

Perhaps the most important thing to come out of this particular project is the demonstration that accessible electronic dictionaries for Indigenous languages can be produced for relatively little extra effort, provided that the language in question has been adequately described. For many languages, however, adequate description remains a significant obstacle.

The Wagiman people have given us permission to allow the public to download a demonstration version of the Kirrkirr dictionary, which we will try to have ready soon. A full version will be available upon request to the Wagiman community.


¹Marluga, (nom.) Old man.

Wunderkammer in Canberra

I’ll be giving a presentation of the Wunderkammer software at RSPAS at the ANU in Canberra at 11 am on Friday 18 September. If you’re around, come by. Full details of the presentation can be found here.

PFED in the New York Times

James has appeared in an article in the New York Times this morning talking about mobile phone-based dictionaries.

The article focuses on endangered languages and some of the steps being taken internationally to combat language death. Here are the relevant paragraphs:

Of course, online resources are useful only to communities with Internet access. Communities without that access, like the Kim, still require books to be printed, and recordings to be copied onto CDs or tapes.

Holding more promise are programs that put electronic dictionaries on mobile phones. James McElvenny, a linguist at the University of Sydney, has led the development of software to help revitalize vanishing languages. Mr. McElvenny has been working with Aboriginal groups like the Dharug of Sydney to give learners, many of them no older than 16, a portable reference that supplies the definition and the sound of words that are otherwise no longer spoken, because Dharug is a dead language.

“A lot of the older members are technophobic,” he said, “but the kids are really getting into it.”

New version of Wunderkammer

New versions of Wunderkammer, wkimport and the wkimporting package are available from the wksite: http://www.pfed.info/wksite

We also have a spiffy new design over at the wksite!

Sydney University Linguistics Department Seminar

This is probably very short notice, but James and I will be presenting Wunderkammer, wkimport and the project in general at a Linguistics Department seminar tomorrow evening at the University of Sydney. Here are the details:

Monday 1st June
4 pm – 5.30 pm
Eastern Avenue Seminar Room 119

Wunderkammer, mobile phone dictionaries and the Wagiman electronic dictionary

James McElvenny and Aidan Wilson
The University of Sydney

ABSTRACT

In this talk we will demonstrate Wunderkammer, software that allows electronic dictionaries to be stored and displayed on mobile phones. We will show how we have used the software to produce a mobile phone dictionary of Wagiman, an Australian language from the Daly River Region in the Northern Territory. We will also discuss how other linguists can use the software to make their own electronic dictionaries available on mobile phones, as well as the future possibilities for dictionary delivery in technologically under-resourced areas.