Archive for the ‘Languages’ Category.

Umpila/Kuuku Ya’u Dictionary (V.2)

Patrick Butcher, an artist working at the Lockhart River Art Centre, with an in-progress artwork in the background. Commenting on the dictionary: “…more access to language is vital. This is easy to use, the words are just there. It’ll make it easier to make art work names. At the moment I have to search out one of the elders or a copy of the blue book [a sketch grammar containing a wordlist] now help is just in my pocket.”

For the last month I have been in Lockhart River, on the beautiful east coast of the Cape in northern QLD. It was great to be back there, to see my Lockhart friends and to continue work with Umpila and Kuuku Ya’u speakers. Part of my time at the start of my trip was spent creating a second version of the Umpila/Kuuku Ya’u – English Wunderkammer dictionary. I started work on this dictionary on a fieldtrip to Lockhart a year ago, recording sound clips with speakers as well as some illustrative examples (for a computer/book-based version of the dictionary), and then earlier this year I put together the first version. See the last two posts from David Thompson and myself, which have a little info on the development and distribution of this.

On this trip to Lockhart I wanted to vamp up the Wunderkammer dictionary a tad more and also continue getting feedback from users, following on from David’s April trip to Lockhart. For Version 2, speakers and I recorded more sound clips and re-recorded some clips at better quality. Invariably there are some instances where, even with three or so productions of a word, they all end up with some background noise that you didn’t quite notice or didn’t think would get caught by the mic, e.g. branches scratching on the roof or a dog scrap two houses back. The clips edited and prepped for Version 2 bring the count to around 200 entries with sound. I also worked with speakers to add a more vernacular flavour to some of the definitions, particularly for plant, animal and material culture vocab. In the first version of the dictionary, definitions for this type of vocab were often pretty uninformative, e.g. simple entries like “tree species” or “spear type”. Even where I have scientific idents for plants and animals I wasn’t keen to use them, given the target “youth” audience. I wanted to add a little more to the definitions of these types of words, something that captured the sorts of identifying features and descriptions that Lockhart people themselves use. So, speakers and I worked to record vernacular definitions for around 100 of the entries. Here are some examples: iinyi ‘cicada, “noisy minya” (nyanguru kuupi) which sings out, especially at dusk’; kanhin ‘trochus shell, in old mission times men used to dive for kanhin to sell to the Chinese and Japanese people’; thakura ‘yam type, hairy “New Guinea” yam, plenty grow on islands, like on Restoration Island’. As you can see, these include some local knowledge of places and history, which wouldn’t usually be good lexicographic practice, but seeing as this dictionary has a targeted audience, this sort of local flavour is a big plus. 
Given the limited screen “real estate” of a mobile phone, we had to keep these definitions short, but I think we did pretty well. The “full meanings” were one of the things people commented on very positively in feedback. Note: sometimes the speakers’ Creole vernacular was somewhat modified into Standard English in these definitions. In later versions we are considering adding Creole to the dictionary in some way: perhaps not making a trilingual dictionary per se, but including Creole translations of some of the words, noting the traditional language vocab that has been absorbed into the Creole, or adding a designated Creole vernacular definition field. So, lots more ideas to make this a cooler and richer little dictionary in coming versions!

As I went about Lockhart distributing and showing off Version 2, I was keen to get feedback from users. I was particularly interested to see how users first interacted with and familiarised themselves with the dictionary’s interface, and whether they found it user-friendly. People were overwhelmingly positive about it. Everyone I talked with thought it was perfectly targeted at teenagers and twenty-somethings, including themselves. For this demographic, as elsewhere in Oz, a mobile phone is the main entertainment device: media player, song storage and games console. There are few language resources available for Umpila and Kuuku Ya’u, and even fewer that would captivate this audience. So, that is a big positive for this platform.

As far as modifications go, there were very few suggestions for changes; users were happy with its size and form, in terms of both entries and file size (3.4MB). Some people suggested more content of specific types, based on their own usage ideas and needs, e.g. a teacher’s aide suggested including example sentences, and a land and sea ranger suggested more detailed plant and animal information, including photos. There will be more information on technical hurdles in distribution, and on user feedback, in a review/paper to come; more on this by-and-by.

Unicode and custom input methods in Wunderkammer (#4: Custom input method)

Step 3: Custom input method.

First of all, you will need to add your custom input method to the code of the MenulessTextField class in the Wunderkammer source. This can be done in any text editor, but as you will need to compile the whole thing later anyway, it’s probably better to do everything in NetBeans or another Java IDE right from the beginning. You can find the whole MenulessTextField class with the custom Tura-French input method here.

There are only a few things that you are likely to need to modify. In particular, you will probably want to customize the mappings of lowercase (see picture) and uppercase letters onto a typical 3×4 mobile phone keypad. The <\uXXXX> sequences are the Unicode escapes for the special characters (I added the characters themselves as comments, shown in green, to make things clear).

The lowercase mappings of the Tura custom input method
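To make the idea of a keypad mapping concrete, here is a plain-Java sketch with no LWUIT dependency: a Hashtable from key codes to the characters cycled through on repeated presses, which is roughly the shape of table handed to TextField.addInputMode (check the LWUIT documentation for the exact signature). The class name MultiTapSketch and the key assignments are hypothetical and are not the actual Tura mapping; only the use of U+E003 to U+E005 for the toned open e comes from the Tura setup. It also uses desktop Java generics, which the CLDC profile used on phones does not have.

```java
import java.util.Hashtable;

// Hypothetical class: a plain-Java model of a multi-tap keypad input mode.
// The key assignments are illustrative only and are NOT the actual Tura mapping.
final class MultiTapSketch {
    // Key code -> characters cycled through on repeated presses of that key.
    static final Hashtable<Integer, String> LOWER = new Hashtable<>();

    static {
        LOWER.put((int) '2', "abc2");
        // Open e (U+025B) plus the PUA glyphs U+E003..U+E005 for its toned forms.
        LOWER.put((int) '3', "def\u025B\uE003\uE004\uE005" + "3");
        LOWER.put((int) '0', " 0");
    }

    /** Returns the character produced by pressing {@code key} {@code presses} times in a row. */
    static char charFor(Hashtable<Integer, String> mode, char key, int presses) {
        String options = mode.get((int) key);
        if (options == null || presses < 1) {
            throw new IllegalArgumentException("unmapped key or no press: " + key);
        }
        // Multi-tap wraps around to the first character after the last one.
        return options.charAt((presses - 1) % options.length());
    }

    public static void main(String[] args) {
        System.out.println(charFor(LOWER, '2', 2)); // prints b
        System.out.printf("U+%04X%n", (int) charFor(LOWER, '3', 5)); // prints U+E003
    }
}
```

An uppercase table would work the same way with capital letters on each key.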

You might also wish to modify the part “\ue003\ue004\ue005” in brackets after TextField.addInputMode and TextField.setDefaultInputModeOrder, which stands for <ɛ̀ɛ̂ɛ́>. This is just the way I chose to label the lowercase Tura custom input method at the right end of the search box in WK, similar to “Abc” or “123”.

When you have finished adding your custom input method, build the WK project in NetBeans (there is a button for this). To update the WK binaries in wkimporting, all you need to do is replace the files in the directory wkimporting/bundle/build/preverified with the preverified binaries from your project: build the project, then go to build/preverified and copy everything except the ‘META-INF’ and ‘res’ directories to wkimporting/bundle/build/preverified.
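If you rebuild often, the copy step can be scripted. Here is a minimal sketch using java.nio.file, assuming it is run from a directory that contains both build/ and wkimporting/ (an assumption about your layout); the class name CopyPreverified is hypothetical. It overwrites existing files but does not delete stale ones from the target directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper: copies preverified binaries, skipping META-INF and res.
final class CopyPreverified {
    // Top-level directories in build/preverified that should NOT be copied.
    static final Set<String> EXCLUDED = Set.of("META-INF", "res");

    /** Copies everything under src into dest, skipping the excluded top-level directories. */
    static void copy(Path src, Path dest) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) { // parents are visited before children
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path rel = src.relativize(p);
                if (EXCLUDED.contains(rel.getName(0).toString())) {
                    continue; // inside META-INF or res
                }
                Path target = dest.resolve(rel.toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copy(Paths.get("build", "preverified"),
             Paths.get("wkimporting", "bundle", "build", "preverified"));
    }
}
```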

Basically, that’s it. Now, after you import the dictionary into WK, you should get a dictionary with the custom input method you defined.

Unicode and custom input methods in Wunderkammer (#3: Theme file)

Step 2: Dictionary theme file.

You can customize the dictionary theme file with the LWUIT resource editor in many ways. For instance, you can create a bitmap font and include it in the theme file. The procedure is rather straightforward: you select a font installed on your system, its size and style, and the anti-aliasing method, and define the character set that needs to be included in the bitmap font. For example, the character set I used for TuraGSM Sans Condensed is reproduced below. Note that the last character in the charset is a space (U+0020).

TuraGSM Sans Condensed charset for the Tura dictionary theme file
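Rather than typing the character set by hand, you could collect it from the dictionary text itself. Here is a rough sketch (hypothetical class CharsetBuilder), assuming the entries are available as one Java string and contain only Basic Multilingual Plane characters, which holds for the Private Use Area glyphs. It lists every distinct character sorted by code point, skips control characters such as newlines, and puts the space at the end, as in the TuraGSM charset.

```java
import java.util.TreeSet;

// Hypothetical helper: derives a bitmap-font charset from dictionary text.
final class CharsetBuilder {
    static String charsetFor(String dictionaryText) {
        TreeSet<Character> chars = new TreeSet<>(); // sorted by UTF-16 code unit
        for (char c : dictionaryText.toCharArray()) {
            if (c != ' ' && !Character.isISOControl(c)) { // skip space, newlines, tabs
                chars.add(c);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        return sb.append(' ').toString(); // space last, as in the TuraGSM charset
    }

    public static void main(String[] args) {
        System.out.println("[" + charsetFor("wan\n\u025B\uE003 ba") + "]");
    }
}
```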


Unicode and custom input methods in Wunderkammer (#2: Font)

Step 1: Font

You need a font that has all the necessary characters as single glyphs. The reason is that the dictionary theme file in LWUIT (step 2) uses a bitmap font, which cannot render combining characters (LWUIT creates such a font for you from any font available on your system). For example, the character <à>, latin small letter a with grave, must be represented by a glyph with code point U+00E0, and not as a combination of two glyphs, viz. one with U+0061 for <a> and one with U+0300 for the combining grave accent. This single-glyph requirement was problematic for the Tura script because it contains a range of characters, viz. some IPA characters with tone diacritics, such as <ɔ̀> latin small letter open o with grave, that have no precomposed code point in Unicode, so Unicode fonts can only render them as a combination of a glyph for the base letter and a combining glyph for the diacritic.
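A quick way to check whether a letter plus diacritic has a precomposed code point is to normalise it to NFC with java.text.Normalizer and count the code points that remain. The class name PrecomposedCheck is hypothetical; this is a sketch for checking your own character inventory, not part of WK or LWUIT.

```java
import java.text.Normalizer;

// Hypothetical helper: does a base letter + combining mark have a precomposed form?
final class PrecomposedCheck {
    /** True if the NFC form of s is a single code point. */
    static boolean hasPrecomposedForm(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length()) == 1;
    }

    public static void main(String[] args) {
        // a + COMBINING GRAVE ACCENT composes to U+00E0.
        System.out.println(hasPrecomposedForm("a\u0300"));      // true
        // open o (U+0254) + COMBINING GRAVE ACCENT has no precomposed code point.
        System.out.println(hasPrecomposedForm("\u0254\u0300")); // false
    }
}
```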

To get around this problem, I created a new Unicode TrueType font, TuraGSM Sans Condensed, that contains the necessary accented characters as single glyphs in the Private Use Area range (U+E000–U+F8FF). I did this by modifying DejaVu Sans Condensed, a free Unicode font, using FontForge, a free font editor. DejaVu Sans Condensed is part of the DejaVu family, TrueType fonts that cover a wide range of Unicode characters, including all the IPA characters and diacritics used in the Tura script. I chose the Sans Condensed version of DejaVu as a basis for the Tura font because it produces particularly compact lines that remain legible even at small sizes, such as 10 points, the size I used for the indexes and entries in the Tura mobile phone dictionary. This is an obvious advantage on mobile phone screens. Besides creating new single glyphs for some of the accented characters, I also had to modify a few glyphs used in the Tura script to make them more visually distinct on a mobile phone screen.
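The post does not say how the dictionary data was converted to match the font. If your entries are typed with combining sequences, one option is to rewrite them to the Private Use Area code points before import. Here is a minimal sketch of such a conversion (hypothetical class PuaConverter): only the assignments U+E003, U+E004 and U+E005 for ɛ̀, ɛ̂ and ɛ́ are taken from the Tura input method; a real table would list every composite glyph in your font.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: replaces base letter + combining mark with a single PUA glyph.
final class PuaConverter {
    // Insertion order matters: put longer sequences before their prefixes.
    static final Map<String, String> TO_PUA = new LinkedHashMap<>();

    static {
        TO_PUA.put("\u025B\u0300", "\uE003"); // open e + grave
        TO_PUA.put("\u025B\u0302", "\uE004"); // open e + circumflex
        TO_PUA.put("\u025B\u0301", "\uE005"); // open e + acute
    }

    static String toPua(String text) {
        // NFC first, so sequences that DO have a standard precomposed form
        // (e.g. a + grave -> U+00E0) keep it instead of a PUA code point.
        String s = Normalizer.normalize(text, Normalizer.Form.NFC);
        for (Map.Entry<String, String> e : TO_PUA.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    /** True if c lies in the BMP Private Use Area, U+E000..U+F8FF. */
    static boolean isPrivateUse(char c) {
        return c >= '\uE000' && c <= '\uF8FF';
    }
}
```

Characters with more than one diacritic, such as the tone-plus-tilde vowels in wɛ̰̀ɛ̰̀, would need their own entries, keyed in the canonical (NFC) order of their combining marks.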

Step 1 will probably be the most time-consuming part of tweaking WK for your language.

Unicode and custom input methods in Wunderkammer (#1)

Wunderkammer now supports Unicode and allows the use of custom input methods, which in practice means that you can use any IPA or other non-ASCII characters in your dictionary and let users search for words using all these characters.[1] To paraphrase, that’s a small step for Wunderkammer, but one giant leap for its users, especially those of them, such as myself, who work on languages outside Australia and Oceania.


For quite some time I’ve been working, in Toolbox and in my spare time, on a dictionary of Tura (toura in French, wɛɛn /wɛ̰̀ɛ̰̀/ in Tura), a small Mande language spoken in a mountainous region near the city of Man in the west of Ivory Coast. My Tura-French dictionary currently has around 3000 fairly detailed entries, which, given the predominantly monosyllabic nature of the Tura lexicon, represents rather complete coverage (you may have a look at it here). Eventually, I hope to polish it and publish it as a book. In the meantime, a mobile phone version is likely to be a much more tangible outcome of the project for the community. And it will definitely be more exciting to use than a book.

Here, I’d like to share my experience in adapting Wunderkammer for Tura. If you’d like to tweak WK for another language, you will need to do the following three things:

  1. If you do not have a font that has all the characters you need as single glyphs (as was the case with the Tura script), you will need to create one yourself, e.g. using FontForge (potentially rather time-consuming, but otherwise not so difficult)
  2. Customize the dictionary theme file with the LWUIT resource editor (also see wksite) by adding the desired character set (that’s an easy one)
  3. Add your custom input method into the code of the MenulessTextField class in the source code of Wunderkammer, build the modified version of Wunderkammer and update the wkimport binaries (this part sounds much worse than it is in reality)

To accomplish this, you will probably need to install some additional software and look up the Unicode code points for the characters you wish to use. The (free) software you will need includes an IDE (integrated development environment) for Java, such as NetBeans, the LWUIT resource editor and a font editor, such as FontForge. The software mentioned definitely works on Windows; I have no experience with other systems.
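For looking up the code points, a few lines of Java will list them along with their official Unicode names via Character.getName (Java 7 or later). The class name CodePoints is hypothetical, and the "U+XXXX NAME" output format is just one convenient choice for pasting into a mapping table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the code points (and Unicode names) of a string.
final class CodePoints {
    /** One line per code point: "U+XXXX NAME". */
    static List<String> describe(String s) {
        List<String> lines = new ArrayList<>();
        s.codePoints().forEach(cp -> {
            String name = Character.getName(cp); // null only for unassigned code points
            lines.add(String.format("U+%04X %s", cp, name == null ? "<unassigned>" : name));
        });
        return lines;
    }

    public static void main(String[] args) {
        // Pass the characters on the command line, or fall back to open o + grave.
        String input = args.length > 0 ? args[0] : "\u0254\u0300";
        describe(input).forEach(System.out::println);
    }
}
```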


As a result, you should be able to use wkimport to build a dictionary with all the necessary characters and the desired input method. To give you an idea of what the end result may look like, here is a demo version of the Tura-French dictionary and some screenshots (for the installation procedure see wksite). The theme image is an oil palm nut kernel, wɛ̂n in Tura, which according to folk etymology is the source for the word wɛɛn ‘Tura’. 

[1] Small print: except for complex East Asian scripts, such as Chinese, and right-to-left scripts, such as Arabic. It should be possible to make Wunderkammer work with them as well, but I’m passing on that, as these scripts are somewhat beyond my current interests.

The Wagiman Electronic Dictionary

Last week, I undertook a brief fieldtrip to Pine Creek and Kybrook Farm, Northern Territory, to present the completed Wagiman Electronic Dictionary to the Wagiman community.

It has been a long time coming, as several of us have been working on this dictionary in our spare time for the last six months, so it felt especially good to see a finished product and, better yet, to give it back to the community. In those six months, we successfully integrated recent research into Wagiman plant and animal species by Glenn Wightman, as well as very recent work done by the CSIRO on fish species in the Daly River. The electronic dictionary now contains all that up-to-date information. We also managed to produce sound files for the majority of lexical entries in the dictionary. There are around 1250 sound files in the dictionary altogether, totalling some 15 minutes of high-quality audio.

Lardukkarl nganing-gin using the Wagiman mobile phone dictionary


The Wagiman community are very pleased with the dictionary, and all enjoyed listening to the marluga¹ who recorded each of the sounds. The Wagiman people were also excited to see the mobile phone version of the dictionary. It’s not quite as complete as the computer-based dictionary; it contains far fewer sound files (around 300), and doesn’t contain the sometimes lengthy dictionary comments that accompany many lexical entries. This is an unfortunate constraint of the size of a standard mobile phone screen: too much information can be hard to navigate.

I also met with representatives of the Northern Territory Department of Education, who were interested in supporting the dictionary and in possible future collaboration. The Wagiman have given the tick, and the Department are going to go ahead and install the dictionary on all the computers in the schools in Katherine as a first step. We’re hoping that we’ll also be able to get the Northern Territory Library on our side and install the dictionaries on library computers. That way, most computers accessed by children and young adults in the area will have the Wagiman dictionary installed.

In addition to the computer- and mobile phone-based dictionaries, we have also been looking to produce a printed version. Hopefully the Wagiman community will be able to take advantage of the recent increase in interest in Indigenous languages, and sell copies of the dictionary to tourists through various shops in Katherine, Pine Creek and Darwin.

Perhaps the most important thing to come out of this particular project is the demonstration that accessible electronic dictionaries for Indigenous languages can be produced for relatively little extra effort, provided that the language in question has been adequately described, although for many languages this remains a significant obstacle.

The Wagiman people have given us permission to allow the public to download a demonstration version of the Kirrkirr dictionary, which we will try to have ready soon. A full version will be available upon request to the Wagiman community.

¹Marluga, (nom.) Old man.

Sydney University Linguistics Department Seminar

This is probably very short notice, but James and I will be presenting Wunderkammer, WKimport and the project in general at a Linguistics Department seminar tomorrow evening at the University of Sydney. Here are the details:

Monday 1st June
4 pm – 5.30 pm
Eastern Avenue Seminar Room 119

Wunderkammer, mobile phone dictionaries and the Wagiman electronic dictionary

James McElvenny and Aidan Wilson
The University of Sydney


In this talk we will demonstrate Wunderkammer, software that allows electronic dictionaries to be stored and displayed on mobile phones. We will show how we have used the software to produce a mobile phone dictionary of Wagiman, an Australian language from the Daly River Region in the Northern Territory. We will also discuss how other linguists can use the software to make their own electronic dictionaries available on mobile phones, as well as the future possibilities for dictionary delivery in technologically under-resourced areas.

Slowly but surely

For the past couple of weeks I’ve been working my way through several hours of Wagiman recordings from my recent fieldtrip, all the while remarking on how excellent they are. It’s a combination of a good recording device (a Roland Edirol R-4), a great microphone with a proven track record in the field (a Røde NT4),¹ and experience in microphone placement and input gain control.² I’m finding the best tokens of all the words I recorded for eventual insertion into the electronic versions of the Wagiman dictionary, including a Kirrkirr instance and a mobile phone dictionary.

Splitting the recordings into some 1500 individual sound files is a time-consuming occupation, and unfortunately, as it’s the only one of my many jobs that isn’t actually paying me anything, higher priority tasks often win out.
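For anyone facing the same chore, the cutting itself can be scripted once the start and end times of the best tokens are known. The sketch below (hypothetical class ClipExtractor) is an illustration only, not the workflow described here: it uses Java's built-in javax.sound.sampled API to copy one time range of an uncompressed PCM WAV recording into its own WAV file. It assumes you already have the times, e.g. from annotations, and leaves down-sampling aside.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: cuts [startSec, endSec) out of a PCM WAV file.
final class ClipExtractor {
    static void extract(File in, File out, double startSec, double endSec)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream src = AudioSystem.getAudioInputStream(in)) {
            AudioFormat fmt = src.getFormat(); // assumed uncompressed PCM
            long startFrame = Math.round(startSec * fmt.getFrameRate());
            long frames = Math.round((endSec - startSec) * fmt.getFrameRate());
            long total = src.getFrameLength();
            if (total != AudioSystem.NOT_SPECIFIED) {
                frames = Math.min(frames, total - startFrame); // don't run past the end
            }
            if (frames <= 0) {
                throw new IOException("empty or out-of-range segment");
            }
            // skip() may skip fewer bytes than asked, so loop until the start is reached.
            long toSkip = startFrame * fmt.getFrameSize();
            while (toSkip > 0) {
                long skipped = src.skip(toSkip);
                if (skipped <= 0) {
                    throw new IOException("could not reach start of segment");
                }
                toSkip -= skipped;
            }
            // Limit the remaining stream to the requested number of frames and write it out.
            AudioInputStream clip = new AudioInputStream(src, fmt, frames);
            AudioSystem.write(clip, AudioFileFormat.Type.WAVE, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ClipExtractor in.wav out.wav startSeconds endSeconds
        extract(new File(args[0]), new File(args[1]),
                Double.parseDouble(args[2]), Double.parseDouble(args[3]));
    }
}
```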

Eventually though, we’ll have a Wagiman electronic dictionary ready for distribution, and a down-sampled version of the same ready for installation on mobile phones. So keep posted!

[Cross-posted at matjjin-nehen]

  1. Both of which were loaned from PARADISEC.
  2. Gain control was really key in the end, as it was raining most of the time, which would cause low-level hiss if the gain were set too high. Luckily my speaker didn’t mind talking directly and loudly into the microphone, so I was able to keep the gain right down to stop too much ambient noise getting in.

Wagiman and Dalabon Dictionaries

I’ve been in the Territory for a week now, mainly working on the content of the Wagiman Electronic Dictionary and recording sound files for it. But I’ve also been canvassing interest in the dictionary from the members of the community. Thus far the reception seems positive; the school-age children were keen to see it and give it a go, and the adults agree that using mobile phones is a great way of getting their kids to learn such information.

Last weekend I was in Katherine seeing some linguist friends, and while I was there I had an informal meeting with someone from the Northern Territory Department of Education. They too were interested in the mobile phone dictionary as well as the Kirrkirr dictionary, so much so, in fact, that we’ve undertaken the job of producing a dictionary of Dalabon (more on that later). I foresee that, if this takes off, the Kirrkirr dictionary would be used at school, either on the ubiquitous Smartboards with the whole class or by individuals working at computers with the software loaded, while the mobile phone version would be used on the students’ own phones in their own time, perhaps to complete set homework tasks.

Even though, thanks to Marion Scrymgour, four hours per day of education in the Northern Territory are mandated to be English-only, language teachers could capitalise on the remaining hour of tuition per day by using an interactive and visually rich program such as Kirrkirr.

I had a brainwave while I was showing off the software in Katherine: the dictionary need not translate only between Wagiman and English, but could also contain the Kriol translations of Wagiman words. It would then be a matter of selecting a different stylesheet within Kirrkirr to display either the Wagiman-English or the Wagiman-Kriol version. It would take a bit of work on my part to translate the entire dictionary into Kriol, especially since it’s a language that I don’t speak, but with someone’s help it could quite easily be done. Wamut has already helped me make the first entry of the dictionary trilingual: Ngal-martdiwa ‘Old woman’, within the semantic domain ‘Human classification’, now becomes ‘Olgaman’ in the semantic domain ‘Ola wed bla pipul’. Wamut also pointed out that Kriol would map much more closely to Wagiman than English does, most obviously in areas such as kinship terms (dedi versus uncle), semantically ambiguous verb meanings (hit versus kill) and free pronouns (melabat versus us).

It wouldn’t be a complicated computational task either; the MDF specifications for Shoebox/Toolbox already support multiple content languages, using codes such as \ge for ‘gloss English’ as opposed to \gn for ‘gloss national’, which could be taken to mean a lingua franca such as Kriol. In an XML-formatted dictionary, it would simply mean adding another XML element for the definition, called something like <MeaningsKriol>.
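To make that concrete, here is a rough Java sketch that turns one MDF record into XML, putting \ge into one element and \gn into MeaningsKriol. The class MdfToXml is hypothetical and is not how wkimport or Kirrkirr actually do the conversion; apart from MeaningsKriol, the element names (Entry, Headword, MeaningsEnglish) are made up for illustration, and the example record reuses Wamut's Ngal-martdiwa entry.

```java
// Hypothetical converter: one backslash-coded MDF record -> one XML entry.
final class MdfToXml {
    /** Escapes the characters that are special in XML text content. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** \lx = lexeme, \ge = gloss (English), \gn = gloss (national), used here for Kriol. */
    static String recordToXml(String mdf) {
        StringBuilder xml = new StringBuilder("<Entry>\n");
        for (String line : mdf.split("\\R")) {
            int sp = line.indexOf(' ');
            if (!line.startsWith("\\") || sp < 0) {
                continue; // skip blank, unmarked or empty-valued lines
            }
            String marker = line.substring(1, sp);
            String value = esc(line.substring(sp + 1).trim());
            String element;
            switch (marker) {
                case "lx": element = "Headword"; break;
                case "ge": element = "MeaningsEnglish"; break;
                case "gn": element = "MeaningsKriol"; break;
                default: continue; // markers this sketch does not handle
            }
            xml.append("  <").append(element).append('>').append(value)
               .append("</").append(element).append(">\n");
        }
        return xml.append("</Entry>").toString();
    }

    public static void main(String[] args) {
        System.out.println(recordToXml("\\lx Ngal-martdiwa\n\\ge Old woman\n\\gn Olgaman"));
    }
}
```

Keeping both glosses side by side in one entry is what would let a stylesheet pick either the English or the Kriol meanings at display time.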

I mentioned the possibility of a Dalabon dictionary earlier, so I might explain what’s going on. The representative from the NT Education Department has a potential Dalabon project in the pipeline, and having a visual dictionary paired with a mobile phone version of the same would be immensely helpful for that effort. I’ve contacted all the authors and have received permission to go ahead with this. With any luck, the current dictionary (a backslash-coded text file) is complete and consistent and won’t require any actual work on our part, besides configuring the file that translates the backslash codes into XML. But that’s James’ domain. Finding sound files and inserting them should be¹ the only complicated task, but this we’ll have to leave to one of the linguists working with Dalabon. For now though, a user-friendly, visual version of what the three authors painstakingly produced would be a great way for Dalabon kids to engage with their language again.
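Whether a backslash-coded file really is complete and consistent can be checked before any import. Here is a minimal sketch, assuming every record begins with \lx, that the file is UTF-8, and that \ge is the field you insist on (all assumptions; the hypothetical class MdfCheck is not part of wkimport). It lists the headwords of records that are missing a required field or leave it empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical checker: finds MDF records that lack required, non-empty fields.
final class MdfCheck {
    static List<String> incompleteRecords(String file, List<String> required) {
        List<String> problems = new ArrayList<>();
        String headword = null;
        Set<String> markers = new HashSet<>();
        for (String line : file.split("\\R")) {
            if (line.startsWith("\\lx ")) {
                check(headword, markers, required, problems); // close the previous record
                headword = line.substring(4).trim();
                markers.clear();
            } else if (line.startsWith("\\")) {
                int end = line.indexOf(' ');
                // Count a marker only if it actually carries a value.
                if (end > 1 && !line.substring(end + 1).trim().isEmpty()) {
                    markers.add(line.substring(1, end));
                }
            }
        }
        check(headword, markers, required, problems); // close the last record
        return problems;
    }

    private static void check(String headword, Set<String> markers,
                              List<String> required, List<String> problems) {
        if (headword != null && !markers.containsAll(required)) {
            problems.add(headword);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = Files.readString(Path.of(args[0])); // assumes UTF-8
        for (String hw : incompleteRecords(text, List.of("ge"))) {
            System.out.println("missing gloss: " + hw);
        }
    }
}
```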

¹Yeah, ‘should be’, but of course things always go wrong.

Wunderkammer and wkimport 1.0

Wunderkammer and wkimport are now available. Wunderkammer can be used to display dictionaries on mobile phones and wkimport can be used to import electronic dictionaries in a variety of formats into Wunderkammer. See the wksite for downloads and documentation. There is also an online demo so you don’t need to download the MIDlet to your phone to see it working (although the online demo lacks sound).