Unicode Ge`ez

Developing Multi-Lingual Computer Utilities for Eritrea
by rvb

Free Standard Compliant Ge'ez Software !!!

Our ISO/IEC 10646-1 standard-compliant keyboard software for Windows 95, 98, 2000, and Windows NT is now available at this link for UniGeez 1.06. Please save the zip file to disk and unzip it. The executable is in the RunMe! directory and is called UniGeez. UniGeez 1.06 is the stable, well-tested version, but there are other experimental versions with expanded features. See the version history below.

You must have the GF Zemen Unicode font installed for the software to work, and in many applications you must set the font to GF Zemen Unicode to see the Ge`ez characters. This nearly final version of the software adds several of the features recommended in earlier test runs, including immediate visual output of keystrokes, control key functionality in Word, and font selection for those applications where the characters are input using rich text format (RTF is used in Word, for example). The software works with Netscape Navigator and Composer in all of the Windows operating systems, since the Netscape applications reliably comply with the multi-lingual encoding standards.

Version History

  1. UniGeez 0.9 Basic alpha-release test software. (Oct. 2000)
  2. UniGeez 1.02 Pre-release version software, corrects bugs and responds to comments regarding the alpha version. (Nov. 2000)
  3. UniGeez 1.04 Release software, adds functionality in Excel, and corrects some minor bugs. (Dec. 2000)
  4. UniGeez 1.05 Improved version, better look, uses registry for program info, improves Excel functionality, and adds option to add fonts in addition to default font. (Jan. 2001)
  5. UniGeez 1.06 Improves the functionality of the software in Windows 95. (Feb. 2001)
  6. UniGeez 2.001 UniGeez with conversion utility! Allows users to convert Ge`ez text and documents between Unicode, Geezgate, Yada, and SERA. (Aug. 2001)
  7. UniGeez 3.002 An alpha version with Arabic functionality is under test. This version is buggy, but shows how Arabic will be included. The keyboard mapping follows the Qalam transliteration convention. (Sept. 2001)

Known Bugs/Problems

  1. The 'Du' character, U+12F9, is not represented in the GF Zemen Unicode font. (This problem was corrected February 5, 2001; please download the corrected font.)
  2. The software does not work with WordArt in MS Word. This cannot be corrected until Microsoft corrects WordArt to support Unicode.
  3. The software does not work with Adobe Pagemaker and several other Adobe publishing products. These Adobe products do not support the Unicode standard! They require fonts that are mapped onto western encodings. By June 2001, UniGeez will have a utility for converting between Unicode text and selected non-standard encoding fonts. This will enable UniGeez users to produce text that can be pasted into such applications.

Motivation

Making Eritreans more comfortable with computers and computer-based information will be a crucial element of bringing the benefits of the Internet and computers to the average Eritrean.

Limited access to technology (and the enhanced productivity it brings) is one of the main barriers to raising the standard of living and the value of national economic production in Eritrea. The ease of computer access, and the relevance of computer information will be a major element of rapidly transferring and applying computer technologies to Eritrea's economic and productive activities.

The for-profit private sector development model for multi-lingual computer infrastructure has failed Eritrea. First of all, there are very few multi-lingual software providers and developers in Eritrea (approximately two), and the software that has been developed is expensive, uses non-standard character encodings, and is incompatible with other software in Eritrea and Ethiopia. Prices for existing software range from $20 to $90 per copy, and there were no free versions of the software until UniGeez began free distribution. This has led to the contradiction that English-speaking people can use computers in their own language for free, while Tigrigna speakers (who have a mean per-capita income of $250/year) may have to pay $90 to use a computer in their own language.

The large public benefits of multi-lingual computer access mean that basic utilities that provide easy multi-lingual computer access should be public, rather than private, property. A person trained and proficient in computers has an earning potential of perhaps 2-10 times that of a person without computer training. This means that if the existence of free public multi-lingual computer utilities facilitates computer access for just 1000 more people, the national economic benefit will be at least 1000 people * $1000/yr = $1 million/year. This justifies significant public sector investment in the development of basic multi-lingual computer infrastructure.

To this end the Eritrea Technical Exchange Project of the International Collaborative for Science Education and the Environment (ETEP/ICSEE) has a project to develop and enhance the basic multi-lingual Ge`ez (g'Iz) and Arabic computer facilities and infrastructure in Eritrea. All multi-lingual utilities developed by ETEP will be public property software distributed under the GNU public license.

Technical Issues

There are several technical issues that will be important for establishing an efficient multi-lingual computer communications infrastructure in Eritrea.

The most important technical issue is how to consistently encode or represent Ge`ez and Arabic text strings and formatted documents. Fortunately, this problem has been largely solved already through the international standard-setting process. There is a set of standards commonly referred to as the Unicode standards, or more technically known as ISO/IEC 10646-1. These standards describe with technical specificity how to encode the characters of most of the world's languages, including the Ge`ez syllabary. These standards include "Ethiopic" in Amendment 10, which, even though it is mis-named, includes all letters of the Ge`ez syllabary. Details of the Unicode standards are available at the Unicode Home Page.

But the setting of an international standard for the encoding of Ge`ez is just a first step. Once the encoding standard is set, fonts that comply with the standard need to be designed, and computer software needs to be developed that allows users to create computerized information content that complies with the standard. To date, there is only one Unicode-compliant TrueType Ge`ez font (GF Zemen Unicode), though there are several Unicode-compliant Unix/Linux fonts. And until recently there has not been any Windows keyboard software that complies with the international encoding standards. As a result, there are about 70 mutually incompatible Ge`ez encodings in use in Eritrea and Ethiopia.

Another rather large task in enabling an efficient multi-lingual infrastructure is providing configuration and software modification support so that common applications can use and display standard-compliant Ge`ez documents and data. A lot of work remains to be done in this area with regard to graphic design and database software.

Why Standard Compliance?

The efficiency and productivity of computer communications depend directly on the speed and cost of transferring information from one person or application to another. Currently, in the U.S., most of the time spent getting information goes not to the network transfer but to locating and reading the information. Perhaps it takes 15 seconds to go to a search engine, a minute to find the page in the search engine results, and another minute to read the information. If documents are not prepared in a consistent encoding and format, then another step may be needed to read or use information that is in Ge`ez, because the character encoding has to be converted or translated between formats. That step, even if it takes less than a minute, can increase the time of presenting or retrieving information by up to 30%. In addition, developers and content providers would have to spend extra resources to provide translation and conversion facilities for different types of Ge`ez. Even worse, after standards become dominant, the cumulative archives of non-standard Ge`ez documents will have to be converted to be useful. A reasonable estimate of the costs of conversion and conversion support in a computer communications environment without standards is about 10% of computer communications activity.

Current computer communications markets in Eritrea are running at more than $200,000 per year and growing at about 100% per year (that is, doubling annually). The computer services sector is probably 5 to 10 times this amount. This means that the cost of non-compliance with standards can be tens to hundreds of thousands of dollars per year in the near future. Mostly this cost is reflected in the lost opportunities of people using English letters and text when they would be much happier and more effective using Ge`ez or Arabic if it were convenient and readily available.

Specifications of Unicode Ge`ez Keyboard Software

In this section we describe the technical specifications of the public property standard-compliant Ge`ez software for Windows. The software is referred to as "the package." The software is scheduled for its version 1.0 release in October 2000. The software was written and developed by Marcus Wright and Will Briggs of the Lynchburg College Computer Science Department.

1. The package runs in Microsoft Windows 95 or later including Windows 98, Windows NT and Windows 2000.

2. The package runs in the background, so that it can intercept keyboard input and convert it to Ge`ez script before it gets to the active program.

3. There should be two modes: Ge`ez and Roman. When in Roman mode, the input is transferred directly to the output, unchanged. When in Ge`ez mode, the package outputs Unicode representations of Ge`ez characters in the UTF-8 encoding scheme, as specified by ISO/IEC 10646-1 Amendment 10 (the characters) and ISO/IEC 10646-1 Amendment 2 (the encoding scheme), respectively. The key mapping for the Ge`ez characters will be modifiable either through a configuration file or through a configuration table in the software source code. There is the possibility of adding a third mode to similarly accommodate UTF-8 encoded Arabic as specified by the ISO/IEC 10646-1 standard.

The ISO/IEC 10646-1 compliant font that will be used for testing of the Ge`ez mode is GF Zemen Unicode, available at:

ftp://ftp.ethiopic.org/pub/fonts/TrueType/gfzemenu.ttf

The character charts for ISO/IEC 10646-1 Amendment 10 can be found at:

http://www.unicode.org/charts/PDF/U1200.pdf

A description of the UTF-8 encoding scheme (ISO/IEC 10646-1 Amendment 2) is available at:

http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335

A technical discussion for character encoding schemes can be found at:

http://www.unicode.org/unicode/reports/tr17/

The character charts for the Arabic presentation forms can be similarly found at:

http://www.unicode.org/charts/PDF/UFB50.pdf
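
As an illustration of what "Unicode in the UTF-8 encoding scheme" means for a Ge`ez character, here is a minimal Python sketch. It is not taken from the package source; the function name utf8_three_byte is hypothetical. It hand-encodes a code point in the three-byte UTF-8 range, which covers both the Ethiopic block starting at U+1200 and the Arabic presentation forms starting at U+FB50, and compares the result with Python's built-in encoder.

```python
def utf8_three_byte(code_point):
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes.

    Hypothetical helper for illustration only; real software would normally
    rely on a library encoder such as str.encode("utf-8").
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point is outside the three-byte UTF-8 range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    return bytes([
        0xE0 | (code_point >> 12),           # leading byte: 1110xxxx
        0x80 | ((code_point >> 6) & 0x3F),   # continuation byte: 10xxxxxx
        0x80 | (code_point & 0x3F),          # continuation byte: 10xxxxxx
    ])


# U+1203 lies in the Ethiopic block charted in U1200.pdf.
encoded = utf8_three_byte(0x1203)
print(encoded.hex())                         # hexadecimal form of the three bytes
print(encoded == "\u1203".encode("utf-8"))   # agrees with the built-in encoder
```

In Roman mode no such conversion is needed, because ASCII characters are encoded in UTF-8 as a single unchanged byte.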

4. A syllable should be made available to the active program as soon as it's ready; it's ready when the next character not in the syllable is entered. For example, in the string "hama," once the "m" is typed, the "ha" is ready to go, as "ham" is not part of any syllable.

To accommodate languages like Arabic, the syllable made available to the program may consist of strings of more than one UTF-8 character. For example, for initial character forms in Arabic one types a space/character combination to release a space/translated_character combination to the program, because the character takes a different glyph depending on whether it is an initial, medial, or final form. Similarly, for final forms a character/space releases a translated_character/space to the active program. The mapping of syllables to UTF-8 characters may be many-to-one, as specified in the configuration file.
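
The buffering rule in item 4 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's actual implementation: the class SyllableBuffer, the helper is_prefix, and the small TABLE are all hypothetical, and the handful of SERA-style entries are assumed values that should be checked against the SERA and Unicode tables referenced in this document.

```python
# Small illustrative subset of a keystroke table (assumed values; the real
# table would come from the package's configuration file).
TABLE = {
    "h": "\u1205", "he": "\u1200", "hu": "\u1201", "hi": "\u1202",
    "ha": "\u1203", "ho": "\u1206",
    "m": "\u121D", "me": "\u1218", "mu": "\u1219", "mi": "\u121A",
    "ma": "\u121B", "mo": "\u121E",
}


def is_prefix(keys):
    """True if some table entry starts with the given keystrokes."""
    return any(entry.startswith(keys) for entry in TABLE)


class SyllableBuffer:
    """Holds keystrokes until the next key can no longer extend the syllable."""

    def __init__(self):
        self.pending = ""

    def feed(self, key):
        """Accept one keystroke; return the text released to the active program."""
        if is_prefix(self.pending + key):
            self.pending += key          # syllable may still grow: release nothing
            return ""
        released = self.flush()          # pending syllable is complete
        if is_prefix(key):
            self.pending = key           # the new key starts the next syllable
            return released
        return released + key            # unmapped key passes through unchanged

    def flush(self):
        """Release whatever is pending, e.g. when the user switches modes."""
        out = TABLE.get(self.pending, self.pending)
        self.pending = ""
        return out


buf = SyllableBuffer()
released = [buf.feed(k) for k in "hama"]
# Nothing is released for "h" or "a"; typing "m" releases the syllable for "ha".
print(released, buf.flush())
```

With the string "hama", the third keystroke is the first one that releases output, matching the example in item 4. A final flush is needed because the last syllable has no following keystroke to trigger its release.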

5. Source code is part of the package, and it will be released using the GNU licensing agreement. Part of this source code is a configuration file for the coding standard used by Ge`ezFree (which is Unicode with UTF-8 encoding scheme and SERA transliteration standard); changing this configuration file would allow the use of other fonts, or even other languages that have similar needs for conversion (like Arabic).

In addition to the character table provided at unicode.org, an additional character table is provided at:

http://enh.ethiopiaonline.net/info/Fidel.ixbm.html

The initial transliteration standard will be the "System for Ethiopic Representation in ASCII" or (SERA) as specified at:

http://www.abyssiniacybergateway.net/fidel/sera-faq_0.html

or as specified by the corresponding unicode values at:

http://www.punchdown.org/rvb/email/sera.html
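
To make the idea of a modifiable mapping concrete, here is a hedged Python sketch of loading such a table. The file format shown (one entry per line: the keystrokes, then one or more U+XXXX code points, with lines starting with # treated as comments) is purely an assumption for illustration; the source does not describe the package's actual configuration format. The function name parse_mapping, the SAMPLE text, and the "hA" alias are likewise hypothetical.

```python
def parse_mapping(text):
    """Parse 'KEYS U+XXXX [U+XXXX ...]' lines into a keystroke-to-string table.

    Assumed format for illustration only. Keystroke sequences containing
    spaces are not supported by this simple whitespace-splitting parser.
    """
    table = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        keys, *code_points = line.split()
        if not code_points:
            raise ValueError(f"line {lineno}: expected at least one code point")
        chars = []
        for cp in code_points:
            if not cp.upper().startswith("U+"):
                raise ValueError(f"line {lineno}: bad code point {cp!r}")
            chars.append(chr(int(cp[2:], 16)))
        table[keys] = "".join(chars)      # an entry may expand to several characters
    return table


SAMPLE = """\
# keystrokes  code point(s)
ha   U+1203
hA   U+1203
ma   U+121B
"""

table = parse_mapping(SAMPLE)
# Two different keystroke sequences may map to the same character (many-to-one).
print(table["ha"] == table["hA"])
```

Keeping the table in data rather than in program logic is what would let the same package serve a different transliteration convention, font encoding, or language by swapping the file.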

6. For testing: The package will be tested on Microsoft Windows 95, 98, and 2000, with MS Office 97 and 2000. It will also be checked with various network and Internet programs such as Netscape Communicator, Internet Explorer, MS Frontpage and MS Outlook. The tests will ensure that the program does not crash and that its output looks like correct Ge`ez text. The Eritrea Technical Exchange will take responsibility for final testing and quality assurance for the software's performance and its ability to produce valid, legible Ge`ez text.

Other Utilities and Links

In addition to the multi-lingual utilities recently developed by ETEP/ICSEE, there are also several resources for standard-compliant Ge`ez developed by the LibETH project and the 'AbyssiniaCyberGateway' sites. Also of general interest are sites on Unicode standards, font software, and other language encodings and transliteration methods. These include:



last updated December 2000 by