Chinese Input Methods for non-Chinese people
Mathieu Bridon
- https://mathieu.daitauha.fr
Introduction
This document aims to explain to non-Chinese readers how Chinese input methods work.
It can serve as a simple reference for developers of operating systems outside of China who want to enable Chinese users to input their own language.
It should be considered a work in progress and might be incomplete or contain mistakes. Feel free to contact me if you want to add or fix something.
I started it because I wanted to help GNOME developers get a better understanding of the needs of the Hong Kong community while IBus was being integrated into GNOME 3.6.
Unfortunately, there are too few FOSS developers in China, and it is hard for non-Chinese developers to understand how to properly implement Chinese input methods without some minimal background knowledge. Hopefully this document will help fill this gap. And who knows, it might even help get more Chinese FOSS developers. :)
Written Chinese languages
There are mainly two written Chinese languages: Simplified Chinese and Traditional Chinese.
Mainland China writes in Simplified Chinese, while Hong Kong, Macau and Taiwan write in Traditional Chinese.
Types of Chinese input methods
There are two big classes of Chinese input methods. Each is detailed in one of the next two sections.
IM based on the sounds of words
A user of these types the romanization of the Chinese character, i.e. how the sound of that character could be spelled in the Latin alphabet.
For example, 我 (the pronoun "I" or "me") is pronounced something like "wo". So a Pinyin user will type those two letters, "w" then "o", and one of the suggestions will be 我.
Examples of these include Pinyin and Bopomofo.
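As a rough illustration of the lookup described above, here is a minimal Python sketch. The table, its contents beyond 我, the order of the candidates and the function name are all assumptions made for this example; a real Pinyin engine uses a much larger dictionary and ranks its suggestions.

```python
# Hypothetical miniature dictionary: romanization -> candidate characters.
# Only the "wo" -> 我 pair comes from the text; the rest is illustrative,
# and the ordering of candidates is an assumption of this sketch.
PINYIN_SAMPLE = {
    "wo": ["我", "握", "窝"],
    "ni": ["你", "泥", "拟"],
}


def candidates(typed):
    """Return the characters whose romanization matches what was typed."""
    return PINYIN_SAMPLE.get(typed.lower(), [])


print(candidates("wo"))  # the suggestions include 我
```

The user then picks the character they meant among the suggestions, since several characters usually share the same romanization.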
IM based on the strokes necessary to write a word
These are based on the strokes necessary to write a character.
For example, with a pen and paper, to write 三 (the number 3), one needs to:
- first write the first stroke, a horizontal line at the top,
- then add the second one, another horizontal line below it,
- and finish with the last stroke, a third horizontal line at the bottom.
This is much like writing a "p": one would start by "drawing" the vertical bar, then add the round part.
In a stroke-based input method, each type of stroke (vertical, horizontal, curved, ...) is associated with a letter of the Latin alphabet on the keyboard.
One then has to type, in the right order, the series of letters corresponding to the series of strokes necessary to write the full character.
In the above (trivial) example of the number 3, the horizontal stroke corresponds to the "m" key in the Cangjie (version 3) input method. So to input the number 3, the user would have to press the "m" key three times.
Cangjie, Quick and hand-writing (either with pen and paper or with a touch screen device) are all examples of stroke-based input methods.
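To make the idea of typing a series of keys more concrete, here is a small Python sketch of a key buffer, using the 三 example above. The table is a tiny sample: "mmm" for 三 comes from the text, while the entries for 一 and 二 and the class name are additions made for this illustration. It is not how any particular implementation works.

```python
# Tiny sample table: key sequence -> character (Cangjie version 3).
# "mmm" for 三 comes from the text; "m" and "mm" are added for illustration.
CANGJIE3_SAMPLE = {
    "m": "一",
    "mm": "二",
    "mmm": "三",
}


class KeyBuffer:
    """Accumulates key presses and looks up the sequence typed so far."""

    def __init__(self, table):
        self.table = table
        self.keys = ""

    def press(self, key):
        self.keys += key
        # None means that no character matches the current sequence.
        return self.table.get(self.keys)


typed = KeyBuffer(CANGJIE3_SAMPLE)
for key in "mmm":
    match = typed.press(key)
    print(typed.keys, "->", match)
```

Each key press extends the sequence, so the same key pressed once, twice or three times can lead to three different characters.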
Most used Chinese input methods
Note
This document was written and reviewed primarily by people in Hong Kong.
If we made any mistakes about the other regions using Chinese input methods, please let us know.
The most used Chinese input methods are the following:
- Pinyin is a sound-based input method. It is used mostly in Mainland China, to input Simplified Chinese.
- Bopomofo is a sound-based input method. It is used mostly in Taiwan, to input Traditional Chinese.
- Cangjie is a stroke-based input method. It is used mostly in Hong Kong, to input Traditional Chinese.
- Quick is a stroke-based input method. It is used mostly in Hong Kong, to input Traditional Chinese. Note that Quick is based on Cangjie.
- Hand-writing is effectively a stroke-based input method. It is used everywhere people write on a piece of paper, or on a touch screen, to input any Chinese language.
The situation in Hong Kong
Cangjie and Quick
Cangjie is a classic stroke-based input method, as explained above. Every character is represented by a combination of up to 5 keys.
Quick is based on Cangjie, with one simple change that makes it easier to use: the user only types the first and last keys of the Cangjie combination, corresponding to the first and last strokes. This reduces the number of keys needed before getting suggestions to only 2.
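The relationship between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, and the sample table is tiny ("mmm" for 三 and "mwyl" for 面 are Cangjie 3 codes given elsewhere in this document, while "mm" for 二 is added here).

```python
from collections import defaultdict

# Sample Cangjie 3 codes. 三 and 面 are taken from this document;
# 二 is added for illustration.
CANGJIE3_SAMPLE = {"二": "mm", "三": "mmm", "面": "mwyl"}


def quick_code(cangjie_code):
    """Keep only the first and last key of a Cangjie code."""
    if len(cangjie_code) <= 2:
        return cangjie_code
    return cangjie_code[0] + cangjie_code[-1]


def build_quick_table(cangjie_table):
    """Group characters by their Quick code."""
    quick_table = defaultdict(list)
    for char, code in cangjie_table.items():
        quick_table[quick_code(code)].append(char)
    return dict(quick_table)


QUICK_SAMPLE = build_quick_table(CANGJIE3_SAMPLE)
print(QUICK_SAMPLE["mm"])  # 二 and 三 now share the same two keys
print(QUICK_SAMPLE["ml"])  # 面
```

Dropping the middle keys means more characters share the same combination, which is why a Quick user picks from a list of suggestions much more often than a Cangjie user does.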
Multiple languages
Cangjie (and Quick, as it is based on Cangjie) was designed to input the characters of 3 different languages:
- Traditional Chinese
- Simplified Chinese
- Japanese
Its design is clever enough to keep "collisions" (i.e. a given combination of up to 5 keys returning multiple candidates) to a minimum. When collisions do happen, they are usually limited to rarely used characters or slow-to-type combinations.
As such, most of the time, a Cangjie user will only be presented with candidates in the language they expect based on their input (unless they are not using the version they think they are).
Different versions
The Cangjie input method (not its implementation in a given Operating System) was first published in 1976.
Since then, a few different versions have been published, each slightly incompatible with the others.
For example, the character "面" (face, surface) is input differently in each version:
- "mwyl" in Cangjie 3
- "mwsl" in Cangjie 5
These incompatibilities mean that users will have to spend some time learning a new version, almost as if it were a different input method.
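The effect on a user can be sketched as follows in Python. The two entries come from the 面 example above; the table layout and the function name are assumptions made for this illustration.

```python
# One table per Cangjie version; both entries come from the 面 example.
CANGJIE_TABLES = {
    3: {"mwyl": ["面"]},
    5: {"mwsl": ["面"]},
}


def lookup(code, version):
    """Return the candidates for a key combination in a given version."""
    return CANGJIE_TABLES[version].get(code, [])


print(lookup("mwyl", 3))  # 面 is found
print(lookup("mwyl", 5))  # nothing: the version 3 habit fails in version 5
print(lookup("mwsl", 5))  # 面 is found
```

A user who learnt version 3 and is silently given version 5 will type what they know and simply not find their character.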
Schools and education
Schools teach Cangjie version 3.
This has a lot to do with inertia: schools teach Cangjie 3 because it is the default on Microsoft Windows, which in turn defaults to version 3 because it's what is taught at school.
What people use
After learning at school, most people will move from Cangjie to Quick.
This is because the former has a much steeper learning curve than the latter, which is much easier to use.
However, many people stick to Cangjie because, once they have made the effort to learn it properly, it allows them to type much faster.
In any case, the overwhelming majority uses version 3 of their input method of choice, with the rest mostly using version 5.
Stroke 5 for a11y
Stroke 5 is an input method which was created for the elderly and people with reduced hand mobility.
It is stroke-based, just like Cangjie and Quick.
However, to allow typing with few fingers and with relatively few movements, only 5 keys are used (from a US keyboard layout):
- "n" for the "curved" strokes
- "m" for the "left to right horizontal" strokes
- "," for the "right-top to left-bottom diagonal" strokes
- "." for the "left-top to right-bottom diagonal" strokes (and punctuation marks)
- "/" for the "top to bottom vertical" strokes
So for example, to write the word 中 ("middle"), one must first write the leftmost vertical stroke, then the top horizontal line and the rightmost vertical line as one stroke, then the bottom horizontal stroke, and finally the long middle vertical stroke.
As such, a user of the Stroke 5 input method would input the "/nm/" combination of keys.
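The mapping above is small enough to be written down entirely. Here is a minimal Python sketch of it, applied to the 中 example; the stroke type names and the function name are made up for this illustration, only the keys and the stroke order come from the text.

```python
# Stroke type -> key, as listed above (US keyboard layout).
# The stroke type names are made up for this sketch.
STROKE_KEYS = {
    "curved": "n",
    "horizontal": "m",
    "right_top_to_left_bottom": ",",
    "left_top_to_right_bottom": ".",
    "vertical": "/",
}

# Stroke order of 中 as described above: left vertical, then the top and
# right sides as one curved stroke, bottom horizontal, middle vertical.
ZHONG_STROKES = ["vertical", "curved", "horizontal", "vertical"]


def stroke5_keys(strokes):
    """Translate a series of stroke types into the keys to press."""
    return "".join(STROKE_KEYS[stroke] for stroke in strokes)


print(stroke5_keys(ZHONG_STROKES))  # /nm/
```

All five keys sit next to each other on the bottom row of the keyboard, which is what makes typing with a single hand, or even a single finger, practical.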
In Hong Kong, some groups are showing tremendous results with Stroke 5, giving access to electronic devices and the Internet to people who previously couldn't input their own language on a keyboard.
Implementations on some popular OSes
Windows
Microsoft Windows provides both Cangjie and Quick, both in version 3.
Note
Microsoft Windows is used by virtually everybody in Hong Kong, at home, at school and at work.
Since Windows 7, it can optionally show the results of version 5 as well, but only in addition to the results of version 3.
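The behaviour described above could look something like the following Python sketch. This is not how Windows is implemented: the option name, the order of the results and the removal of duplicates are all assumptions, and the sample entries are the Cangjie 3 and 5 codes for 面.

```python
# Sample tables with the Cangjie 3 and 5 codes for 面.
CANGJIE_TABLES = {
    3: {"mwyl": ["面"]},
    5: {"mwsl": ["面"]},
}


def candidates(code, include_v5=False):
    """Version 3 results, optionally followed by version 5 results.

    Putting version 3 first and dropping duplicates are assumptions
    of this sketch.
    """
    results = list(CANGJIE_TABLES[3].get(code, []))
    if include_v5:
        for char in CANGJIE_TABLES[5].get(code, []):
            if char not in results:
                results.append(char)
    return results


print(candidates("mwsl"))                   # no result with version 3 only
print(candidates("mwsl", include_v5=True))  # 面, thanks to version 5
print(candidates("mwyl", include_v5=True))  # 面, version 3 still works
```

With such an option, a version 5 user gets their characters, but mixed with version 3 results rather than instead of them.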
Mac OS X
Mac OS X provides Cangjie and Quick, in a version that is « somewhat like Version 3 and somewhat like Version 5. » [Wikipedia]
Many Mac users of Cangjie in Hong Kong will install the Yahoo input method framework instead of using the default system one, as it allows them to use Cangjie 3 as they are used to.
Quick users tend to not bother. This is because, given the design of Quick, very few things changed between versions 3 and 5.
GNOME
Note
This is pretty much a work in progress at the moment; hopefully things will land in GNOME 3.8.
GNOME uses IBus as its Input Method Framework.
IBus provides implementations of Cangjie and Quick through IBus Cangjie, and of Stroke 5 through IBus Table.
For both Cangjie and Quick, versions 3 and 5 are available, and version 3 is the default.
Authors
This document was written by Mathieu Bridon (bochecha). You can contact me with any remarks, or to suggest corrections.
I have to thank Wan Leung Wong for his patience and the time he took to explain all these things to me. This document wouldn't exist without him.
This document is distributed under the Creative Commons Attribution Share-Alike 3.0 Unported license (CC-By-SA).