[MUD-Dev] Phonetic and Ideographic languages (was Hangul)
Nathan F. Yospe
yospe at kanga.nu
Wed Jul 31 20:08:34 CEST 2002
Travis Casey <efindel at earthlink.net> said:
> On Thursday 25 July 2002 3:01, Damion Schubert wrote:
>> I don't know exactly what the language font for Korean is, but if
>> you go into any Korean Lineage server, you'll see a lot of
>> symbols above the heads of characters which aren't latin
>> Characters by any stretch of the imagination.
> The major writing system in Korea is called Hangul. It is a
> syllabary -- each symbol represents a particular syllable. This
> means it has a lot more characters than English (which builds
> syllables from multiple characters), but nowhere near as many as
> traditional Chinese writing, in which a symbol corresponds to a
> particular word.
> Hangul has exactly 140 characters. Each of these is a consonant +
> vowel combination. There are also separate symbols for the
> consonants (of which there are 14) and the vowels (of which there
> are 10).
The Unicode standard provides, for Hangul Jamo (the consonants
and vowels combined to make the 10K plus Hangul), 96 noncombinable
letters in a compatibility section, of which only 51 are in use
as modern letters, and what's called the Hangul Jamo combining
alphabet, with 90 (0x1100 - 0x1159) initial consonants, 66
(0x1161 - 0x11A2) vowels, and 82 (0x11A8 - 0x11F9) final
consonants...
The standard states that the compatibility section is the portion
of Hangul Jamo that cannot be algorithmically decomposed to
normal, standardized Jamo.
Of the Hangul Jamo letters, 19 leading, 27 trailing, and 21 vowel
characters are listed as being in common modern use. That's much
more than 14 consonants and 10 vowels, and results in a total set
of 11,172 complete Hangul in modern use (19 x 21 x 28, the 28
being the 27 trailing consonants plus "no trailing consonant").
That's not counting the Hangul produced by the less standard 51
Jamo or the degenerate or unused products involving one of the
remaining other 170 odd Jamo in the normal block. The Johab set
(the 11K formed syllables) is unquestionably going to be typed
using combined keystrokes. With a shift or control key, I'm sure
all modern Jamo could easily be typed with one stroke. It may
even be possible to automate parts of each Hangul in sequence -
initial jamo, vowel jamo, and final, if present, spacebar (maybe)
to move on if not. I don't know how it's actually done, but that
would seem a sensible approach...
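The 11,172 figure and the initial/vowel/final sequence can be
sketched in a few lines of Python. This is only an illustration,
not a description of how any Korean input method actually works.
It assumes the precomposed syllables start at 0xAC00 and are laid
out in leading/vowel/trailing order (that base is not mentioned
above), and the function names are made up for the example.

```python
# Sketch: arithmetic composition of modern Hangul syllables from jamo.
S_BASE = 0xAC00   # assumed start of the precomposed syllable block
L_BASE = 0x1100   # first leading (initial) consonant
V_BASE = 0x1161   # first vowel
T_BASE = 0x11A7   # one code point before the first trailing consonant
L_COUNT, V_COUNT, T_COUNT = 19, 21, 27   # modern jamo counts from the post


def syllable_count():
    """Leading x vowel x (trailing consonants + 'no trailing consonant')."""
    return L_COUNT * V_COUNT * (T_COUNT + 1)


def compose(lead, vowel, trail=None):
    """Combine conjoining jamo into one precomposed syllable character."""
    l = ord(lead) - L_BASE
    v = ord(vowel) - V_BASE
    t = 0 if trail is None else ord(trail) - T_BASE
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT):
        raise ValueError("not a modern leading consonant or vowel")
    if trail is not None and not (1 <= t <= T_COUNT):
        raise ValueError("not a modern trailing consonant")
    return chr(S_BASE + (l * V_COUNT + v) * (T_COUNT + 1) + t)


if __name__ == "__main__":
    print(syllable_count())                        # 11172
    print(compose("\u1112", "\u1161", "\u11AB"))   # U+D55C
    print(compose("\u1100", "\u1173", "\u11AF"))   # U+AE00
```

An input method along the lines guessed at above would collect
an initial, a vowel and an optional final, then call something
like compose() once the next initial (or a space) arrives.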
What you describe sounds closer to Hiragana, which has (really) a
total of fourteen consonants (five accessed only by accents added
to similar sounding consonants, eg ka -> ga, ha -> ba -> pa), but
only five vowels, with a few others formed by combining vowels to
get long forms ("yaw" would be ia -> "eee" + "ah") and a few sets
of combining sounds are neglected (y is only "yah", "yoo", "yoh";
w is only "wah", "woh"; tsu replaces "too", chi replaces "tee", ji
replaces "zee", and there is a vowelless "n" that doesn't start a
new word, but can end one), and the vowels exist as unadorned but
separate characters. There's also a rare case of a plain u "ooh"
getting accented to become a vu "vooh", but I've only seen it one
or two times, ever.
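The "accents added to similar sounding consonants" can be shown
with Unicode's combining voiced and semi-voiced sound marks. A
small sketch, assuming Python's standard unicodedata module and
NFC normalization to fold a base kana plus a mark into the single
precomposed character; the helper name accent() is invented for
the example.

```python
import unicodedata

DAKUTEN = "\u3099"      # combining voiced sound mark (ka -> ga, ha -> ba)
HANDAKUTEN = "\u309A"   # combining semi-voiced sound mark (ha -> pa)


def accent(kana, mark):
    """Attach a sound mark to a kana and fold the pair into one character."""
    return unicodedata.normalize("NFC", kana + mark)


if __name__ == "__main__":
    for base, mark in [("\u304B", DAKUTEN),      # ka + voiced mark
                       ("\u306F", DAKUTEN),      # ha + voiced mark
                       ("\u306F", HANDAKUTEN),   # ha + semi-voiced mark
                       ("\u3046", DAKUTEN)]:     # u  + voiced mark (the rare vu)
        print(base, "->", accent(base, mark))
```

The last pair is the rare accented u mentioned above.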
Mind, there's also a complete parallel alphabet, Katakana, with a
slightly different, less fluid shape, used for foreign words, and
a majority of Nihongo is written in Kanji, which is analogous to,
and ultimately derived from, Han ideographs, the Chinese writing.
Classical writing is in Kanzi, which retains archaic flourishes, a
bit like Gothic scripts in English.
Korean was, until sometime in the last century, also written with
a Han-derived ideographic form. Several of the nonstandard Jamo,
ultimately, reflect that root. Purely Han-derived characters are
called Hanja and look just like Han.
Again, I don't know how that would be handled in Korean typing.
What gets particularly complex is the integration of dictation in
a combined phonetic/ideographic typed environ. Worse still, when
dealing with Han (pretty much the last pure ideographic language)
there are actually multiple spoken languages mapped to one print,
and the only phonetic standard format is something referred to as
BoPoMoFo. It isn't used in writing, but does provide for a basic
standard mapping from phonetic (keep in mind that this includes a
set of tonal elements used for differentiation of meaning) to Han
for a given language.
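As a rough picture of what such a phonetic-to-Han mapping looks
like for one language (Mandarin here), consider the sketch below.
The table is a tiny hand-picked sample, nothing like a real
dictionary, and the names split_tone() and candidates() are
hypothetical. It assumes the usual convention that the first tone
is unmarked and the second, third and fourth tones are written as
a trailing mark.

```python
# Trailing tone marks; a syllable with no mark is treated as first tone.
TONE_MARKS = {"\u02CA": 2, "\u02C7": 3, "\u02CB": 4}

MA = "\u3107\u311A"   # the Bopomofo letters for the syllable "ma"

# (toneless syllable, tone) -> candidate Han characters. Sample data only.
CANDIDATES = {
    (MA, 1): ["\u5ABD"],   # mother
    (MA, 2): ["\u9EBB"],   # hemp
    (MA, 3): ["\u99AC"],   # horse
    (MA, 4): ["\u7F75"],   # scold
}


def split_tone(syllable):
    """Separate a trailing tone mark, if any, from the phonetic letters."""
    if syllable and syllable[-1] in TONE_MARKS:
        return syllable[:-1], TONE_MARKS[syllable[-1]]
    return syllable, 1


def candidates(syllable):
    """Han characters for a toned syllable; empty list if it is unknown."""
    return CANDIDATES.get(split_tone(syllable), [])


if __name__ == "__main__":
    for tone_mark in ["", "\u02CA", "\u02C7", "\u02CB"]:
        print(MA + tone_mark, candidates(MA + tone_mark))
```

A real table has many candidates per toned syllable, which is
where any context-aware resolution would have to start.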
If a MMOG were to attempt to support, on the same server, a large
set of languages, written and spoken, it would have to overcome a
great deal of incompatibility between languages. Were the goal a
more modest portability - single release of software, but servers
for each language supported - it would simply be a case of having
operating system support tie-ins and (probably) internal unicode,
or at the very least multilingual encoding, support.
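For the modest case, "internal unicode" mostly means converting
at the edges. A minimal sketch, assuming each per-language server
talks to clients in one legacy encoding and keeps everything as
Unicode strings internally. The language tags, the choice of
encoding per language and the function names are assumptions for
illustration; the codec names are ones Python ships with.

```python
# Hypothetical per-server configuration: language tag -> client encoding.
LEGACY_CODECS = {
    "ko": "euc_kr",
    "ja": "shift_jis",
    "zh-tw": "big5",
}


def to_internal(raw, lang):
    """Decode bytes from a client into the server's internal Unicode text."""
    return raw.decode(LEGACY_CODECS.get(lang, "utf-8"))


def to_wire(text, lang):
    """Encode internal text for a client; unrepresentable characters
    are replaced rather than raising an error."""
    return text.encode(LEGACY_CODECS.get(lang, "utf-8"), errors="replace")


if __name__ == "__main__":
    hangul = "\uD55C\uAE00"
    # Round trip on a Korean server is lossless.
    print(to_internal(to_wire(hangul, "ko"), "ko") == hangul)
    # The same text sent to a Shift JIS client does not survive.
    print(to_internal(to_wire(hangul, "ja"), "ja") == hangul)
```

The second case is the single-server problem in miniature: once
players with different encodings share a server, something has to
decide what each of them sees for text their client cannot show.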
If it also had voice input support, or even worse, mixed mode and
unified presentation (recipient chooses to see text or hear voice
independent of what originator chose to use), something better in
the way of phoneme encoding would have to be developed, and there
would have to be a pretty advanced context-aware mapping resolver
figuring out what each phoneme-set was *supposed* to mean... just
as much, figuring out which _word_ a given ideographic symbol was
meant to be, not to mention unmangling typos and horrific typing,
u know, 2 b kewl?
I'd like to propose now, should anyone ever do this, that l33t be
translated to a spoken form that resembles an especially retarded
hillbilly just after a run-in with a large mallet. Or Elmer Fudd
just after getting shot with his own rifle.
--
Nathan F. Yospe - Programmer, Scientist, Artist, JOAT with a SAK
yospe#kanga.nu Home: nathanfyospe#mac.com Work: nyospe#a2i.com