[MUD-Dev] Language Independence Article.

Tue Nov 30 23:02:02 CET 1999

On November 3, 1999, Ben Greear wrote:
>Just to toot my own horn a bit.  I wrote an article for imaginary.com on
>my implementation of Language Independence in ScryMUD.  The techniques
>and code examples should pertain to just about anyone though.
>
>You may find it here:
>http://imaginaryrealities.imaginary.com/volume2/issue10/independance.html
>
>I am interested in comments any of you might have...

1)

Running at least the title and URL through a spell-checker
would produce more professional-looking results. :)

2)

A corollary of "Anything worth doing is worth doing well."
is of course "Anything not worth doing well is not worth
doing at all."

I decided not to spend coding time and effort on this for Muq
basically for the reasons you touch on: Even if one makes a lookup
table of all user-visible strings in the server, and correctly handles
such issues as some languages needing 16-bit encoding (or UTF-8 or
such) and some languages reading left-to-right and others
right-to-left (which has implications for correctly embedding strings
within each other), there is still the intractable issue of a
continually changing db maintained by many amateur hands, most of them
likely monolingual Americans, given the realities of today's Internet.

I was further discouraged when the few non-native Anglophones I asked
about the subject mostly seemed quite unexcited about it: "Anyone
using the Internet is going to have learned English," seemed to be the
feeling.  (I'd expected something more along the lines of "Finally --
About time you arrogant monolingual yanks got with the program!!")

A basic rule of problem-solving is:

    Solve the hardest subproblem first.

(Reasons include the fact that you have the most design
freedom early on, and you want all the design freedom
you can get for the hardest subproblem.)

So if one is going to tackle this problem at all, I think
taking on the db issue comes first from a design point
of view.  Any viable solution to the multilingual db
problem is likely to take the easier server problem in
stride.

In particular, I think it can be taken as a given that the "many
amateur hands" mentioned above are -not- going to be setting up and
maintaining big tables of enum-declared strings.

So any viable solution is going to have to extract strings from
the db automatically -- which is not all that difficult -- and
then apply translations on the fly with basically zero direct
reliable support from the "many amateur hands".

Which suggests to me that -if- one is going to make a serious
swing at multilingual support, the hack is going to have to
look pretty much like:

* DB scanner of some sort extracts ASCII strings.  This could
  be a periodic batch-mode process or a purely opportunistic
  on-the-fly thing.

* A db of all such known strings is automatically maintained,
  updated as new ones are noticed or old ones recycled.

* Hashtables are kept mapping strings from the canonical db
  language (likely English in practice, but maybe not if
  hosted in France or China or such) to other supported languages.

* Presumably (but not necessarily) bilingual volunteer(s) maintain
  each such hashtable, automatically notified when new unknown
  strings show up.  To reduce the load on them, likely there should
  be a hack so a string has to stay stably in the db for 2-5 days
  before they are asked to translate, to avoid inundating them
  during debugging.

* System output uses the above hashtables to look up each output
  string encountered.
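
To make the steps above concrete, here is a minimal sketch in Python.
Everything named in it (StringCatalog, STABLE_SECONDS, the method names,
the "fr" language code) is made up for illustration and is not taken from
ScryMUD or Muq.  It assumes purely on-the-fly extraction: a string is
recorded the first time it is sent as output, and is only offered to a
translator once it has stayed in the catalog for the waiting period.

```python
import time

# Assumed waiting period; the post suggests somewhere between 2 and 5 days.
STABLE_SECONDS = 3 * 24 * 3600


class StringCatalog:
    """Tracks every string seen on output plus per-language translations."""

    def __init__(self, clock=time.time):
        self.clock = clock          # injectable so the delay can be tested
        self.first_seen = {}        # canonical string -> time first noticed
        self.translations = {}      # language code -> {canonical: translated}

    def note(self, text):
        """Record a string the first time it is noticed."""
        self.first_seen.setdefault(text, self.clock())

    def forget(self, text):
        """Drop a string that is no longer present in the db."""
        self.first_seen.pop(text, None)

    def add_translation(self, lang, text, translated):
        """Store a volunteer's translation of one canonical string."""
        self.translations.setdefault(lang, {})[text] = translated

    def pending(self, lang):
        """Untranslated strings that have been stable long enough to send out."""
        now = self.clock()
        known = self.translations.get(lang, {})
        return sorted(
            s for s, seen in self.first_seen.items()
            if s not in known and now - seen >= STABLE_SECONDS
        )

    def render(self, lang, text):
        """Look up an output string, falling back to the canonical text."""
        self.note(text)
        return self.translations.get(lang, {}).get(text, text)
```

A caller would route every outgoing string through render(), and a
periodic job would mail the result of pending() to whoever maintains each
language's table.  Python's dict stands in for the hashtable here, so no
hashcode is ever assigned by hand.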

This approach basically replaces the hand-assigned enum of
your scheme with an automatically assigned hashcode, and
your array lookup with a hashtable lookup:  I think the resulting
savings in handwork would be a major win in practice, and the
extra compute time and space an entirely trivial cost.

If you opt for purely on-the-fly extraction of strings in
this approach, you'll capture server-generated strings along
with in-db ones without any extra effort whatever.  Otherwise,
you'll likely have to add some sort of simple scan of the
server proper to that of the db:  'strings myserver' might
suffice, or you might write a moderately elaborate source
code parser.  A null macro wrapped around each string
needing translation would be a minimal load on server
hackers and make automatic extraction of them all from
the source code a breeze.
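
As a rough illustration of the null-macro idea, the sketch below scans
C-like server source for string literals wrapped in a marker macro.  The
macro name TR, the sample source, and the function name are all
hypothetical.  The scan is a plain regular expression, not a real parser:
it does not skip comments, does not join adjacent string literals, and
leaves backslash escapes undecoded.

```python
import re

# Hypothetical marker.  In the server source it would be a do-nothing macro,
# for example:  #define TR(s) (s)
MARKED = re.compile(r'\bTR\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_marked_strings(source):
    """Return the distinct literals wrapped in TR(...), in first-seen order."""
    seen = {}
    for match in MARKED.finditer(source):
        # group(1) is the literal's body with escapes left exactly as written
        seen.setdefault(match.group(1), None)
    return list(seen)


# Made-up fragment of server code used only to exercise the scanner.
SAMPLE = r'''
#define TR(s) (s)
void look(void) {
    send(TR("You see nothing special."));
    send(TR("It is \"dark\" here."));
    log_debug("not for players");
    send(TR("You see nothing special."));
}
'''

if __name__ == "__main__":
    for text in extract_marked_strings(SAMPLE):
        print(text)
```

Unmarked strings such as the debug message are left alone, which is the
main advantage over a blind 'strings myserver' pass: the translators only
ever see text that players can actually encounter.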

Anyhow -- I'm delighted to see someone looking at this!  I think
one of the great things about the Internet in general and mu*
in particular is the potential (largely unrealized, alas) for
cross-cultural communication.  Bravo to anyone promoting this!

 Cynbe

_______________________________________________
MUD-Dev maillist  -  MUD-Dev at kanga.nu
http://www.kanga.nu/lists/listinfo/mud-dev