SfD: Clientside Caching

Nathan F Yospe yospe at hawaii.edu
Mon Mar 16 15:01:15 CET 1998


SfD: Subject for Discussion. I decided to stir things up. It has been slow.
I'm going to start posting these every few weeks, or as the impulse strikes
me... just to get the juices pumping. Feel free to tear it to pieces.

Today's topic: Storage of data for a mud on a permanent or temporary basis.
For now, I will assume a system that only works with a custom client, where
'client' means either a remote pure client, or a 'smart' client, like those
employed (presumably) by UOL. (Is M59 a pure client? I'm pretty sure Pueblo
is.)

A little background: 

   Some months ago, I discovered that my prototype concept mud was far more
than my hardware could handle. It was built around three distinct processes
that I could easily separate: a detailed physical simulation, a parser for
input commands, and a dynamic text generator. Of the three, the first would
have to be handled by the server (or servers, as I have been designing with
distributed processing in mind), but the two user interface concerns were a
good candidate for a client. After a bit of consideration, I began thinking
out the client's design. At this point in time, I have completed a skeletal
server, and the most rudimentary server interface of a client. I won't have
much opportunity to complete the client in the near future, but I do intend
to finish it all eventually. I currently work on it for three hours a week.

   There were several issues that I considered when deciding to create this
client. The first was portability. In the end, I decided to make the client
in two forms; the first a Java GUI application with optional native methods
for select platforms, the second a Java text application with output to the
main terminal, again with optional native methods, to be hosted by the user
on some unix account, here effectively giving them a way to play the mud as
if it were a regular telnet server, with a telnet client. It may be a hack,
but it has a few plusses. Much of the work is done by the player's machine,
be it their home PC or the machine they have an account on. The player also
supplies the memory space for the larger portion of their character and ego
preferences and memories. The second major issue was capability; I couldn't
do what I wanted to do without a client on current resources, and I wanted,
as a bonus, to be able to provide a GUI and background pictures for clients
not running as telnet. I did have one problem here, in running the same set
of preferences on multiple machines, for someone who uses several computers
each day. I will get into the possible solutions to that when I finish this
introduction. The third concern was flexibility, and I admit to having much
more difficulty on this issue than the others. This leads to another reason
to cache: downloadable updates to the client. The alternative was downloads
of the entire client every time I changed something. The fourth issue is an
enormous one, and I may have flubbed it: possibility. I may have gotten too
wide eyed on this one, but I'm going for it. Readers of rec.games.mud.admin
may recognize these criteria... I posted an extensive discussion on it, but
there were no replies. *sigh* I'll repost it here, maybe. We need matter to
get things going again. (If anyone else wants to start posting SfDs, please
do. I don't have any kind of trademark on the name, and random thoughts are
good fodder for productive discussion, especially with the brain-pool here.)

   Essentially, the question of client caching comes down to 1) what are we
going to cache, 2) how long are we going to cache it for, 3) where and how,
and 4) why do we need to cache it? This brings us to the technical portions
of this discourse, and I will now try to break my inane habit of right hand
justification. It's become an addiction, you see.

1) What are we going to cache?

   There are several potential candidates for caching. Of course, there are
the obvious ones; graphics ought to be cached in memory, then on disk if 
a disk is available (this is not possible with applets); sounds ought to at
the very least be cached on disk; anything else of similar size should, at
all costs, be kept as far from repeated downloading as possible. Text,
while less bandwidth intensive, might also be a candidate for caching. On
the other hand, anything cached is no longer subject to modification by the
server unless update flags exist. A client could potentially cache even the
behavior of objects, if said behavior were controlled by some non-binary
algorithms; but in that case, there would have to be some sort of regular
conformance checking with the server. A good basis for the answer to the
above question, then, is, 'how much can we safely offload onto the client?'
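   To make that conformance check concrete: the client can keep a checksum
beside each cached behavior and compare it against one the server sends
down now and then. A rough sketch in Java follows; the class and method
names are mine, invented purely for illustration, not anything lifted from
my actual client.

import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cache object behaviors by id, and verify the
// cached copy against a server-supplied checksum before trusting it.
class BehaviorCache {
    private final Map<String, byte[]> behaviors = new HashMap<>();
    private final Map<String, String> checksums = new HashMap<>();

    void store(String objectId, byte[] behaviorScript) throws Exception {
        behaviors.put(objectId, behaviorScript);
        checksums.put(objectId, digest(behaviorScript));
    }

    // The server periodically sends its current checksum for an object;
    // if ours no longer matches, the cached behavior is stale.
    boolean conforms(String objectId, String serverChecksum) {
        return serverChecksum.equals(checksums.get(objectId));
    }

    byte[] get(String objectId) {
        return behaviors.get(objectId);
    }

    private static String digest(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}

A mismatch simply means the cached behavior is stale: throw it away and
ask the server for a fresh copy.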
   The client can obviously be trusted to store anything that would be sent
once, as there are no longer any security concerns with that information. If
there is a concern about the player hacking the client to provide more than
the information a player ought to have, the amount of data you can safely
store drops dramatically.
However, some discretion in how the data is stored can help: non-sequential
storage relative to physical location; storing text in a database with
tokenized references to word sequences, perhaps even a dictionary/thesaurus
approach with some form of abstracted linguistic association (guessed what
I've been up to yet?)... or, in the case of graphics, storage in a
palette-based table, which saves you space and scrambles the data so that
it can't easily be sorted out ahead of the subjects of the graphics
(assuming some form of flat sprite graphics), or a similarly stored polygon
library... The only other option is to be extremely paranoid about what you
allow a client to see. Remember, a hacker can store anything you send; this
is not an issue of the client. If you don't want it stored, don't send it.
Period. If you want it stored, but not readable without great effort, my
advice is to send it in big chunks of database material and extract it by
some method that is integral to the client and well disguised, or,
alternatively, under the direction of the server.
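   To give the text case some shape, the tokenized-reference idea might
look something like the rough sketch below: the client never stores strings
verbatim, only integer tokens into a shared dictionary, so a casual browse
of the cache turns up a scrambled word list rather than readable room
descriptions. (Again, the names are invented for illustration, not taken
from my client.)

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: store text as token sequences into a shared
// dictionary, so the cache holds no readable strings, only word ids.
class TokenizedTextStore {
    private final Map<String, Integer> wordToId = new HashMap<>();
    private final List<String> idToWord = new ArrayList<>();
    private final Map<String, int[]> entries = new HashMap<>();

    void put(String key, String text) {
        String[] words = text.split("\\s+");
        int[] tokens = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            tokens[i] = wordToId.computeIfAbsent(words[i], w -> {
                idToWord.add(w);           // first sighting: grow the dictionary
                return idToWord.size() - 1;
            });
        }
        entries.put(key, tokens);
    }

    String get(String key) {
        int[] tokens = entries.get(key);
        if (tokens == null) return null;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) out.append(' ');
            out.append(idToWord.get(tokens[i]));
        }
        return out.toString();
    }
}

Only the dictionary and the token arrays ever need to hit the disk, which
also buys a little compression on repetitive prose for free.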
   This covers well the storage of things from the server, but there is 
another potential here that I think is far more valuable: storage of those
things inherent to the client's "memory"... be it name recognition, scripts
and preferences, or a complete "personality" to be applied to text parsing
or generation. Having this on the client's end opens up a whole new level
of potential. This is currently embodied by arcade fighting and racing 
games. Have you ever seen one of these games get tougher and "smarter"?
   To recap: We can cache three basic types of data; transient data from
the server, database contents from the server, and 'memory' data from the
player. Which ones we cache, and how much of each, depends on the specific
case in question.
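   In code, that classification might be nothing more than a tag on each
entry; a trivial sketch, with names invented for illustration:

// Hypothetical sketch: tag each cache entry with where it came from,
// since the retention rules differ for the three kinds of data.
enum CacheClass {
    TRANSIENT_SERVER,  // sent on demand; safe to throw away and re-request
    SERVER_DATABASE,   // long-lived bulk data, possibly shipped on a CD
    CLIENT_MEMORY      // the player's own preferences and "memories"
}

class CacheEntry {
    final String key;
    final CacheClass kind;
    final long sizeBytes;
    long lastUsed;  // updated on every reference, for later eviction decisions

    CacheEntry(String key, CacheClass kind, long sizeBytes, long lastUsed) {
        this.key = key;
        this.kind = kind;
        this.sizeBytes = sizeBytes;
        this.lastUsed = lastUsed;
    }
}

The interesting part is the policy hung off that tag, which is what the
next question gets into.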

2) How long are we going to cache it for?

   We've got a chunk of data; we want to be able to reference it without
downloading it again; but we don't want to store it forever - unless we are
doing the database thing, in which case we may go so far as releasing most
of it on a CD and just storing updates (ref. UOL, M59?), or at the very
least, notifying clients that the database may reach a size as large as X,
where X is some number larger than what (you hope) is the maximum size in
the reasonably foreseeable future. (I specify 1, 2, 4, 10, and 20 MB caches
at the moment; each has a slightly different characteristic: text-only mode
may go as large as 15 MB, and the graphical backdrop mode also features a
50 MB cache option in spite of my relatively well-compressed backdrops. It
should handle anything I ever produce, but I haven't figured out how I'm
ever going to make that mondo chunk available for download.)
   So, we've got this data, we need to weed out the junk and keep the good
stuff; otherwise, how are we going to store it all? Well, first of all, can
we compress it at all? And what about duplicate information? Is there any?
OK, we've gotten rid of all the duplication, compressed... we still have
too much cached data. What now? Now we decide what our criteria for
dumping stuff will be. Age? Infrequent use? No recent use? Size? That last
one goes both ways; the stuff that saves you the most space costs the most
to get again. Generally, small, infrequently (or not recently) used stuff
is the best
candidate. Small stuff adds up, and you have more time to get it back, if
you download as needed. Or rather, you require less time at the point where
the need occurs. On the other hand, if you can download stuff in the
background, the big stuff comes down faster, megabyte for megabyte. This is
generally associated with large scale storage models, however.
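   As a concrete, if simplistic, sketch of that kind of policy: a cache
with a byte budget that evicts the least recently used entries once the
budget is blown. The names are invented for illustration; a real client
would fold age, frequency, and size into the same decision.

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a byte-budgeted cache that drops the least
// recently used entries once the budget is exceeded.
class BudgetedCache {
    private final long budgetBytes;
    private long usedBytes = 0;
    // accessOrder = true keeps the map ordered oldest-access-first.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    BudgetedCache(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    void put(String key, byte[] data) {
        byte[] old = entries.put(key, data);
        if (old != null) usedBytes -= old.length;
        usedBytes += data.length;
        evictIfNeeded();
    }

    byte[] get(String key) {
        return entries.get(key);  // touching an entry marks it recently used
    }

    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > budgetBytes && it.hasNext()) {
            Map.Entry<String, byte[]> oldest = it.next();
            usedBytes -= oldest.getValue().length;
            it.remove();  // it can always be downloaded again later
        }
    }
}

The budget figure is whatever cache size the installation was configured
for; the age and frequency criteria would slot in at the same eviction
point.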
   What if we've got stuff in memory cache, and want to clear that out? Is
it better to save it to disk than to redownload it? Obviously, over a modem
it would be, but sometimes it is faster on the client side to get it over a
T1 connection than to load it from disk. This is one of the points where I
tend to say, "Server first!". Even if a specific case happens to have the
bandwidth to grab stuff fast, and slow disk access, a few dozen of these
will kill the server. The server should always be the primary concern; the
user should be asked to bear as much of the burden as possible.
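   That "spare the server" ordering might be expressed roughly like this:
check memory, then disk, and only then go back over the wire. It is a
sketch with invented names; the fetchFromServer call stands in for whatever
the real protocol does.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a two-level lookup that spills evicted entries
// to disk and only falls back to the server when the disk misses too.
class TieredCache {
    private final Map<String, byte[]> memory = new HashMap<>();
    private final Path diskDir;

    TieredCache(Path diskDir) {
        this.diskDir = diskDir;
    }

    byte[] lookup(String key) throws IOException {
        byte[] data = memory.get(key);
        if (data != null) return data;             // 1. memory
        Path file = diskDir.resolve(key);
        if (Files.exists(file)) {                  // 2. local disk
            data = Files.readAllBytes(file);
            memory.put(key, data);
            return data;
        }
        data = fetchFromServer(key);               // 3. last resort: the server
        memory.put(key, data);
        Files.write(file, data);
        return data;
    }

    // Evicting from memory writes to disk rather than throwing away.
    void evict(String key) throws IOException {
        byte[] data = memory.remove(key);
        if (data != null) Files.write(diskDir.resolve(key), data);
    }

    private byte[] fetchFromServer(String key) {
        // Placeholder: the real client would go back over the wire here.
        throw new UnsupportedOperationException("server fetch not sketched");
    }
}

The point is the order of the three lookups, not the details: evictions
spill to disk instead of vanishing, and the server is only bothered when
both local levels miss.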

3) Where and how?

   This is not just a question of memory, disk, or register. This is also
one of access... if we are caching for the convenience of the client and 
server on downloads, we want it right there, but say there is a memory of
some sort associated with the client. Does a player who uses many machines
have to carry their memories on a disk? Why not allow a player to set some
remotely accessible point, in the manner of a remote .newsrc, that can be
pointed to by multiple machines, with a breakpoint on time, or an SCCS-like
diff function? This is a different sort of caching, with different motives
and methods, but it has some serious implications as well.
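   A sketch of that remote-.newsrc idea, using a simple breakpoint on time:
whichever copy, local or remote, is newer wins, and the stale one gets
overwritten. The names are invented for illustration, and the remote point
is shown as just another file path, as it would appear on a network-mounted
account; an SCCS-like diff merge would be the fancier version.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch: keep player "memories" at some remote point the
// player designates, and reconcile with the local copy by timestamp.
class PreferenceSync {
    // Returns whichever copy is newer; a missing file loses automatically.
    static Path newerCopy(Path local, Path remote) throws IOException {
        if (!Files.exists(local)) return remote;
        if (!Files.exists(remote)) return local;
        FileTime localTime = Files.getLastModifiedTime(local);
        FileTime remoteTime = Files.getLastModifiedTime(remote);
        return localTime.compareTo(remoteTime) >= 0 ? local : remote;
    }

    // Breakpoint on time: simply overwrite the stale copy with the newer one.
    static void reconcile(Path local, Path remote) throws IOException {
        Path newer = newerCopy(local, remote);
        Path older = (newer == local) ? remote : local;
        Files.copy(newer, older, StandardCopyOption.REPLACE_EXISTING);
    }
}

A diff-based variant would merge the two copies instead of overwriting, at
the cost of a preference format the diff can understand.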

4) Why do we need to cache it?

   Always make the client do the lion's share of the work. Others may
disagree, but this philosophy has served me well... just think of it this
way: a hundred users, with a few of them using straining 286s and Mac IIs,
and a server (or distributed network of servers, but I'm figuring upward of
100 clients per server) pushing itself to handle the simulation end; vs. a
much less capable server, and a bunch of dumb clients totally failing to
burn any of the users' processing capabilities. Now, I will admit that a
mud as we know it now barely taxes a modern Pentium X or PPC chip, but...
just think how much more that mud could do if the text parsing and so forth
were handled by the clients, leaving the server to do the simulation. This
is the real promise of clientside caching. My client even handles the line-
of-sight calculations, which may be a bad idea - how hard is that going to
be to hack, and how long until people release the first "around the corner"
cheat? - but saves yet another major calculation for me. I always hated to
do the ray tracing; spherical resolution collision detection is bad enough.
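   For a sense of what that offloaded work looks like, here is the textbook
grid version of a line-of-sight test: a standard Bresenham walk over an
occlusion map. This is nothing like my actual calculation, but it shows the
kind of per-player arithmetic the server gets to skip.

// Hypothetical sketch: a textbook grid line-of-sight test; blocked[x][y]
// marks opaque cells between the viewer (x0,y0) and the target (x1,y1).
class LineOfSight {
    static boolean canSee(boolean[][] blocked, int x0, int y0, int x1, int y1) {
        int dx = Math.abs(x1 - x0), dy = Math.abs(y1 - y0);
        int sx = (x0 < x1) ? 1 : -1, sy = (y0 < y1) ? 1 : -1;
        int err = dx - dy;
        int x = x0, y = y0;
        while (x != x1 || y != y1) {
            int e2 = 2 * err;
            if (e2 > -dy) { err -= dy; x += sx; }
            if (e2 < dx)  { err += dx; y += sy; }
            // Any opaque cell strictly between the endpoints blocks the view.
            if ((x != x1 || y != y1) && blocked[x][y]) return false;
        }
        return true;
    }
}

Every check like this that a client runs locally is a check the server
never has to make, and that is the whole argument in miniature.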
   Generally, the reasons covered above... memory for the client, reduced
download time, and reduced work for the data server (I have a data and an
event server. This may well make up the core of a future SfD.), with some
additional emphasis on tricks like dynamic text generation, more flexible
namespace storage, and so forth, cover the majority of cases. I'm sure you
all have others. Let's hear some feedback!
--

Nathan F. Yospe - Aimed High, Crashed Hard, In the Hanger, Back Flying Soon
Jr Software Engineer, Textron Systems Division (On loan to Rocketdyne Tech)
(Temporarily on Hold) Student, University of Hawaii at Manoa, Physics Dept.
yospe#hawaii.edu nyospe#premier.mhpcc.af.mil http://www2.hawaii.edu/~yospe/