[MUD-Dev] Re: let's call it a spellcraft

Caliban Tiresias Darklock caliban at darklock.com
Wed Sep 23 23:22:39 CEST 1998


On 01:23 AM 9/24/98 +00-05, I personally witnessed Jon A. Lambert jumping
up to say:
>
>It begs the question, "Is the fixed dimensional array evil?"

Only when the dimension of the array is an integral part of the object's
type, as in Pascal, where an array of three integers cannot be assigned
to an array of five integers: an array of three is not an array of five,
so the types don't match and the assignment is impossible.
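
The same distinction shows up in C if you use pointers to whole arrays
instead of letting them decay; a throwaway sketch, not code from
anything real:

typedef int three[3];
typedef int five[5];

void demo(void)
{
    three a = { 1, 2, 3 };
    five  b = { 1, 2, 3, 4, 5 };
    three *pa = &a;       /* fine: int (*)[3] = int (*)[3]           */
/*  three *pb = &b; */    /* rejected: int (*)[5] is not int (*)[3],
                             so the types don't match                */
    (void)pa; (void)b;
}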

I'm building something at the moment, a simple database which stores
and retrieves data in compressed pages of 4096 bytes, which runs into
the same question. It works a bit like the old DOS FAT file system: at
the top of the file there is page 0, which is 4096 bytes long and is
considered "special" in that it bitmaps which blocks are used and which
are unused. Since it has 4096 bytes, it holds 32768 bits, 32767 of them
useful (the first bit maps the index page itself and is thus wasted).
So the database, with that one index page, can hold 128MB-4KB of data
(32768 * 4096, minus 4096 for the index), which in turn is compressed
with an LZ77 implementation that generally squeezes it down to about
40% of the original size (the data values have low entropy). It ends up
meaning "you can store about 256MB of data here," after you account for
average cluster overhead and average object size.
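
In rough C terms the geometry works out to something like this (the
names are mine, just for illustration):

#define PAGE_SIZE       4096L              /* bytes per page               */
#define PAGE_BITS       (PAGE_SIZE * 8)    /* 32768 bits in the index page */
#define BLOCKS_PER_IDX  PAGE_BITS          /* blocks one index page covers */

/* raw capacity under one index page: 32768 pages of 4096 bytes, minus
   the 4096-byte index page itself = 128MB - 4KB; at roughly 40%
   compressed size, and allowing for cluster overhead, call it about
   256MB of user data */
#define RAW_CAPACITY    (BLOCKS_PER_IDX * PAGE_SIZE - PAGE_SIZE)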

Well, you may have more than 256MB of data, so something has to be done
about that. What I'm doing is adding a second database just like the
first on the end -- at page 32768, there is another index. Likewise at
65536, etc. So
looking up block X is a simple matter of reading in index page X>>15,
checking to see if block X is taken
(pagedata[(X>>3)&0xFFF]&(1<<(7-(X&7)))), and then seeking to location
(X<<12) in the file to read in the 4096 bytes. The first four bytes of
each block then point to the next block in the chain, until you get to
the final block, which is tagged with a 0. Once you reassemble the
chain, you pass the conjoined data stream into the decompression
chamber, where it gets verified (MD5) and decompressed. Then you have
your data. Deletion is simple; just
pagedata[(X>>3)&0xFFF]&=~(1<<(7-(X&7))) for each block in the chain.
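
In C the bookkeeping comes out to roughly the following -- a sketch
which assumes pagedata is the relevant 4096-byte index page already
read in, and which picks big-endian for the next-block pointer; the
function names are made up:

#include <stdio.h>

/* index page governing block X: each index page covers 32768 blocks */
#define INDEX_PAGE(X)    ((X) >> 15)

/* byte offset of block X in the file: 4096-byte pages */
#define BLOCK_OFFSET(X)  ((long)(X) << 12)

/* test and clear block X's bit in its (already loaded) index page */
int block_used(unsigned char *pagedata, unsigned long X)
{
    return pagedata[(X >> 3) & 0xFFF] & (1 << (7 - (X & 7)));
}

void block_free(unsigned char *pagedata, unsigned long X)
{
    pagedata[(X >> 3) & 0xFFF] &= ~(1 << (7 - (X & 7)));
}

/* read block X into buf (4096 bytes) and return the next block in the
   chain from its first four bytes; 0 marks the end of the chain.  The
   big-endian byte order here is an assumption. */
unsigned long block_next(FILE *db, unsigned long X, unsigned char *buf)
{
    fseek(db, BLOCK_OFFSET(X), SEEK_SET);
    if (fread(buf, 1, 4096, db) != 4096)
        return 0;                       /* treat a short read as end */
    return ((unsigned long)buf[0] << 24) | ((unsigned long)buf[1] << 16)
         | ((unsigned long)buf[2] << 8) | (unsigned long)buf[3];
}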

At its heart, this is still a fixed-size array, but it's actually a
variable-size array of fixed-size arrays of variable-size arrays of
fixed-size arrays.

And yes, I'm aware that making the index page into a real FAT --
storing the next-block info in the index itself, which allows chaining
and chain-length detection without seeking all over the file -- would
certainly increase performance. I'm considering it. But with the chain
pointers at the beginning of each data block, a corrupt index is not
necessarily the death of the database; it can be regenerated
dynamically, at least to some degree. Maybe doing both would be good.
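
A real FAT-style index would look something like the sketch below; the
32-bit entry size and the sentinel values are assumptions on my part,
and note the geometry changes, since 4096 bytes of 32-bit entries only
covers 1024 blocks per index page instead of 32768:

#define FAT_FREE  0x00000000UL    /* block not in use      */
#define FAT_EOC   0xFFFFFFFFUL    /* last block in a chain */

typedef unsigned long fat_entry;  /* 32 bits on the boxes I care about */

#define ENTRIES_PER_INDEX (4096 / sizeof(fat_entry))    /* 1024 */

/* next block in the chain, straight out of the in-memory index page,
   with no seeking through the data blocks at all */
unsigned long fat_next(const fat_entry *index_page, unsigned long X)
{
    return index_page[X % ENTRIES_PER_INDEX];
}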

-----------------------------------------------------------------------
Caliban Tiresias Darklock <caliban at darklock.com>   | "I'm not sorry or 
Darklock Communications <http://www.darklock.com/> |  ashamed of who I 
PGP Key AD21EE50 at <http://pgp5.ai.mit.edu/~bal/> |  really am."      
FREE KEVIN MITNICK! <http://www.kevinmitnick.com/> |  - Charles Manson 



