[MUD-Dev] [TECH] String Classes, Memory Management, and Fragmentation

Bruce Mitchener bruce at puremagic.com
Thu Jul 12 23:46:18 CEST 2001


Daniel.Harman at barclayscapital.com wrote:

> They've essentially gone down the same road as Java for strings,
> and the over-riding reason for this appears to be simplicity of
> ensuring thread-safety. Of course having immutable strings can
> cause a lot of performance headaches. Especially when you have a
> garbage collector based memory management system. You end up with
> an awful lot of discarded objects if you want to change the
> strings much - only last week, I had to spend several hours
> optimising a java program so that the GC didn't get overwhelmed
> with discarded objects. I had to do a number of cheesy fixes such
> as disable the logging messages, remove my use of iterators and
> replace them with for loops etc. etc. The problem tends to only
> manifest in tight loops, but seeing as I was writing a message
> bridging system, it was a serious problem.
 
> In Java at least you shouldn't reuse the StringBuffer objects
> either. If I recall correctly, Java StringBuffers grow, but they
> don't shrink :) This is a real problem if you are using a string
> buffer that has grown a lot, as when you copy it to a string, it
> copies the whole buffer, not just the bit you think is
> populated. You then end up with some very large padded strings.
> So you end up having to create lots of string buffer objects,
> which takes you back to the first problem I mentioned! (That last
> bit about copying the whole buffer not just the data you think is
> in it, I am 98% sure is accurate, but I don't have time to check
> for now, and it's how I remember it).

> Anyway, I'm not a big fan of the GC paradigm for performance based
> apps for reasons such as this :) Same with immutable strings, sure
> you don't have to understand object synchronisation properly, but
> the overhead is ugly.

In reading this, my reaction was far different. :)

The first two paragraphs, rather than seeming like reasons to avoid
GC, are instead powerful reminders that an application cannot and
should not be implemented without considering the context in which
it will be built and run.  That context includes the language and
the runtime libraries, as well as the usual design considerations.
It also includes any bugs or quirks in the implementation of the
supporting code.
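As an example of that kind of context, here is a minimal Java sketch. The class name StringChurn is hypothetical, and it assumes the behaviour of a current JDK, which may differ from the 2001-era runtimes discussed above. It contrasts building a string with repeated +=, where every step allocates a fresh immutable String, against appending to a single StringBuffer, and it shows that clearing a buffer with setLength(0) keeps the storage it has already grown.

```java
// Hypothetical example class; assumes current-JDK behaviour of String and StringBuffer.
public class StringChurn {
    // Each += builds a brand-new immutable String, so n iterations leave
    // about n discarded intermediate Strings behind for the GC.
    static String concatNaive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i % 10;
        }
        return s;
    }

    // One mutable buffer grows in place; only the final toString()
    // allocates a String.
    static String concatBuffered(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb.toString();
    }

    // Grows a buffer, clears it with setLength(0), then reuses it.
    // Returns {capacity after growth, capacity after reuse, length of result}.
    static int[] reuseGrownBuffer(int growTo) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < growTo; i++) {
            sb.append('x');
        }
        int grown = sb.capacity();
        sb.setLength(0);          // clears the contents but keeps the storage
        sb.append("hi");
        return new int[] { grown, sb.capacity(), sb.toString().length() };
    }

    public static void main(String[] args) {
        System.out.println(concatNaive(1000).equals(concatBuffered(1000)));
        int[] r = reuseGrownBuffer(10000);
        System.out.println("grown capacity " + r[0] + ", after reuse " + r[1]
                + ", toString() length " + r[2]);
    }
}
```

On a current JDK the reused buffer keeps its large capacity, while toString() copies out only the characters actually in use. Since Java 5, StringBuilder offers the same API without StringBuffer's synchronisation.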

A similar 'argument' could be made against many things.  Reflection
in Java has a couple of bugaboos in the API.  Refcounting in C++ can
often be error-prone.  Yet we don't usually avoid those, or make
sweeping statements about the idea of reflection or refcounting.

In fact, using GC, especially a copying collector that compacts the
heap, can be a decent performance win: allocation becomes little
more than bumping a pointer, and the collector only has to copy live
objects, so short-lived garbage is cheap to reclaim.  There are also
other techniques which can be applied to lower the cost of transient
objects (or to reduce the need for them).  Some thoughts on this
were brought up a couple of years ago in:

  http://www.kanga.nu/archives/MUD-Dev-L/1999Q4/msg00564.php
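A toy semispace collector in the style of Cheney's algorithm illustrates the compaction argument. This is a sketch under simplifying assumptions, not how any real JVM or the OCaml runtime is implemented: the class ToyCopyingHeap, its int[] object layout, and the explicit root array are all hypothetical. Only objects reachable from the roots are copied, so transient garbage is never visited, and the survivors end up packed together at the start of the new space.

```java
import java.util.Arrays;

// Hypothetical toy semispace copying collector (Cheney-style), for illustration only.
// Object layout in an int[] space: [fieldCount, forwardingAddr, field0, field1, ...]
// Field values are addresses of other objects in the same space, or NIL.
public class ToyCopyingHeap {
    static final int NIL = -1;
    private int[] from;
    private int[] to;
    private int top = 0;   // bump-pointer allocation frontier in from-space
    private int free = 0;  // next free word in to-space during a collection

    ToyCopyingHeap(int words) {
        from = new int[words];
        to = new int[words];
    }

    // Allocation is a pointer bump because free space is always one contiguous block.
    int alloc(int fieldCount) {
        int size = fieldCount + 2;
        if (top + size > from.length) {
            throw new IllegalStateException("toy heap full; call collect() first");
        }
        int addr = top;
        from[addr] = fieldCount;
        from[addr + 1] = NIL;                          // not forwarded yet
        Arrays.fill(from, addr + 2, addr + size, NIL); // fields start out null
        top += size;
        return addr;
    }

    void setField(int obj, int i, int target) { from[obj + 2 + i] = target; }
    int getField(int obj, int i) { return from[obj + 2 + i]; }
    int used() { return top; }

    // Copies one object to to-space (at most once), leaving a forwarding address behind.
    private int evacuate(int addr) {
        if (addr == NIL) return NIL;
        if (from[addr + 1] != NIL) return from[addr + 1]; // already copied
        int size = from[addr] + 2;
        int newAddr = free;
        System.arraycopy(from, addr, to, newAddr, size);
        to[newAddr + 1] = NIL;                             // fresh copy is not forwarded
        from[addr + 1] = newAddr;
        free += size;
        return newAddr;
    }

    // Copies everything reachable from roots; unreachable objects are never touched.
    // Roots are updated in place to point at the new copies.
    void collect(int[] roots) {
        free = 0;
        for (int i = 0; i < roots.length; i++) roots[i] = evacuate(roots[i]);
        int scan = 0;                                  // Cheney scan pointer
        while (scan < free) {
            int fields = to[scan];
            for (int j = 0; j < fields; j++) {
                to[scan + 2 + j] = evacuate(to[scan + 2 + j]);
            }
            scan += fields + 2;
        }
        int[] tmp = from; from = to; to = tmp;         // flip the semispaces
        top = free;                                     // live data is now contiguous
    }

    public static void main(String[] args) {
        ToyCopyingHeap heap = new ToyCopyingHeap(1000);
        int head = NIL;
        for (int i = 0; i < 50; i++) {
            heap.alloc(3);              // transient garbage: never referenced
            int node = heap.alloc(1);   // live list node pointing at the previous one
            heap.setField(node, 0, head);
            head = node;
        }
        System.out.println("before collect: " + heap.used() + " words");
        int[] roots = { head };
        heap.collect(roots);
        System.out.println("after collect:  " + heap.used() + " words");
    }
}
```

In the demo, each iteration allocates a 5-word garbage object next to a 3-word list node, so 400 words are in use before collection and only the 150 words of live list nodes remain afterwards; the cost of the collection is proportional to those 150 words, not to the 250 words of garbage.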

It's also been pointed out to me that OCaml, which uses a GC, has
performance close to, and sometimes better than, that of C.

  - Bruce

_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


