[MUD-Dev] Re: Question on c++ switch optimization, and parsers in general.

Mon Feb 8 19:35:47 CET 1999

On Tue, 9 Feb 1999, Richard Woolcock wrote:

> Ben Greear wrote:
> > 
> > Well, after the 400th person said my parse engine was an insult to
> > hackers everywhere (I was the first of those 400!!), I am re-writing it.
> > 
> > Basically, I'll have a bunch of classes hashed into an array
> > that will contain the keywords mapped to an enum.
> > 
> > Now, I get the enum, and then I need to call the various commands
> > that the enum maps to.
> > 
> > Currently, I'm using a huge (~400 cases) switch statement.
> > 
> > So, the question is:  Is that efficient?  Does the compiler
> > generate code that does better than a linear search down the
> > case statements?  If not, I can manually hack a sort of n-ary
> > tree performance into it, but I'd wrather not if I can help it.

	I decided to do some tests on this as I have been curious about it
and now seemed to be a good time.
	I had 2 different compilers (MSVC 5.0 and gcc on a Sparc) produce
assembly output (very good things to do with these sorts of things).
	With gcc I compiled a .cpp and a .c file.  It uses the extension
as the file type.

> I don't know about C++, but I can speak for C.

	In the switch regard both compilers acted the same.

> I believe that switch statements are compiled down to the equivilent
> of if statements, ie:
> 
>    switch ( x )
>    {
>       case 1: ...
>       case 2: ...
>       case 3: ...
>       default: ...
>    }
> 
> Is no more efficient than:
> 
>    if      ( x == 1 ) ...
>    else if ( x == 2 ) ...
>    else if ( x == 3 ) ...
>    else ...

> So basically, no, it's no better than a linear search.  Not that it 
> really matters, unless you have thousands of commands.  I prefer the
> diku-style array of commands, but you'd have to go through that in
> a linear fashion too (you could do a binary search through it, but
> that might result in annoying things like typing 's' resulting in
> 'sit' rather than 'south').  Still, it's nicer to look at.
> 
> Some more modern compilers may treat switch statements better than
> the ones I'm used to - if so, I'd appreciate it if someone told me :)

	My test code looked roughly like:
	int i;
	int d;

	i=rand()&255;

	switch(i)
	{
		case 0:
			d=rand();
		break;
		case 1:
			d=rand();
		break;

	. . .

	They do things differently.  It was very interesting actually.
With only two cases did the case statements generate code that looked like
if/thens.  With 3 it generated this:
  sub eax, 0
  je  SHORT $L491
  dec eax
  je  SHORT $L491
  dec eax
  jne SHORT $L488

	Which I thought was cute.  With more it generates a jump table.
This code looks roughly like:
	cmp	eax, 35					; 00000023H
	ja	SHORT $L488
	jmp	DWORD PTR $L535[eax*4]

  Where $L535 is the jump table location.  I had the same code for each
case statement.  In the debug build the code was repeated, but in the
release build it recognized these basic blocks as identicle and used the
same code.  
	I also did wildly varying values and sequential values as well as
skips every now and then.  Skips were easy, the compiler just put the end
of the switch statement as the place to jump to.  
	So it appears that a switch with sequential values is roughly
equivelent to a indirect jump.  Not too bad actually.
	As a side note.  The compiler did not pickup on the fact that i
could never be greater than 255.  

Marc Hernandez