[MUD-Dev] Re: lurker emerges
T. Alexander Popiel
popiel at snugharbor.com
Sun Aug 9 18:53:44 CEST 1998
In message: <001e01bdc3ef$7f4573a0$961e5d18 at default.rochester.rr.com>
"James Wilson" <jwilson at rochester.rr.com> writes:
>From: T. Alexander Popiel <popiel at snugharbor.com>
>>
>>You seem to assume that it is impossible for a single thread to keep
>>even one I/O device continuously busy. Whatever happened to double-
>>buffering and non-blocking I/O? Has that art actually been lost in
>>the annals of time?
>
>I for one have only a vague idea of what this technique consists of.
>Could you maybe give some details, or a url describing it? Why would
>one need non-blocking i/o if select() tells you which sockets are
>ready for reads and writes? That is, assuming you bound the size of
>your output chunks appropriately, if the socket's ready for i/o shouldn't
>blocking be moot?
Addressing questions in reverse order: no, select() does not make
non-blocking moot, because while it tells you when some amount of
I/O is possible, it does not tell you how much I/O is possible.
It may be that you are now free to read or write only one byte,
and a request to read or write two bytes will cause your program
to block. There is no way to tell what the appropriate size for
your 'chunks' should be.
Double-buffering is a technique I heard about in conjunction with
old reel-to-reel tape drives, where there was great cost (both in
time and hardware lifespan) to having a pause between two read (or
write) requests, since the tape would have to stop, back up a bit,
and then spin back up to speed before it could service the delayed
second request. To avoid this delay, a typical read session would:
request a read(#1) of n blocks into buffer 1, request a read(#2) of
n blocks into buffer 2, wait for the read(#1) to complete, process
the data in buffer 1, request a read(#3) of n blocks into buffer 1,
wait for the read(#2) to complete, process the data in buffer 2,
request a read(#4) of n blocks into buffer 2, wait for the read(#3)
to complete, etc... In this way, there was always a read request
pending on the drive, so it never had to do one of those costly
stops. (Note that this technique only works if the processing time
for the data is smaller than the I/O time for the data... but it
was usually true then, and with CPUs these days, it's usually true
now, too.)
Unfortunately, double-buffering is not well supported by modern
OSes; it doesn't follow select() semantics at all. You need
notification when any read request is complete, not just when
unrequested data is available. POSIX generally doesn't like
this, the closest it comes is the background read/write from
tty signals. SysV and BSD define SIGPOLL and SIGIO, both of
which are again close, but not quite right. Everything else
seems to be device-dependant.
>>> Don't forget that non-threading has overhead too: if there is
>>> multiple file descriptors then select'ing and going through
>>> fd_sets can burns CPU time too.
>>
>>While this is true, this overhead is something that a creative
>>programmer can manage and reduce
>
>I am curious as to how one could speed up this process - can you
>avoid doing a search through the set of all active sockets to find
>those that are ready for i/o? Could you split up the fd_sets somehow
>to bound the number that would have to be checked? It seems like
>when one gets to thousands of simultaneous connections this
>could become a bottleneck.
There was a wonderful thread on this a short while ago, under the
subject line "xxx" (I'm looking this up, will get back to you, the
kanga.nu web server (and hence the archives) seem to be down, the
thread was sometime around 14 May 1998). Solutions presented were
primarily different flavors of fd_set partitioning, either among
multiple threads or by activity level within one thread. Being a
miserly sort, I prefer the latter. ;-) It was pointed out that there
is a small unavoidable cost to using select() on high fd numbers,
even if you're only checking one, because select must scan all the
fd numbers from 0 to whatever you set for a max, to find the fds
that you're interested in. This unavoidable cost is fairly small
in comparison to the cost of actually checking a large number of fds...
- Alex
More information about the mud-dev-archive
mailing list