[DGD] DGD Digest, Vol 106, Issue 2

Raymond Jennings shentino at gmail.com
Sat Oct 17 05:02:55 CEST 2015


Hey list, just had an interesting conversation about RST packets tripping
up insta-close connections.

Figured it would be useful for LPC dev, so here it is.

On Fri, Oct 16, 2015 at 7:07 PM, Andrew Skalski <askalski at gmail.com> wrote:

> Sure, feel free to forward to the list.  (I'm not subscribed from this
> address, otherwise I would have replied there directly.)
>
> And yes, the RST is because the client sent data before the server could
> close the connection.
>
> If those other immediate closures you mention are for MUD connections,
> those are less of a concern because they are stream oriented, as opposed to
> HTTP which is message-oriented.  Even if a RST happens, the user will see
> everything that was sent up until that point.  The only concern in that
> case is out-of-order delivery.  Also, not all MUD clients will try sending
> data soon enough to trigger the issue (the exception would be clients that
> immediately attempt to negotiate telnet options.)
>
> Andy Skalski
>
> On Fri, Oct 16, 2015 at 9:13 PM, Raymond Jennings <shentino at gmail.com>
> wrote:
>
>> I also use immediate closures to deal with people trying to connect when
>> the server is under maintenance (system suspended) or if the user table is
>> too full (I save two slots in the table at all times, firstly to save a
>> port for the admin login, and secondly to purge and burn off any
>> connections piling up in the queue so that they don't hang)
>>
>> These along with the enforced sitebans are hard coded into system's userd
>> and the connection managers asking to handle incoming connections are not
>> even given a chance to interfere.
>>
>> So...basically, the client is getting an RST because the client's data
>> got "bounced" after the server closed the connection?
>>
>> On Fri, Oct 16, 2015 at 6:06 PM, Raymond Jennings <shentino at gmail.com>
>> wrote:
>>
>>> may I forward this conversation to the DGD list?  It sounds like
>>> important information.
>>>
>>> Also...EWW.
>>>
>>> On Fri, Oct 16, 2015 at 5:38 PM, Andrew Skalski <askalski at gmail.com>
>>> wrote:
>>>
>>>> I did some further testing, and was able to rule out NAT as a
>>>> contributing factor.
>>>>
>>>> It all boils down to the order in which packets are received and processed
>>>> by both peers.  Because the server never reads the HTTP request, it will
>>>> send a RST, either (a) when close() is called, if the request was received
>>>> already; or (b) immediately, if the request arrives after close().  Note
>>>> the RST is sent in both cases.
>>>>
>>>> It behaves intermittently because the following *all* must be true for
>>>> it to succeed:
>>>>
>>>>    1. The server responds and closes the socket *before* the kernel
>>>>    receives the HTTP request packet.
>>>>    2. All response packets, including the FIN, must arrive *before* the
>>>>    client receives the RST.
>>>>
>>>> Ways to mitigate this:
>>>>
>>>>    1. Try to read the HTTP request.  Reading the entire request means
>>>>    no RST will be sent.  Even with a very short timeout (100 milliseconds),
>>>>    this should solve most of your problems.  The request typically arrives
>>>>    immediately (same round-trip interval) after handshake.
>>>>    2. Shut down the "write" half of the socket (SHUT_WR) before closing
>>>>    it.  This will ensure a FIN is always sent.  (I don't think DGD exposes
>>>>    this system call, unfortunately.)
>>>>    3. Send a Content-Length header, so the client does not need to
>>>>    receive a FIN.  This may still be susceptible to out-of-order delivery,
>>>>    because the RST may arrive before all response packets have been received.
>>>>    Delaying briefly before closing the socket can help minimize reordering
>>>>    (this might be mitigation #1 cleverly disguised as a "delay"...)
>>>>    4. Put DGD behind a HTTP proxy such as nginx and let it handle
>>>>    shielding you from abusive requests.
>>>>
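A minimal Python sketch of mitigations 1 and 2 above, for illustration only. It is not LPC, and as noted DGD may not expose shutdown() at all. The function names drain_request and reject and the 100 ms timeout are assumptions made for this example. It only drains up to the end of the request headers, so a request that carries a body could still leave unread data behind and trigger a RST.

```python
import socket


def drain_request(conn, timeout=0.1, limit=65536):
    """Read until the end of the request headers, a timeout, EOF, or `limit` bytes."""
    conn.settimeout(timeout)
    data = b""
    try:
        while b"\r\n\r\n" not in data and len(data) < limit:
            chunk = conn.recv(4096)
            if not chunk:
                break  # peer closed its side
            data += chunk
    except OSError:
        pass  # socket.timeout is an OSError; give up quietly on a slow or silent client
    return data


def reject(conn, response, timeout=0.1):
    """Send `response`, then close in a way that avoids leaving unread data behind."""
    drain_request(conn, timeout)           # mitigation 1: try to read the request first
    try:
        conn.settimeout(None)
        conn.sendall(response)
        conn.shutdown(socket.SHUT_WR)      # mitigation 2: half-close the write side (sends FIN)
        conn.settimeout(timeout)
        while conn.recv(4096):             # briefly absorb anything the client still sends
            pass
    except OSError:
        pass
    finally:
        conn.close()
```

reject(conn, response) would be called on a freshly accepted socket with the complete response bytes. The trade-off is that every rejected connection now costs up to two short timeouts instead of being closed instantly.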
>>>> Hope this helps,
>>>>
>>>> Andy Skalski
>>>>
>>>> On Fri, Oct 16, 2015 at 5:04 PM, Andrew Skalski <askalski at gmail.com>
>>>> wrote:
>>>>
>>>>> That shouldn't be necessary; I was just able to reproduce this a few
>>>>> minutes ago with a Perl script acting as webserver, running behind an
>>>>> OpenWRT router doing NAT.
>>>>>
>>>>> Adding a Content-Length header seems to help significantly.  Apache
>>>>> Bench still complains, but web browsers (and 'curl') so far haven't given
>>>>> me any errors.
>>>>>
>>>>>   http://mighty.voltara.org:50080/   (original)
>>>>>   http://mighty.voltara.org:50081/   (with Content-Length)
>>>>>
>>>>> Andy Skalski
>>>>>
>>>>> On Fri, Oct 16, 2015 at 4:46 PM, Raymond Jennings <shentino at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Would you like to see the LPC source code involved?
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 1:43 PM, Andrew Skalski <askalski at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Oops, didn't mean to copy that to the entire list.  (I'm subscribed
>>>>>>> from a different address.)
>>>>>>>
>>>>>>> On Fri, Oct 16, 2015 at 4:40 PM, Andrew Skalski <askalski at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm having trouble reproducing this in Chrome, except for one
>>>>>>>> thing: sometimes the "favicon.ico" request fails with a
>>>>>>>> net::ERR_CONNECTION_RESET error code.
>>>>>>>>
>>>>>>>> I had better luck using Apache Bench to generate concurrent
>>>>>>>> requests:
>>>>>>>>
>>>>>>>> $ ab -l -n4 -c2 http://shentino.mynetgear.com:50080/
>>>>>>>> This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
>>>>>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd,
>>>>>>>> http://www.zeustech.net/
>>>>>>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>>>>>>
>>>>>>>> Benchmarking shentino.mynetgear.com (be patient)...apr_socket_recv: Connection reset by peer (104)
>>>>>>>> Total of 1 requests completed
>>>>>>>>
>>>>>>>> Looking at wireshark output, the issue *appears* to be happening
>>>>>>>> at the TCP layer, and depends on the timing of the request and response.
>>>>>>>> This is mostly guesswork, because I don't have a server-side packet capture
>>>>>>>> to compare it to:
>>>>>>>>
>>>>>>>>    - Browser sends "SYN"
>>>>>>>>    - Server receives "SYN", sends "SYN+ACK"
>>>>>>>>    - Browser receives "SYN+ACK", sends "ACK" (thus completing
>>>>>>>>    3-way handshake) immediately followed by HTTP request.
>>>>>>>>    - Server receives "ACK".
>>>>>>>>
>>>>>>>> At this point, one of two things happens:
>>>>>>>>
>>>>>>>>    - Server receives HTTP request.
>>>>>>>>    - Server sends HTTP response and "FIN" to close the connection.
>>>>>>>>    - Browser receives HTTP response and "FIN".
>>>>>>>>    - Transaction is successful.
>>>>>>>>
>>>>>>>> Or...
>>>>>>>>
>>>>>>>>    - Server sends HTTP response and "FIN" to close the connection.
>>>>>>>>    - Server receives HTTP request.  Because the connection no
>>>>>>>>    longer exists, it responds with a "RST".
>>>>>>>>    - Browser receives "RST" before either the HTTP response or the
>>>>>>>>    FIN.
>>>>>>>>    - Failure: Browser throws a "connection reset by peer" error.
>>>>>>>>
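A rough way to watch this race on a single machine, sketched in Python. The names insta_close_server, probe and run_local_demo are made up for this example. How attempts split between "complete" and "failed" depends on timing and on the operating system, so no particular counts should be expected.

```python
import socket
import threading

# Deliberately has no Content-Length, so the client must wait for the close.
RESPONSE = b"HTTP/1.1 403 Forbidden\r\n\r\nbanned\n"


def insta_close_server(listener, count):
    """Accept `count` connections; reply and close without reading the request."""
    for _ in range(count):
        conn, _addr = listener.accept()
        try:
            conn.sendall(RESPONSE)
        except OSError:
            pass
        conn.close()


def probe(host, port, attempts):
    """Send a request each time and record whether the full reply arrived."""
    results = {"complete": 0, "failed": 0}
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=5) as s:
                s.sendall(b"GET / HTTP/1.1\r\nHost: test\r\n\r\n")
                data = b""
                while True:
                    chunk = s.recv(4096)
                    if not chunk:
                        break
                    data += chunk
            key = "complete" if data.endswith(b"banned\n") else "failed"
            results[key] += 1
        except OSError:
            # Typically ConnectionResetError when the RST overtakes the reply.
            results["failed"] += 1
    return results


def run_local_demo(attempts=20):
    """Run the server and the probe against each other on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(attempts)
    port = listener.getsockname()[1]
    worker = threading.Thread(
        target=insta_close_server, args=(listener, attempts), daemon=True
    )
    worker.start()
    results = probe("127.0.0.1", port, attempts)
    worker.join(timeout=5)
    listener.close()
    return results
```

Calling run_local_demo() returns a dict of counts. Loopback timing differs from a real network path, so a clean run here does not rule out the problem in production.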
>>>>>>>>
>>>>>>>> This is generally what I suspect is happening.  If you have any
>>>>>>>> firewall or NAT devices in front of the server, they may also be playing a
>>>>>>>> role.  For example, upon RST from the server, it may immediately remove the
>>>>>>>> connection from its NAT table, and discard any frames it might have queued
>>>>>>>> for transmit.
>>>>>>>>
>>>>>>>> One thing that might help (but not necessarily solve the issue) is
>>>>>>>> to send a Content-Length header.  Without Content-Length, the browser must
>>>>>>>> wait for the TCP connection to close (FIN) before it considers the response
>>>>>>>> to be complete and successful.  Abnormal termination (RST) prevents this
>>>>>>>> from happening.  However, if the server sent a Content-Length header, the
>>>>>>>> browser can consider the response complete as soon as it receives the
>>>>>>>> specified number of bytes.
>>>>>>>>
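To make the Content-Length point concrete, here is a simplified Python sketch of how a client might decide that a response is finished. The function response_complete and its saw_fin flag are hypothetical names for this example. Real browsers handle more cases (chunked encoding, for instance), which this ignores.

```python
def response_complete(buf: bytes, saw_fin: bool) -> bool:
    """Return True once `buf` holds a whole response.

    With Content-Length, completeness depends only on the byte count.
    Without it, the client has to wait for a clean close (FIN).
    """
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        return False  # the header block has not fully arrived yet
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return len(body) >= int(value.strip())
    return saw_fin
```

With the header present, a RST that arrives after the last body byte no longer matters to this check. Without the header, the same RST means saw_fin never becomes true.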
>>>>>>>> By the way, another thing I noticed is the HTTP response headers
>>>>>>>> are line-terminated improperly.  Header lines (including the final, empty
>>>>>>>> line) are required to be CRLF terminated.  (This rule does not apply to the
>>>>>>>> response body.)  I doubt this has anything to do with the issue you're
>>>>>>>> having, however.
>>>>>>>>
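For reference, a small Python sketch of a response whose header lines, including the final empty one, are CRLF terminated. The helper name build_response and the particular headers chosen are assumptions for this example, not what the kotaka Http module actually sends.

```python
def build_response(status: str, body: bytes,
                   content_type: str = "text/plain; charset=UTF-8") -> bytes:
    """Build an HTTP/1.1 response with CRLF-terminated header lines."""
    lines = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",  # byte count of the body, not character count
        "Connection: close",
    ]
    # Every header line ends in CRLF, and one more CRLF ends the header block.
    head = "".join(line + "\r\n" for line in lines) + "\r\n"
    return head.encode("ascii") + body
```

The body is passed through untouched, since the CRLF rule applies only to the header section.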
>>>>>>>> Andy Skalski
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 16, 2015 at 2:50 PM, Raymond Jennings <
>>>>>>>> shentino at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The site is shentino.mynetgear.com, http is port 50080
>>>>>>>>>
>>>>>>>>> My suspicion is a race condition of sorts that causes things to
>>>>>>>>> choke.  Hammering it with telnet repeatedly reliably and promptly gets
>>>>>>>>> back the error page.
>>>>>>>>>
>>>>>>>>> If you can find out anything by testing I would be quite intrigued.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 16, 2015 at 7:47 AM, Andrew Skalski <
>>>>>>>>> askalski at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'd be interested to see this in action.  My IP is 68.133.31.149, so
>>>>>>>>>> if you siteban that and give me a URL to hit, I can see if I can
>>>>>>>>>> reproduce it either in Chromium or Chrome (Linux), then use tcpdump
>>>>>>>>>> to see what's going wrong.
>>>>>>>>>>
>>>>>>>>>> (Assuming you haven't figured it out already.)
>>>>>>>>>>
>>>>>>>>>> Andy Skalski
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> >
>>>>>>>>>> > Message: 1
>>>>>>>>>> > Date: Thu, 15 Oct 2015 19:52:43 -0700
>>>>>>>>>> > From: Raymond Jennings <shentino at gmail.com>
>>>>>>>>>> > To: "All about Dworkin's Game Driver" <dgd at dworkin.nl>
>>>>>>>>>> > Subject: [DGD] instant replies to HTTP requests confusing browser?
>>>>>>>>>> > Message-ID:
>>>>>>>>>> > <CAGDaZ_rOUJrg3f+_MG9qao2+rHegxAiq-OLABwxLFv4BVPu4WQ at mail.gmail.com>
>>>>>>>>>> > Content-Type: text/plain; charset=UTF-8
>>>>>>>>>> >
>>>>>>>>>> > Taking a page out of skotos's book, I added an Http module to kotaka.
>>>>>>>>>> >
>>>>>>>>>> > I'm having some problems though after having my system userd check a
>>>>>>>>>> > connection's ip against the siteban manager before passing the
>>>>>>>>>> > connection to the http userd.
>>>>>>>>>> >
>>>>>>>>>> > If the IP is sitebanned, the system userd merely asks the http userd
>>>>>>>>>> > for a siteban message and then returns that before closing the
>>>>>>>>>> > connection immediately without even waiting for a request.
>>>>>>>>>> >
>>>>>>>>>> > Sometimes the browser shows the error page, and sometimes shows
>>>>>>>>>> > nothing at all.
>>>>>>>>>> >
>>>>>>>>>> > Is there any reason this is going on?  Is it buggy behavior in chrome
>>>>>>>>>> > or am I violating HTTP/1.1 somehow?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > ------------------------------
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


