Subject: Re: New release?

From: John Engelhart <john.engelhart_at_gmail.com>
Date: Fri, 30 Oct 2009 06:43:07 -0400

On Thu, Oct 29, 2009 at 4:32 AM, Daniel Stenberg <daniel_at_haxx.se> wrote:

> On Thu, 29 Oct 2009, John Engelhart wrote:
>
> And, hypothetically speaking, if someone were to re-work the code, could
>> you provide feedback on whether or not the new code "works"?
>>
>
> Of course! I certainly wouldn't mind seeing a cleanup and improved time-out
> code, and of course I'd review and test patches.

I poked at it a bit. Looks like a skip list would be the simplest way to
get the job done. There'd also have to be an "iterator" API /
functionality. I suspect there wouldn't be much use for the generic
"search" functionality (key / value lookups); what matters is more the fact
that the skip list keeps things sorted.
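
Just to make the idea concrete, here's roughly what I have in mind. It's
only a sketch with made-up names (nothing resembling an actual patch),
keyed on the query's expiry time:

#include <stdlib.h>
#include <sys/time.h>

#define SKIP_MAX_LEVEL 8

struct timeout_node {
  struct timeval expiry;                     /* when this query times out */
  void *query;                               /* opaque pointer to the query */
  struct timeout_node *next[SKIP_MAX_LEVEL]; /* forward links, one per level */
};

struct timeout_list {            /* a zero-initialized timeout_list is empty */
  struct timeout_node head;      /* sentinel node, no payload */
  int level;                     /* highest level currently in use */
};

static int tv_before(const struct timeval *a, const struct timeval *b)
{
  return a->tv_sec < b->tv_sec ||
         (a->tv_sec == b->tv_sec && a->tv_usec < b->tv_usec);
}

static int random_level(void)
{
  int lvl = 1;
  while (lvl < SKIP_MAX_LEVEL && (rand() & 1))
    lvl++;
  return lvl;
}

/* Insert keeps the list sorted by expiry, so the earliest timeout (the
   value ares_timeout() wants) is always head.next[0]. */
static void timeout_insert(struct timeout_list *list, struct timeout_node *node)
{
  struct timeout_node *update[SKIP_MAX_LEVEL];
  struct timeout_node *cur = &list->head;
  int i, lvl = random_level();

  for (i = list->level - 1; i >= 0; i--) {
    while (cur->next[i] && tv_before(&cur->next[i]->expiry, &node->expiry))
      cur = cur->next[i];
    update[i] = cur;
  }
  for (i = list->level; i < lvl; i++)
    update[i] = &list->head;
  if (lvl > list->level)
    list->level = lvl;

  for (i = 0; i < lvl; i++) {
    node->next[i] = update[i]->next[i];
    update[i]->next[i] = node;
  }
}

/* The "iterator": walking level 0 visits every pending query in expiry
   order, which is all the timeout-processing code would need. */
#define timeout_foreach(list, n) \
  for ((n) = (list)->head.next[0]; (n); (n) = (n)->next[0])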

>
>> Personally, my $0.02, regardless of any of the above or possible bugs, I
>> think it'd be a smart move to tweak the logic in ares_timeout() to return a
>> "minimum" timeval on at least the order of 1000 microseconds / 1 millisecond,
>> even if the loop scan returned a value less than this (and in particular the
>> current 0+0). This minimum timeval would obviously be compared against any
>> "max" timeval passed in, the lesser of the two would be the final result. It
>> just seems like a good idea to not hand back a timeval of 0+0 unless you
>> have a very, very good reason, particularly for time outs on the "main" work
>> loop path.
>>
>
> I'm still not completely convinced about us needing to do that. Let's say
> you have a case where you throw away MANY queries almost simultaneously, and
> then after a while N of them timeout. With N being 10, 100 or 1000. If we
> deliberately delay the timeout to 1ms, it could take up to a second to go
> over and do the timeout actions.
>

How's that? Wouldn't the timeout processing logic process all packets that
have "timed out"? It's really only a limit on how often timeout processing
is done; it's not a limit on how much work can be done during any given
check.

If I understand your concern (and I'm not convinced I do, mind you), you
seem to be assuming that only a single packet can be processed per "tick of
the clock", and that with a 1ms tick it would take 1 second to process 1000
packets. That's definitely not what I'm suggesting. I'm suggesting an
upper bound on how often the "queue / packet / work" loop gets run: 1000
times per second (a completely arbitrary number). There's no limit on the
amount of work any given run can do.
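
To put the suggestion in code form (the wrapper name is made up; it just
wraps the existing ares_timeout()):

#include <sys/time.h>
#include <ares.h>

static struct timeval *timeout_with_floor(ares_channel channel,
                                          struct timeval *maxtv,
                                          struct timeval *tv)
{
  struct timeval *ret = ares_timeout(channel, NULL, tv);

  if (!ret)        /* nothing pending: fall back to the caller's max */
    return maxtv;

  /* Never hand back 0+0; floor the computed timeout at 1 millisecond. */
  if (ret->tv_sec == 0 && ret->tv_usec < 1000)
    ret->tv_usec = 1000;

  /* The lesser of the floored value and the caller's max is the result. */
  if (maxtv && (maxtv->tv_sec < ret->tv_sec ||
                (maxtv->tv_sec == ret->tv_sec && maxtv->tv_usec < ret->tv_usec)))
    return maxtv;
  return ret;
}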

This is often more efficient and causes less cache thrashing because it
allows work to be "queued up" and then bulk processed in one shot at the
next 'tick of the clock'. If you don't "queue things up" like this, then
the work loop for something like c-ares is likely to be waiting in select()
with a very, very small timeout (possibly 0), having the timeout expire,
dropping out of select(), going through the whole ares_process() loop and
having process_timeouts() deal with just a single packet, then going back
into select() with another very small timeout for the next packet. With a
minimum timeout, the chances of doing some additional useful work go up
dramatically: perhaps select() returned because the kernel has a socket with
ready data. In that case, you get some useful work done along with dealing
with the timed-out packets "as long as you're there."
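
In loop form, the shape I'm describing is roughly this (error handling
omitted, and the 1ms floor applied inline):

#include <sys/select.h>
#include <ares.h>

static void work_loop_once(ares_channel channel)
{
  fd_set read_fds, write_fds;
  struct timeval tv, *tvp;
  int nfds;

  FD_ZERO(&read_fds);
  FD_ZERO(&write_fds);
  nfds = ares_fds(channel, &read_fds, &write_fds);
  if (nfds == 0)
    return;                         /* nothing in flight */

  tvp = ares_timeout(channel, NULL, &tv);
  if (tvp->tv_sec == 0 && tvp->tv_usec < 1000)
    tvp->tv_usec = 1000;            /* the floor: never sleep on 0+0 */

  /* Sleep until data arrives or the (floored) timeout expires. */
  select(nfds, &read_fds, &write_fds, NULL, tvp);

  /* One call drains everything that is ready: readable sockets AND every
     query whose timeout has already expired, all in one pass. */
  ares_process(channel, &read_fds, &write_fds);
}

The point being that ares_process() already handles every expired query in
a single call, so the floor only bounds how often the loop wakes up, not
how much it gets done per wakeup.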

> Your 0+0 case is problematic only if it *repeatedly* returns 0 so that the
> program goes into a busy-loop and that is the problem: the repeated returns
> of "run now", not the "run now" when it happens only once. And repeated
> (unfounded) such returns are a sign of a bug, not an expected behavior.

I agree about the bug part. Here's some debugging logs of when I was
experiencing the problem:

Oct 26 03:46:08: TRDNS bgThread: loopCount: 35, selectCount: 34,
didNotSelect: 0, processedCount: 34 queryCount: 8,
selectTVWasZero: 0
Oct 26 03:46:10: TRDNS bgThread: loopCount: 9, selectCount: 9,
didNotSelect: 0, processedCount: 9 queryCount: 5,
selectTVWasZero: 0
Oct 26 03:46:12: TRDNS bgThread: loopCount: 377779, selectCount: 377779,
didNotSelect: 377774, processedCount: 5 queryCount: 6,
selectTVWasZero: 377773
Oct 26 03:46:14: TRDNS bgThread: loopCount: 472828, selectCount: 472828,
didNotSelect: 472823, processedCount: 5 queryCount: 13,
selectTVWasZero: 472824
Oct 26 03:46:16: TRDNS bgThread: loopCount: 322217, selectCount: 322217,
didNotSelect: 322213, processedCount: 4 queryCount: 8,
selectTVWasZero: 322212
Oct 26 03:46:19: TRDNS bgThread: loopCount: 149072, selectCount: 149072,
didNotSelect: 149059, processedCount: 13 queryCount: 15,
selectTVWasZero: 149060
Oct 26 03:46:21: TRDNS bgThread: loopCount: 44313, selectCount: 44313,
didNotSelect: 44304, processedCount: 9 queryCount: 12,
selectTVWasZero: 44303
Oct 26 03:46:23: TRDNS bgThread: loopCount: 465376, selectCount: 465376,
didNotSelect: 465371, processedCount: 5 queryCount: 13,
selectTVWasZero: 465371
Oct 26 03:46:25: TRDNS bgThread: loopCount: 297621, selectCount: 297621,
didNotSelect: 297610, processedCount: 11 queryCount: 15,
selectTVWasZero: 297610
Oct 26 03:46:27: TRDNS bgThread: loopCount: 169703, selectCount: 169703,
didNotSelect: 169697, processedCount: 6 queryCount: 16,
selectTVWasZero: 169697
Oct 26 03:46:29: TRDNS bgThread: loopCount: 203571, selectCount: 203571,
didNotSelect: 203554, processedCount: 17 queryCount: 19,
selectTVWasZero: 203554
Oct 26 03:46:32: TRDNS bgThread: loopCount: 95835, selectCount: 95835,
didNotSelect: 95822, processedCount: 13 queryCount: 22,
selectTVWasZero: 95823
Oct 26 03:46:34: TRDNS bgThread: loopCount: 338988, selectCount: 338988,
didNotSelect: 338974, processedCount: 14 queryCount: 21,
selectTVWasZero: 338974
Oct 26 03:46:36: TRDNS bgThread: loopCount: 308893, selectCount: 308893,
didNotSelect: 308887, processedCount: 6 queryCount: 22,
selectTVWasZero: 308888

Stats were printed, and then cleared, once every two seconds. The total
number of DNS queries was probably on the order of 3-6 per second, with a
number of them taking an unusually long time to get a reply back from the
DNS server (there was some kind of problem at the time). The first four
seconds represent "correct, expected" behavior: those queries were "working
correctly" and returning results virtually instantly (from a caching DNS
server that sat on the same network broadcast domain as the host making the
query). After ~4 seconds, the problematic, not-answering queries started.

loopCount: the total number of times the work loop was run.
selectCount: the number of times that ares_fds() returned non-zero.
didNotSelect: the number of times that select() returned 0 (no file
descriptors ready).
processedCount: the number of times that select() returned non-zero (i.e.,
data waiting).
queryCount: the count from an internal object holding all the requests
still waiting for a reply (i.e., in-flight) at the time the stats were
printed.
selectTVWasZero: the number of times that ares_timeout() returned a 0+0
timeval.
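
For reference, the counters map onto that work loop roughly like this (a
hypothetical struct; the field names simply match the log output above and
aren't part of c-ares):

struct bg_stats {
  unsigned long loopCount;       /* ++ on every pass through the work loop    */
  unsigned long selectCount;     /* ++ whenever ares_fds() returned non-zero  */
  unsigned long didNotSelect;    /* ++ whenever select() returned 0 (timeout) */
  unsigned long processedCount;  /* ++ whenever select() returned non-zero    */
  unsigned long queryCount;      /* in-flight queries at the time of printing */
  unsigned long selectTVWasZero; /* ++ whenever ares_timeout() gave back 0+0  */
};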

It's fair to say that this pegged one of the two CPUs while it was
happening. :) There's obviously a bug here, but I can no longer reproduce
it.

The work loop runs on its own dedicated thread. If ares_fds() returns zero,
the work loop goes to sleep on the equivalent of a pthread condition
variable. A broadcast is sent on that condition variable any time a query
is made via ares_search(), waking up the background work loop thread.
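
Roughly, the wake-up scheme looks like this (names are made up, and the
real code is more careful about serializing access to the channel between
the two threads):

#include <pthread.h>
#include <sys/select.h>
#include <arpa/nameser.h>
#include <ares.h>

static pthread_mutex_t dns_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  dns_wake = PTHREAD_COND_INITIALIZER;

/* Background thread: sleep whenever there is nothing in flight. */
static void wait_for_work(ares_channel channel)
{
  fd_set rfds, wfds;

  pthread_mutex_lock(&dns_lock);
  for (;;) {
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    if (ares_fds(channel, &rfds, &wfds) != 0)
      break;                                  /* something to wait on */
    pthread_cond_wait(&dns_wake, &dns_lock);  /* drops dns_lock while asleep */
  }
  pthread_mutex_unlock(&dns_lock);
}

/* Caller side: issue the query, then wake the background thread. */
static void search_and_wake(ares_channel channel, const char *name,
                            ares_callback callback, void *arg)
{
  pthread_mutex_lock(&dns_lock);
  ares_search(channel, name, ns_c_in, ns_t_a, callback, arg);
  pthread_cond_broadcast(&dns_wake);
  pthread_mutex_unlock(&dns_lock);
}
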
Received on 2009-10-30