Subject: Re: New release?

From: Steinar H. Gunderson <sesse_at_google.com>
Date: Thu, 29 Oct 2009 12:22:28 +0100

On Thu, Oct 29, 2009 at 02:15:11AM -0400, John Engelhart wrote:
> Fair enough. Could you elaborate a bit more on this? ares_timeout() still
> uses a linear scan of the list, so is this function not in your critical
> path?

I'm afraid I don't know the details. I could try digging up the people who
did this in the first place if you're interested.
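
For readers following the linear-scan point, here is a minimal sketch of what
"scanning the list for the next timeout" costs. It is not c-ares code: the
struct pending_query type, its fields, and earliest_deadline_ms() are
hypothetical names made up for illustration, and deadlines are assumed to be
non-negative. The point is only that every call touches every pending query,
so the cost grows with the number of queries in flight.

```c
#include <stddef.h>

/* Hypothetical pending-query node; NOT the real c-ares data structure. */
struct pending_query {
    long deadline_ms;               /* absolute deadline, assumed >= 0 */
    struct pending_query *next;     /* singly linked list of queries */
};

/*
 * Return the earliest deadline in the list, or -1 if the list is empty.
 * This is O(n) in the number of pending queries on every call.
 */
static long earliest_deadline_ms(const struct pending_query *head)
{
    long best = -1;
    const struct pending_query *q;

    for (q = head; q != NULL; q = q->next) {
        if (best < 0 || q->deadline_ms < best)
            best = q->deadline_ms;
    }
    return best;
}
```

With a handful of queries this is negligible; with many thousands of queries
and a very high call rate, it could plausibly show up in a profile.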

> I'm not saying you're lying or making things up; objective data is what it
> is. It just strikes me as a bit unusual. The two possible explanations I can
> think of are that you have a HUGE amount of queries (>10,000), which seems
> unlikely, or that the list is being scanned very, very frequently.

We do have a huge amount of queries. :-) I don't think I could give you the
exact number even if I knew it, though -- sorry for being so secretive about
it.

> When I noticed my problem, c-ares was obviously a hot spot in the profiling
> tools. ares_timeout(), ares_process(), and all the related bits and pieces
> were all heavy hitters. ares_process() was being called on the order of
> 500,000 times/sec. So an inefficient process_timeouts() under these
> conditions is likely to stand out, but the root of my problem turned out to
> be that ares_timeout() was returning a 0+0 wait time for select(). Since
> I'm developing on a Mac, I happen to use Menu Meters with a CPU graph. It
> just happens to show user and system time in different colors. In a bit of
> serendipitous luck, I noticed that the system time went right through the
> roof during the problem, so I immediately suspected some kind of
> select()-related problem even though the profiling tools were telling me
> something else (they were user-land centric and didn't account for kernel time).

Yes, busy-waiting in ares_process() is obviously a bad thing. I'm not sure how
it's related to these linked lists, though?
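
To illustrate the busy-wait described in the quoted text, here is a small
sketch of what happens when select() is handed a zero (0 sec + 0 usec)
timeout in a loop. It does not use c-ares at all; now_ms() and
count_select_calls() are hypothetical helpers written for this example, and
select() is called with no file descriptors so that only the timeout matters.
A zero timeout makes select() return immediately, so the loop spins and burns
system time; a non-zero timeout makes each call actually sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Current time in milliseconds on a monotonic clock. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/*
 * Hypothetical helper: count how many select() calls complete within
 * window_ms when each call is given a timeout of timeout_usec.
 * No descriptors are watched, so select() acts purely as a timed wait.
 */
static long count_select_calls(long timeout_usec, long window_ms)
{
    long calls = 0;
    long start = now_ms();

    while (now_ms() - start < window_ms) {
        struct timeval tv;
        tv.tv_sec = timeout_usec / 1000000L;
        tv.tv_usec = timeout_usec % 1000000L;
        select(0, NULL, NULL, NULL, &tv);   /* returns at once if tv is 0+0 */
        calls++;
    }
    return calls;
}
```

Calling count_select_calls(0, 50) should report a very large number of
iterations, while count_select_calls(10000, 50) should report only a handful,
which is the difference between a spinning event loop and a sleeping one.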

> And, hypothetically speaking, if someone were to re-work the code, could you
> provide feedback on whether or not the new code "works"?

I'm pretty sure that would be after-the-fact, unfortunately, as in -- at each
new release I import the new version into our local repository, and if
something goes awry eventually some angry engineer will find me and wonder
what the heck I've imported. :-)

> Sadly, the query that was causing me problems now works just fine (returns
> 'instantly'). And it's always hard to test if you don't have a reliable
> way to reproduce the problem. :)

Yes. You can fake quite a lot with some iptables rules, though.

/* Steinar */

-- 
Software Engineer, Google Switzerland
Received on 2009-10-29