Subject: Re: c-ares re-entrancy patch, 2006-09-30

Re: c-ares re-entrancy patch, 2006-09-30

From: William Ahern <william_at_25thandclement.com>
Date: Thu, 12 Oct 2006 11:42:20 -0700

On Thu, Oct 12, 2006 at 06:55:03PM +0200, Daniel Stenberg wrote:
> On Sat, 30 Sep 2006, William Ahern wrote:
>
> >I've updated the patch to make c-ares safe for re-entrant calls (e.g.
> >destroying a c-ares object/handle from a callback). The patch is against a
> >20060930 snapshot from CVS.
> >
> >http://25thandClement.com/~william/c-ares-reentrant-20060930.patch
>
> A question on the actual implementation: How can FRAME_OKAY() work
> reliably? I mean, it tries to detect if the channel is freed/destroyed,
> doesn't it?

The call_frame structure is an automatic variable, pushed onto a stack
stored in the ares channel structure. In ares_cancel() and ares_destroy()
the list is interated over and each structure flagged to notify the caller
that the channel context is now invalid. Think of it as a non-global conduit
to relay information back up the stack. In ares_process(), for instance, if
callbacks are being issued from a loop, before it tests the condition (and
dereferences any pointers), it checks the flag to see if the context is
still valid. If the context is bad, it returns immediately. And, again, its
safe to check the local structure because it's from automatic storage.

> I'm a bit reluctant to apply this patch, as I think the implementation is
> hairy and I don't see the big win.

If you're using your own event loop, I can understand where your coming
from. You simply delay calling ares_destroy() until you fall back into the
main loop again, where you know for sure there are no extant functions (like
ares_process()) with a reference to the channel. But, for instance, if
you're using libevent, and particularly if c-ares is sandwiched inside
another API, then you can't really provision for that. Delaying something
like that is such a headache as to make it a non-starter.

The real issue are all the loops in ares_process() and a few other
functions. In event oriented programming--at least in the style I'm most
familiar with, using callbacks rather than a churn mechanism (a la
OpenSSL)--whenever you call a callback function you should return
immediately. You can't really know what happened after that, for instance if
the dynamic objects you have are still valid when the callback returns. You
typically write all your [simple] algorithms recursively (functional
programming style), unless you're a glutton for punishment. Sometimes,
though, you want or need to iterate over things with intervening callbacks,
say for performance. When you do that, and depending on your circumstances,
you need to setup a mechanism to tell the function which is iterating
whether it's safe to continue.

That's what this patch does. I've used this technique in several other
projects, most notably libevnet (which uses libevent). It seems confusing,
and it does add some complexity to the innards of some components, but it
can make life much simpler for users of the library (c-ares). It just "does
the right thing", regardless of how the library is being used.

Anyhow, I didn't expect it to be integrated right now. In a few weeks I'll
have the patched version of c-ares running a production service and I'll be
able to say confidently say it doesn't have any egregious bugs. But, I *can*
say confidently right now that the technique works just fine.

In some places I tried to simply re-order things in such a way that the
technique wasn't needed in that particular scope. But, there's just so many
loops in critical sections... that's why the patch seems kinda large.
Received on 2006-10-12