Subject: Re: [c-ares] reinit implementation thoughts and questions

Re: [c-ares] reinit implementation thoughts and questions

From: Benbuck Nason <bnason_at_netflix.com>
Date: Mon, 15 Jun 2020 12:05:18 -0700

For what it's worth, we've internally been using roughly these steps for
years to re-initialize c-ares on network change events:

- Notify clients that have pending requests
- Destroy ares channels
- Destroy ares (ares_library_cleanup())
- Reset some internal bookkeeping
- Initialize ares (ares_library_init(), same as startup logic)
- Re-issue pending requests

We are externally tracking the network changes, which varies significantly
by platform.

-Benbuck

On Sat, Jun 13, 2020 at 6:34 PM Brad House <brad_at_brad-house.com> wrote:

> Correct, any patch should take other OS's into account. Even MacOS
> doesn't use resolv.conf these days. A linux-only solution is a
> non-starter. That said, I'd think at least initially any focus should be
> on an externally-callable re-init. Some systems, like android, have
> notifications for network changes that you might have to capture from Java,
> so there may never be a way to 100% implement a solution in c-ares for
> detecting network changes... however focusing on a user-callable re-init
> should be able to support all OS's with no additional effort over
> supporting a single OS.
>
> You'd definitely need to have a fair amount of understanding of the inner
> workings of c-ares to be able to properly implement a re-init without
> having unintended consequences. Especially if you have outstanding queries
> it would most certainly break things if you simply do what PR#272 is
> doing. Many servers use only a single c-ares channel that may basically
> *always* have outstanding queries, so a re-init that can only be called
> when there are no outstanding queries would also not be acceptable either.
>
> The most minimally invasive might be to queue any new requests while old
> requests process then reload the config when there is nothing outstanding
> and then start processing... but that's not a great solution if the DNS
> changed caused a condition where it is going to reach the max timeout all
> those queries would be on hold for a great deal of time. Terminating all
> outstanding queries (and processing those events) is also not ideal as it
> would throw errors back when maybe the change wouldn't have affected the
> inflight-queries. Maybe a combination of the 2 would be acceptable, allow
> a user-specifiable timeout for outstanding queries and block new requests.
>
> The most complex solution of course would be some sort of shadow structure
> where each query would use its own 'view' of the server configuration, and
> when the last query terminates, if that view is no longer the most current
> would be cleaned up. However, I'd imagine this would be fairly invasive,
> and would require a fair amount of review and would need good test cases to
> ensure it works properly. However, it would behave in the most expected
> manner with no caveats (short of any bugs of course).
>
> -Brad
>
> On 6/12/20 8:50 AM, Nayanā Topolsky wrote:
>
> Hi Nicolas,
>
> I would also like to see the official support of c-ares for detection of
> resolv configuration change.
> However the main blocker was that the implementation of my patch was not
> taking into account other OS types .. and more general solution was needed.
> That is not just specific for Linux/Mac and its resolv.conf - which is fair
> point.
>
> I am just a user, so this is more like opinion, but as I see it (correct
> me if I am wrong, or missing a point), it must be designed to take into
> account also other systems - from the very beginning.
> My solution is currently used at our company, and it suits our need -
> which is Linux specific.
> For instance I am not sure what exactly happens to the pending resolve
> requests when the change is detected and reinit is done.. and stuff like
> that.
> So it would be better to have real, verified solution - properly tested.
>
> Last thing - the support for resolv.conf change is already working in
> glibc.. and the inspiration was taken from that ..
> You can easily find the patch from 2017-07-03 ..
> In the patch I am checking ctime,mtime, size, inode ..
>
> Good luck,
>
> Nayanā
> Dňa 11. 6. 2020 o 20:48 Nicolas Sterchele napísal(a):
>
> Hi all,
>
> This is would be my first contribution to the library and before starting to code, I would like to have your opinion on the following points:
>
> I am thinking on implementing a "reinit" function based on the following closed PR (https://github.com/c-ares/c-ares/pull/272).
> My main concern is to trigger a "reinit" when the content of the resolv.conf file has changed.
>
> Ideas:
> 1. "ares_reinit" would be a public function, library consumers must have the possibility to trigger a reinit.
> 2. Is it possible to call the following functions "init_by_options", "init_by_env" or "init_by_resolv" without worrying to break
> something on a channel? Would like to reuse those functions...
> 3. Based on the resolvconf scenario, we should call "reinit" when the file's "mtime" has changed. In order to track a change, a new variable would be added to the "ares_channeldata" struct.
> 3.1 Each time "ares_gethostbyname" is triggered, we should check if the file has changed (by comparing what we have in "ares_channeldata" and the actual "mtime").
> 4. Having c-ares to trigger a reinit by itself should be optional, the consumer of the library must be aware of that.
>
> I look forward to have your feedback!
>
> Nicolas.
>
>
>
Received on 2020-06-15