ftrigrd bug

From: Olivier Brunel <>
Date: Tue, 5 Jan 2016 15:01:57 +0100

Hey Laurent,

I don't think I'm doing anything wrong (though if I am, please let me
know) and I'm having some issue with ftrigrd.

What I'm doing is subscribing to some fifodirs, sometimes without and
sometimes with the FTRIGR_REPEAT flag. For those with the flag set, at
some point I unsubscribe. I also add new subscriptions along the way.
I don't think any of that is wrong/unexpected, but on occasion it
doesn't work properly. Specifically, I'll get EINVAL errors from
ftrigr_update(), or even worse: EPIPE after s6-ftrigrd segfaulted.

It hasn't been very easy to track down and I'm still unsure of what's
causing the issue. Many times it seemed like a race condition (or some
ordering issue) is at play, since sometimes it works, other times I
only get the EINVAL, and sometimes s6-ftrigrd segfaults, leading to
EPIPE.

As you may have guessed, this happens in anopa when starting services
and waiting for some events. I've tried to make a little test to
reproduce it, but as I said I'm not sure exactly what's the real cause.
It seems to be linked to both unsubscribing & subscribing, which
(sometimes) causes the issue.
E.g. in anopa, if I never unsubscribe, it (seems to?) work fine, with
never any error.

After many tries, I've come up with something that, at least here in
my latest tries, seems to have s6-ftrigrd segfault every time (I don't
get any EINVAL anymore first, straight to EPIPE, though again I couldn't
say why).

You can find the thing here[1], it's a small C file to compile, and a
folder "sd" which contains 4 servicedirs. To reproduce, you'll need to
start s6-svscan in that "sd" folder, then run the compiled test, and
start all services -- I simply use:
for i in {1..4}; do s6-svc -u sd/foo$i; done

With luck, you should get something like this:
unsubscribing id#1
add some
unsubscribing id#2
unsubscribing id#3
(none): warning: unable to update: Broken pipe
(none): warning: unable to get event for id#0: Broken pipe
And s6-ftrigrd will have segfaulted.

I don't think there's anything wrong in the code. I even took care of
(un)subscribing only after all the ftrigr_{update,check} calls, in case
doing otherwise "messed up" the internals of ftrigr_t somehow, as I
became unsure of whether that was supported or not -- I think so, but
if you could confirm whether or not it actually is, I'd appreciate it,
thanks.
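
For illustration, the ordering described above (check every
subscription first, and only then unsubscribe) can be sketched as
below. This is not the test program linked at [1], and it does not call
the s6 library: stub_check() and stub_unsubscribe() are hypothetical
stand-ins for ftrigr_check() and ftrigr_unsubscribe(), whose real
signatures differ, and struct sub is made-up bookkeeping. The choice to
drop a repeat subscription as soon as it reports an event is also just
an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBS 16

/* Hypothetical client-side bookkeeping; not an s6 type. */
struct sub {
    uint16_t id;   /* id handed back when subscribing */
    int repeat;    /* nonzero if subscribed with the repeat flag */
    int live;      /* nonzero while the subscription is active */
};

/* Stand-in for ftrigr_check(): pretend odd ids have an event waiting. */
static int stub_check(uint16_t id) { return id % 2; }

/* Stand-in for ftrigr_unsubscribe(): always reports success. */
static int stub_unsubscribe(uint16_t id) { (void)id; return 1; }

/* Returns how many repeat subscriptions were unsubscribed. */
size_t process_events(struct sub *subs, size_t n)
{
    size_t pending[MAX_SUBS];
    size_t npending = 0, dropped = 0;

    assert(n <= MAX_SUBS);

    /* Pass 1: check everything, only record what should be dropped. */
    for (size_t i = 0; i < n; i++) {
        if (!subs[i].live || stub_check(subs[i].id) <= 0) continue;
        if (subs[i].repeat)
            pending[npending++] = i;  /* defer the unsubscribe */
        else
            subs[i].live = 0;         /* assumed: one-shot ends by itself */
    }

    /* Pass 2: all checks are done, now it is safe to unsubscribe. */
    for (size_t k = 0; k < npending; k++) {
        struct sub *s = &subs[pending[k]];
        if (stub_unsubscribe(s->id)) {
            s->live = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Calling process_events() once per wakeup keeps every unsubscribe out of
the pass that walks the ids, which is the precaution described above.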

Anyhow, this seems to be reproducible every time for me now, but if it
doesn't happen on the first try, run it a few times and it should.
Hopefully you can figure out what's going on & fix it.

I tried to get a backtrace from the s6-ftrigrd core dump, but never
could get anything useful (I should have compiled both skalibs & s6
with debug symbols in for s6-ftrigrd, but I obviously didn't do it
right, or missed something).

I get either this:
(gdb) bt full
#0 0x0000000000406bff in free ()
No symbol table info available.
#1 0x0000000000000100 in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.

Or a slightly longer one, starting with:
#0 0x0000000000407722 in realloc ()

Let me know if there's anything else I can do to help,

Received on Tue Jan 05 2016 - 14:01:57 UTC
