Re: [PATCH] s6-rc-update: avoid getrandom(2)

From: Rasmus Villemoes <rasmus.villemoes_at_prevas.dk>
Date: Tue, 3 Oct 2017 11:32:22 +0200

On 2017-10-03 11:11, Laurent Bercot wrote:
>> Do you plan to do a bugfix release soon'ish?
>
> I can do one some time later this week if you need it. However, if
> it's not urgent,

It's not that urgent, but it would be nice to have a rough idea of when
you might roll the release, preferably within, say, a few weeks.

> I'd like some time to investigate the s6-rc-init crash
> you reported, if it's still happening for you. I cannot reproduce it
> so far.

Well, I haven't reproduced it either with the new s6* versions, but I
had to go back to the old versions for some unrelated bug-hunting
on the BSP, and then I hit a SIGSEGV in s6-rc-init. I don't know if it's
any use to you since the versions are so old, but the crash seems to be
very close to the one I reported.

s6-rc: 0.0.2.1
s6: 2.2.4.3
skalibs: 2.3.9.0

Anyway, an strace -f of it happening finishes with

351 ppoll([{fd=4, events=POLLIN|POLLHUP}], 1, NULL, NULL, 8
<unfinished ...>
352 <... sendmsg resumed> ) = 7
352 ppoll([{fd=0, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}, {fd=1,
events=POLLERR|POLLHUP|POLLNVAL}, {fd=5,
events=POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLHUP}, {fd=8,
events=POLLIN|POLLHUP}, {fd=10, events=POLLIN|POLLHUP}, {fd=12,
events=POLLIN|POLLHUP}, {fd=14, events=POLLIN|POLLHUP}, {fd=16,
events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}, {fd=20,
events=POLLIN|POLLHUP}, {fd=22, events=POLLIN|POLLHUP}, {fd=24,
events=POLLIN|POLLHUP}, {fd=26, events=POLLIN|POLLHUP}, {fd=28,
events=POLLIN|POLLHUP}, {fd=30, events=POLLIN|POLLHUP}, {fd=32,
events=POLLIN|POLLHUP}, {fd=34, events=POLLIN|POLLHUP}, {fd=36,
events=POLLIN|POLLHUP}, {fd=38, events=POLLIN|POLLHUP}, {fd=40,
events=POLLIN|POLLHUP}, {fd=42, events=POLLIN|POLLHUP}, {fd=44,
events=POLLIN|POLLHUP}, {fd=46, events=POLLIN|POLLHUP}, {fd=48,
events=POLLIN|POLLHUP}, {fd=50, events=POLLIN|POLLHUP}, {fd=52,
events=POLLIN|POLLHUP}, {fd=54, events=POLLIN|POLLHUP}, {fd=56,
events=POLLIN|POLLHUP}, {fd=58, events=POLLIN|POLLHUP}, {fd=60,
events=POLLIN|POLLHUP}, {fd=62, events=POLLIN|POLLHUP}, ...], 43, NULL,
NULL, 8) = 1 ([{fd=0, revents=POLLIN}])
352 recvmsg(0, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0\0\0008\0\0\0(L\0\0\0\0\0\0\0&\0\0\0\1/run/rc/ser"...,
iov_len=1678}, {iov_base="", iov_len=369}], msg_iovlen=2,
msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC},
MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 62
352 umask(000) = 022
352
mknod("/run/rc/servicedirs/ifplugd_at_eth1/event/.ftrig1:@4000000059d22b9d00aad5d4:QJSa7u81XdXzlWPG",
S_IFIFO|0622) = 0
352 umask(022) = 000
352
open("/run/rc/servicedirs/ifplugd_at_eth1/event/.ftrig1:@4000000059d22b9d00aad5d4:QJSa7u81XdXzlWPG",
O_RDONLY|O_NONBLOCK) = 86
352
open("/run/rc/servicedirs/ifplugd_at_eth1/event/.ftrig1:@4000000059d22b9d00aad5d4:QJSa7u81XdXzlWPG",
O_WRONLY|O_NONBLOCK) = 87
352
rename("/run/rc/servicedirs/ifplugd_at_eth1/event/.ftrig1:@4000000059d22b9d00aad5d4:QJSa7u81XdXzlWPG",
"/run/rc/servicedirs/ifplugd_at_eth1/event/ftrig1:@4000000059d22b9d00aad5d4:QJSa7u81XdXzlWPG")
= 0
352 recvmsg(0, {msg_namelen=0},
MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource
temporarily unavailable)
352 ppoll([{fd=0, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}, {fd=1,
events=POLLOUT|POLLERR|POLLHUP|POLLNVAL}, {fd=5,
events=POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLHUP}, {fd=8,
events=POLLIN|POLLHUP}, {fd=10, events=POLLIN|POLLHUP}, {fd=12,
events=POLLIN|POLLHUP}, {fd=14, events=POLLIN|POLLHUP}, {fd=16,
events=POLLIN|POLLHUP}, {fd=18, events=POLLIN|POLLHUP}, {fd=20,
events=POLLIN|POLLHUP}, {fd=22, events=POLLIN|POLLHUP}, {fd=24,
events=POLLIN|POLLHUP}, {fd=26, events=POLLIN|POLLHUP}, {fd=28,
events=POLLIN|POLLHUP}, {fd=30, events=POLLIN|POLLHUP}, {fd=32,
events=POLLIN|POLLHUP}, {fd=34, events=POLLIN|POLLHUP}, {fd=36,
events=POLLIN|POLLHUP}, {fd=38, events=POLLIN|POLLHUP}, {fd=40,
events=POLLIN|POLLHUP}, {fd=42, events=POLLIN|POLLHUP}, {fd=44,
events=POLLIN|POLLHUP}, {fd=46, events=POLLIN|POLLHUP}, {fd=48,
events=POLLIN|POLLHUP}, {fd=50, events=POLLIN|POLLHUP}, {fd=52,
events=POLLIN|POLLHUP}, {fd=54, events=POLLIN|POLLHUP}, {fd=56,
events=POLLIN|POLLHUP}, {fd=58, events=POLLIN|POLLHUP}, {fd=60,
events=POLLIN|POLLHUP}, {fd=62, events=POLLIN|POLLHUP}, ...], 44, NULL,
NULL, 8) = 1 ([{fd=1, revents=POLLOUT}])
352 sendmsg(1, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0\0\0\1\0\0", iov_len=6}, {iov_base="\0",
iov_len=1}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL
<unfinished ...>
351 <... ppoll resumed> ) = 1 ([{fd=4, revents=POLLIN}])
351 recvmsg(4, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\0\0\0\1\0\0\0", iov_len=1746}, {iov_base="",
iov_len=301}], msg_iovlen=2, msg_controllen=0,
msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 7
351 recvmsg(4, {msg_namelen=0},
MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource
temporarily unavailable)
351 symlink("/run/rc/servicedirs/ifplugd_at_eth1",
"/run/rc/scandir/ifplugd_at_eth1") = 0
351 getdents(6, /* 0 entries */, 32768) = 0
351 close(6) = 0
351 open("/run/rc/scandir/.s6-svscan/control", O_WRONLY|O_NONBLOCK) = 6
351 fcntl64(6, F_GETFL) = 0x801 (flags O_WRONLY|O_NONBLOCK)
351 fcntl64(6, F_SETFL, O_WRONLY) = 0
351 write(6, "a", 1) = 1
351 close(6) = 0
351 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xc} ---

(followed by the 352 child cleaning up),
and the bad 0xc address was also seen in a valgrind run:

# valgrind s6-rc-init -c /var/rc/compiled /run/service
==315== Memcheck, a memory error detector
==315== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==315== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==315== Command: s6-rc-init -c /var/rc/compiled /run/service
==315==
==315== Invalid read of size 4
==315== at 0x484CEC0: memmove (vg_replace_strmem.c:1258)
==315== by 0x48AAF0B: stralloc_catb (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Address 0x4a09068 is 0 bytes inside a block of size 24 free'd
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Block was alloc'd at
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315==
==315== Invalid read of size 4
==315== at 0x484CED4: memmove (vg_replace_strmem.c:1258)
==315== by 0x48AAF0B: stralloc_catb (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Address 0x4a09070 is 8 bytes inside a block of size 24 free'd
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Block was alloc'd at
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315==
==315== Invalid read of size 2
==315== at 0x484CF20: memmove (vg_replace_strmem.c:1258)
==315== by 0x48AAF0B: stralloc_catb (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Address 0x4a09074 is 12 bytes inside a block of size 24 free'd
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Block was alloc'd at
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315==
==315== Invalid read of size 1
==315== at 0x484CF58: memmove (vg_replace_strmem.c:1258)
==315== by 0x48AAF0B: stralloc_catb (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Address 0x4a09076 is 14 bytes inside a block of size 24 free'd
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315== Block was alloc'd at
==315== at 0x48481BC: realloc (vg_replace_malloc.c:785)
==315== by 0x489FB9F: alloc_realloc (in /usr/lib/libskarnet.so.2.3.9.0)
==315==
==315== Invalid read of size 4
==315== at 0x4875720: ftrigr_check (in /usr/lib/libs6.so.2.2.4.3)
==315== Address 0xc is not stack'd, malloc'd or (recently) free'd
==315==
==315==
==315== Process terminating with default action of signal 11 (SIGSEGV)
==315== Access not within mapped region at address 0xC
==315== at 0x4875720: ftrigr_check (in /usr/lib/libs6.so.2.2.4.3)
==315== If you believe this happened as a result of a stack
==315== overflow in your program's main thread (unlikely but
==315== possible), you can try to increase the size of the
==315== main thread stack using the --main-stacksize= flag.
==315== The main thread stack size used in this run was 8388608.
==315==
==315== HEAP SUMMARY:
==315== in use at exit: 1,959 bytes in 9 blocks
==315== total heap usage: 109 allocs, 100 frees, 1,815,779 bytes allocated
==315==
==315== LEAK SUMMARY:
==315== definitely lost: 0 bytes in 0 blocks
==315== indirectly lost: 0 bytes in 0 blocks
==315== possibly lost: 0 bytes in 0 blocks
==315== still reachable: 1,959 bytes in 9 blocks
==315== suppressed: 0 bytes in 0 blocks
==315== Rerun with --leak-check=full to see details of leaked memory
==315==
==315== For counts of detected and suppressed errors, rerun with: -v
==315== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 0 from 0)
Segmentation fault

The first invalid reads are from the use-after-free which also existed
back then (livedir is /run/rc, so it's copying the 15-byte string
/run/rc:initial), but the thing that triggers the SIGSEGV is the last
one. After adding -g to CFLAGS I haven't been able to reproduce this one
either, so I can't be more specific than this.
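
For readers unfamiliar with this class of bug, here is a minimal sketch of
the pattern the first four valgrind reports are consistent with: appending
to a growable buffer from a pointer that points into that same buffer, so
that the realloc done to make room frees the source before memmove reads
it. The type and function names (buf, buf_reserve, buf_catb, buf_cat_self)
are hypothetical and are not the skalibs API; the growth policy is also an
assumption made only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, only loosely modelled on a stralloc. */
typedef struct {
    char *s;     /* heap storage, may be NULL when cap == 0 */
    size_t len;  /* bytes in use */
    size_t cap;  /* bytes allocated */
} buf;

/* Make room for n more bytes. realloc may MOVE b->s, which invalidates
 * every pointer previously derived from it. Returns 1 on success. */
int buf_reserve(buf *b, size_t n)
{
    assert(b != NULL);
    if (b->len + n <= b->cap) return 1;
    size_t newcap = (b->len + n) * 2;
    char *p = realloc(b->s, newcap);
    if (!p) return 0;
    b->s = p;
    b->cap = newcap;
    return 1;
}

/* Append n bytes from src. Only safe when src does NOT point into b:
 * otherwise src may dangle after buf_reserve and the memmove below
 * reads freed memory, which is what the valgrind reports look like. */
int buf_catb(buf *b, const char *src, size_t n)
{
    if (!buf_reserve(b, n)) return 0;
    memmove(b->s + b->len, src, n);
    b->len += n;
    return 1;
}

/* Append n bytes that already live in b, identified by offset rather
 * than by pointer, so the source address is recomputed after growth. */
int buf_cat_self(buf *b, size_t off, size_t n)
{
    if (off > b->len || n > b->len - off) return 0;
    if (!buf_reserve(b, n)) return 0;
    memmove(b->s + b->len, b->s + off, n);
    b->len += n;
    return 1;
}
```

The buggy call shape would be buf_catb(&b, b.s + off, n); the offset-based
variant avoids it. Note that the four invalid reads above span bytes 0
through 14 of the freed block, which matches a 15-byte source. The final
fault at 0xc is a separate problem that this sketch does not cover.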

-- 
Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villemoes_at_prevas.dk
www.prevas.dk
Received on Tue Oct 03 2017 - 09:32:22 UTC
