1.7: test suite fails in two units #762
There are no changes in the Python code between v1.6 and v1.7. However, there was some refactoring of the cleanup code in tree.c, so ba533de ("tree: use cleanup functions") may have introduced this regression.
(gdb) bt
#0 0x00007ffff7691dec in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007ffff763f0c6 in raise () from /lib64/libc.so.6
#2 0x00007ffff76268d7 in abort () from /lib64/libc.so.6
#3 0x00007ffff7875453 in corrupt (abortstr=0x7ffff78778b0 "../ccan/ccan/list/list.h:453", head=0x55555599ab70, node=0x55555598d6e0, count=1) at ../ccan/ccan/list/list.c:15
#4 0x00007ffff78754ba in list_check_node (node=0x55555599ab70, abortstr=0x7ffff78778b0 "../ccan/ccan/list/list.h:453") at ../ccan/ccan/list/list.c:29
#5 0x00007ffff7875537 in list_check (h=0x55555599ab70, abortstr=0x7ffff78778b0 "../ccan/ccan/list/list.h:453") at ../ccan/ccan/list/list.c:40
#6 0x00007ffff7869bd9 in list_empty_ (h=0x55555599ab70, abortstr=0x7ffff78778b0 "../ccan/ccan/list/list.h:453") at ../ccan/ccan/list/list.h:269
#7 0x00007ffff7869cb7 in list_top_ (h=0x55555599ab70, off=0) at ../ccan/ccan/list/list.h:453
#8 0x00007ffff786c71d in nvme_ctrl_first_ns (c=0x55555599ab50) at ../src/nvme/tree.c:1084
#9 0x00007ffff786ccb1 in __nvme_free_ctrl (c=0x55555599ab50) at ../src/nvme/tree.c:1157
#10 0x00007ffff786ae30 in __nvme_free_subsystem (s=0x555555667f40) at ../src/nvme/tree.c:482
#11 0x00007ffff786b31f in __nvme_free_host (h=0x555555954740) at ../src/nvme/tree.c:577
#12 0x00007ffff786aa1a in nvme_free_tree (r=0x555555752a10) at ../src/nvme/tree.c:367
#13 0x00007ffff789411d in delete_nvme_root (self=0x555555752a10) at libnvme/nvme_wrap.c:3295
#14 0x00007ffff7898b46 in _wrap_delete_root (self=0x5555559beb60, args=0x5555558aa510) at libnvme/nvme_wrap.c:5007
#15 0x00007ffff7891faa in SwigPyObject_dealloc (v=0x5555558aa510) at libnvme/nvme_wrap.c:1823
#16 0x00007ffff7bf3332 in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#17 0x00007ffff7bf3311 in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#18 0x00007ffff7ba8b27 in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#19 0x00007ffff7bbef81 in _PyEval_EvalFrameDefault () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#20 0x00007ffff7bb6d3a in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#21 0x00007ffff7c3932f in PyEval_EvalCode () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#22 0x00007ffff7c56513 in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#23 0x00007ffff7c52c0a in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#24 0x00007ffff7c68922 in ?? () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#25 0x00007ffff7c681f4 in _PyRun_SimpleFileObject () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#26 0x00007ffff7c67e14 in _PyRun_AnyFileObject () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#27 0x00007ffff7c617b8 in Py_RunMain () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#28 0x00007ffff7c29597 in Py_BytesMain () from /lib64/glibc-hwcaps/x86-64-v3/libpython3.11.so.1.0
#29 0x00007ffff76281b0 in __libc_start_call_main () from /lib64/libc.so.6
#30 0x00007ffff7628279 in __libc_start_main_impl () from /lib64/libc.so.6
#31 0x0000555555555085 in _start ()
FYI, @calebsander @martin-belanger (and I really wonder why we didn't catch this earlier) |
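In the backtrace, teardown (`__nvme_free_ctrl` → `nvme_ctrl_first_ns` → `list_top_`) aborts inside ccan's list debugging code (`list_check` → `corrupt`). Roughly, that check walks the circular doubly linked list and verifies that each node's back link points at the node it was reached from, so a node that was freed while still linked is caught at the next list access. The sketch below is a hypothetical, minimal version of such a consistency check (`struct node`, `list_first_corrupt` and friends are illustrative names, not ccan's API). Like the real check, it assumes the forward `next` chain still leads back to the head.

```c
/* Hypothetical circular doubly linked list, loosely modelled on ccan/list. */
struct node {
	struct node *next;
	struct node *prev;
};

/* The head is a sentinel; an empty list points to itself both ways. */
static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk forward and verify every node's prev pointer names the node we came
 * from. Returns 0 if consistent, the 1-based position of the first bad node,
 * or -1 if only the head's back link is wrong.
 */
static int list_first_corrupt(const struct node *head)
{
	const struct node *p = head;
	const struct node *n;
	int count = 0;

	for (n = head->next; n != head; p = n, n = n->next) {
		count++;
		if (n->prev != p)
			return count;
	}
	if (head->prev != p)
		return -1;
	return 0;
}
```

A debug build pays for this walk on every list access, which is why the corruption surfaces in `list_empty_`/`list_top_` during teardown rather than where the node was actually freed.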
I can confirm that I'm seeing the same issue when running … Here's a portion of the stack trace.
|
@igaw - Here's a little more info. When executing the script …
However, when defining the environment variable …
Our first reaction would be to remove … |
Found the problem. Here is the fix:
--- a/src/nvme/tree.c
+++ b/src/nvme/tree.c
@@ -2473,7 +2473,7 @@ static int nvme_ns_init(const char *path, struct nvme_ns *ns)
ret = nvme_ns_identify(ns, id);
if (ret)
- free(ns);
+ return ret;
nvme_id_ns_flbas_to_lbaf_inuse(id->flbas, &flbas);
ns->lba_count = le64_to_cpu(id->nsze);
So it was my sysfs patch that broke it :( |
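The removed `free(ns)` released memory that `nvme_ns_init()` does not own: the function then went on to use `ns`, and presumably the allocating caller still holds (and later frees or links) the same pointer. That would explain the later `malloc(): unsorted double linked list corrupted` abort. The sketch below shows the ownership rule the fix restores, using hypothetical names (`struct demo_ns`, `demo_ns_init`, `demo_ns_create`, `fake_identify`) rather than libnvme's real functions: the init helper only reports the error, and the allocating caller frees exactly once. It uses the GCC/Clang `__attribute__((cleanup))` extension in the spirit of the "tree: use cleanup functions" refactor. libnvme's actual cleanup macros may look different.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a namespace object. */
struct demo_ns {
	unsigned long long lba_count;
};

static int frees; /* counts releases so a test can check "exactly once" */

static void demo_ns_free(struct demo_ns *ns)
{
	frees++;
	free(ns);
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void demo_ns_freep(struct demo_ns **p)
{
	if (*p)
		demo_ns_free(*p);
}
#define cleanup_demo_ns __attribute__((cleanup(demo_ns_freep)))

/* Simulated identify step that fails on request (ENOENT, i.e. errno 2). */
static int fake_identify(int fail)
{
	return fail ? -ENOENT : 0;
}

/* Fixed pattern: on error, return it and leave ns to its owner. */
static int demo_ns_init(struct demo_ns *ns, int fail)
{
	int ret = fake_identify(fail);

	if (ret)
		return ret; /* the buggy version called free(ns) here */
	ns->lba_count = 1024;
	return 0;
}

/* Owner: the cleanup attribute frees ns on every early return. */
static struct demo_ns *demo_ns_create(int fail)
{
	cleanup_demo_ns struct demo_ns *ns = calloc(1, sizeof(*ns));
	struct demo_ns *out;

	if (!ns || demo_ns_init(ns, fail))
		return NULL; /* ns is freed once, by the cleanup handler */
	out = ns;
	ns = NULL; /* disarm the cleanup: the caller now owns the object */
	return out;
}
```

If `demo_ns_init()` still freed `ns` on error, the cleanup handler would free it a second time. That is the same kind of double release that only shows up on the error path, for example when the namespace cannot be opened.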
… is because we no longer open the fd by default. Instead, we open the device file on request with … |
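Opening the device file only on request, instead of during namespace setup, can be sketched as a cached file descriptor that starts at -1 and is opened on first use. The names here (`struct lazy_dev`, `lazy_dev_get_fd`, `lazy_dev_close`) are hypothetical and do not describe libnvme's implementation. Any path passed in is just a placeholder.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical device handle: the path is known up front, the fd is not. */
struct lazy_dev {
	const char *path;
	int fd; /* -1 until first use */
};

static void lazy_dev_init(struct lazy_dev *d, const char *path)
{
	d->path = path;
	d->fd = -1;
}

/*
 * Open on first request and cache the descriptor. Returns -1 with errno set
 * by open() (e.g. ENOENT when the device node does not exist) on failure;
 * a later call simply retries.
 */
static int lazy_dev_get_fd(struct lazy_dev *d)
{
	if (d->fd < 0)
		d->fd = open(d->path, O_RDONLY);
	return d->fd;
}

static void lazy_dev_close(struct lazy_dev *d)
{
	if (d->fd >= 0) {
		close(d->fd);
		d->fd = -1;
	}
}
```

On Linux, errno 2 is `ENOENT` ("No such file or directory"), the value in the "Failed to open ns nvme0n1, errno 2" lines of the test log further down this thread. With lazy opening, a missing device surfaces only when something actually asks for the descriptor.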
The build container seems to be missing the Python devel libraries... |
@igaw - I tried your fix with … |
I think we see this issue while testing the Python code because Python's garbage collector tries to clean up every object on exit, and we get a double-free error (or some other memory-related issue) if something has already been deleted by the time the GC runs. It's different with C code: if we don't explicitly call the cleanup functions on exit (e.g. … |
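One general defensive pattern for the "already deleted by the time the GC runs" case is to make teardown idempotent: free through a pointer-to-pointer and clear it, so a second cleanup pass sees NULL and does nothing. This is a hypothetical sketch (`struct demo_root`, `demo_root_free`), not what libnvme or its SWIG wrapper does. It also would not have prevented the tree.c bug, where a stale copy of the pointer was still linked into a list.

```c
#include <stdlib.h>

/* Hypothetical object standing in for a root/tree handle. */
struct demo_root {
	int *nodes;
};

static int root_frees; /* counts real releases */

/*
 * Idempotent teardown: frees *rp and everything it owns, then clears the
 * caller's pointer so that a second call is a harmless no-op.
 */
static void demo_root_free(struct demo_root **rp)
{
	if (!rp || !*rp)
		return;
	free((*rp)->nodes);
	free(*rp);
	*rp = NULL;
	root_frees++;
}
```

This protects only the pointer that is passed in. Other aliases of the same object still dangle, so ownership still has to be clear, as in the fix above.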
I've tested #763 and I still get failures:
warning: Downloading https://github.com/linux-nvme/libnvme//archive/v1.7/libnvme-1.7.tar.gz to /home/tkloczko/rpmbuild/SOURCES/libnvme-1.7.tar.gz
warning: Downloading https://github.com/linux-nvme/libnvme//pull/763.patch#/libnvme-Fix-memory-corruption-in-tree.c.patch to /home/tkloczko/rpmbuild/SOURCES/libnvme-Fix-memory-corruption-in-tree.c.patch
warning: source_date_epoch_from_changelog set but %changelog is missing
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.imMd3X
+ umask 022
+ cd /home/tkloczko/rpmbuild/BUILD
+ cd /home/tkloczko/rpmbuild/BUILD
+ rm -rf libnvme-1.7
+ /usr/lib/rpm/rpmuncompress -x /home/tkloczko/rpmbuild/SOURCES/libnvme-1.7.tar.gz
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd libnvme-1.7
+ rm -rf /home/tkloczko/rpmbuild/BUILD/libnvme-1.7/SPECPARTS
+ /usr/bin/mkdir -p /home/tkloczko/rpmbuild/BUILD/libnvme-1.7/SPECPARTS
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ /usr/lib/rpm/rpmuncompress /home/tkloczko/rpmbuild/SOURCES/libnvme-Fix-memory-corruption-in-tree.c.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
[..]
+ cd libnvme-1.7
+ /usr/bin/meson test -C x86_64-redhat-linux-gnu --num-processes 48 --print-errorlogs
ninja: Entering directory `/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu'
ninja: no work to do.
1/21 python-import-libnvme OK 0.13s
2/21 mi OK 0.11s
3/21 mi-mctp OK 0.10s
4/21 uuid OK 0.09s
5/21 tree OK 0.09s
6/21 util OK 0.09s
7/21 features OK 0.07s
8/21 identify OK 0.07s
9/21 NBFT-auto-ipv6 OK 0.06s
10/21 NBFT-dhcp-ipv6 OK 0.06s
11/21 NBFT-rhpoc OK 0.05s
12/21 NBFT-static-ipv4 OK 0.05s
13/21 NBFT-static-ipv4-discovery OK 0.04s
14/21 NBFT-static-ipv6 OK 0.04s
15/21 python-read-nbft-file OK 0.12s
16/21 discovery OK 0.09s
17/21 NBFT-bad-oldspec EXPECTEDFAIL 0.03s exit status 2
18/21 NBFT-random-noise EXPECTEDFAIL 0.03s exit status 2
19/21 python-create-ctrl-object FAIL 0.15s killed by signal 6 SIGABRT
>>> UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=0 LD_LIBRARY_PATH=/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/src:/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/libnvme ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHONPATH=/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/libnvme/.. PYTHONMALLOC=malloc /usr/bin/python3 /home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/../libnvme/tests/create-ctrl-obj.py
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
Failed to open ns nvme0n1, errno 2
malloc(): unsorted double linked list corrupted
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
20/21 python-sigsegv-during-gc FAIL 0.15s killed by signal 6 SIGABRT
>>> UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=0 LD_LIBRARY_PATH=/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/src:/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/libnvme ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHONPATH=/home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/libnvme/.. PYTHONMALLOC=malloc /usr/bin/python3 /home/tkloczko/rpmbuild/BUILD/libnvme-1.7/x86_64-redhat-linux-gnu/../libnvme/tests/gc.py
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
Failed to open ns nvme0n1, errno 2
malloc(): unsorted double linked list corrupted
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
21/21 kdoc OK 0.36s
Summary of Failures:
19/21 python-create-ctrl-object FAIL 0.15s killed by signal 6 SIGABRT
20/21 python-sigsegv-during-gc FAIL 0.15s killed by signal 6 SIGABRT
Ok: 17
Expected Fail: 2
Fail: 2
Unexpected Pass: 0
Skipped: 0
Timeout: 0 |
@kloczek - Note that #763 does not actually contain the fix (copied below), but instead adds support for Python in the unit tests. I think that the actual fix will come as a separate commit.
|
Sorry, I didn't finish the PR yesterday. I was trying to reproduce the issue in our CI build first, though it depends a bit on Python's GC behavior. |
Ah, I think it depends on a physical NVMe device being present in the test system: when no device is available, we never execute the init path, so we never try to free the … |
@kloczek I've updated the PR, and it now contains the fix. |
1.6 in the same build environment was OK.