Open
Labels
Type: Defect (incorrect behavior, e.g. crash, hang)
Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Debian |
| Distribution Version | N/A |
| Kernel Version | 5.15.189 |
| Architecture | x86_64 |
| OpenZFS Version | 2.3.5 |
Describe the problem you're observing
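A kernel panic occurs during ZFS module unload (`rmmod zfs`) with a NULL pointer dereference in `spl_kmem_cache_free()`. The crash happens when `zfs_znode_cache` is destroyed while it still holds objects: `kmem_cache_destroy()` first reports the remaining objects, and a few milliseconds later `spl_kmem_cache_free()` faults when called from `rcu_core()` in a ksoftirqd thread.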
Describe how to reproduce the problem
No fully reliable reproducer has been identified. The issue is timing-dependent and occurs most often under the following conditions:
- Large pool with many files/inodes
- Significant filesystem activity before unmount
- Executing `rmmod zfs` shortly after `zfs umount`
- System shutdown/reboot scenarios
Userspace workarounds (sync, drop_caches, sleep) do not reliably prevent the crash.
Include any warning/errors/backtraces from the system logs
[ 8070.222528] =============================================================================
[ 8070.222538] BUG zfs_znode_cache (Tainted: P O ): Objects remaining in zfs_znode_cache on __kmem_cache_shutdown()
[ 8070.222553] -----------------------------------------------------------------------------
[ 8070.222554] Slab 0x00000000d74cad89 objects=29 used=1 fp=0x000000008340d8f5 flags=0x17fff0000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[ 8070.222561] CPU: 15 PID: 46268 Comm: rmmod Kdump: loaded Tainted: P O 5.15.189 #4 99eff10b7a15abd00a5f147b42deac89a210e1eb
[ 8070.222564] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 8070.222566] Call Trace:
[ 8070.222570] <TASK>
[ 8070.222572] dump_stack_lvl+0x45/0x5b
[ 8070.222578] slab_err+0x94/0xcb
[ 8070.222582] ? cpumask_next+0x1e/0x30
[ 8070.222586] __kmem_cache_shutdown.cold+0x50/0x1bf
[ 8070.222590] kmem_cache_destroy+0x45/0xe0
[ 8070.222596] spl_kmem_cache_destroy+0x15a/0x1d0 [spl 13f8907e7fb151d3ffcbbcbd8bb82336a42a609c]
[ 8070.222605] ? synchronize_rcu+0x62/0x80
[ 8070.222611] ? invoke_rcu_core+0xa0/0xa0
[ 8070.222613] zfs_znode_fini+0x16/0x50 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.222744] zfs_fini+0x2e/0x40 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.222847] zfs_kmod_fini+0x67/0xc0 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.222952] openzfs_fini+0xa/0x5ee [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223058] __do_sys_delete_module+0x1a5/0x250
[ 8070.223062] ? exit_to_user_mode_prepare+0x30/0x180
[ 8070.223064] do_syscall_64+0x3c/0x90
[ 8070.223069] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 8070.223074] RIP: 0033:0x7f7503091b17
[ 8070.223076] Code: 73 01 c3 48 8b 0d 71 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 c3 2b 00 f7 d8 64 89 01 48
[ 8070.223078] RSP: 002b:00007fff2ebf0978 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 8070.223080] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7503091b17
[ 8070.223082] RDX: 00007f75030f7c60 RSI: 0000000000000800 RDI: 00005647f46d33e0
[ 8070.223083] RBP: 00005647f46d3380 R08: 00007f750334ef20 R09: 00007fff2ebef8f1
[ 8070.223084] R10: 00007fff2ebf0740 R11: 0000000000000206 R12: 00007fff2ebf0ba0
[ 8070.223085] R13: 00007fff2ebf0e5d R14: 0000000000000000 R15: 00005647f46d3380
[ 8070.223087] </TASK>
[ 8070.223090] Object 0x0000000019505859 @offset=0
[ 8070.223092] =============================================================================
[ 8070.223092] BUG zfs_znode_cache (Tainted: P B O ): Objects remaining in zfs_znode_cache on __kmem_cache_shutdown()
[ 8070.223093] -----------------------------------------------------------------------------
[ 8070.223094] Slab 0x00000000b186fa5b objects=29 used=1 fp=0x000000004e12d82f flags=0x17fff0000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[ 8070.223096] CPU: 15 PID: 46268 Comm: rmmod Kdump: loaded Tainted: P B O 5.15.189 #4 99eff10b7a15abd00a5f147b42deac89a210e1eb
[ 8070.223098] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 8070.223099] Call Trace:
[ 8070.223100] <TASK>
[ 8070.223100] dump_stack_lvl+0x45/0x5b
[ 8070.223102] slab_err+0x94/0xcb
[ 8070.223104] ? _printk+0x58/0x73
[ 8070.223108] ? cpumask_next+0x1e/0x30
[ 8070.223110] __kmem_cache_shutdown.cold+0x50/0x1bf
[ 8070.223112] kmem_cache_destroy+0x45/0xe0
[ 8070.223114] spl_kmem_cache_destroy+0x15a/0x1d0 [spl 13f8907e7fb151d3ffcbbcbd8bb82336a42a609c]
[ 8070.223121] ? synchronize_rcu+0x62/0x80
[ 8070.223123] ? invoke_rcu_core+0xa0/0xa0
[ 8070.223125] zfs_znode_fini+0x16/0x50 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223227] zfs_fini+0x2e/0x40 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223328] zfs_kmod_fini+0x67/0xc0 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223433] openzfs_fini+0xa/0x5ee [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223538] __do_sys_delete_module+0x1a5/0x250
[ 8070.223539] ? exit_to_user_mode_prepare+0x30/0x180
[ 8070.223541] do_syscall_64+0x3c/0x90
[ 8070.223543] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 8070.223545] RIP: 0033:0x7f7503091b17
[ 8070.223547] Code: 73 01 c3 48 8b 0d 71 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 c3 2b 00 f7 d8 64 89 01 48
[ 8070.223548] RSP: 002b:00007fff2ebf0978 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 8070.223550] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7503091b17
[ 8070.223551] RDX: 00007f75030f7c60 RSI: 0000000000000800 RDI: 00005647f46d33e0
[ 8070.223552] RBP: 00005647f46d3380 R08: 00007f750334ef20 R09: 00007fff2ebef8f1
[ 8070.223553] R10: 00007fff2ebf0740 R11: 0000000000000206 R12: 00007fff2ebf0ba0
[ 8070.223555] R13: 00007fff2ebf0e5d R14: 0000000000000000 R15: 00005647f46d3380
[ 8070.223556] </TASK>
[ 8070.223559] Object 0x000000001f054720 @offset=0
[ 8070.223560] kmem_cache_destroy zfs_znode_cache: Slab cache still has objects
[ 8070.223561] CPU: 15 PID: 46268 Comm: rmmod Kdump: loaded Tainted: P B O 5.15.189 #4 99eff10b7a15abd00a5f147b42deac89a210e1eb
[ 8070.223564] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 8070.223565] Call Trace:
[ 8070.223565] <TASK>
[ 8070.223566] dump_stack_lvl+0x45/0x5b
[ 8070.223568] kmem_cache_destroy.cold+0x1c/0x21
[ 8070.223570] spl_kmem_cache_destroy+0x15a/0x1d0 [spl 13f8907e7fb151d3ffcbbcbd8bb82336a42a609c]
[ 8070.223576] ? synchronize_rcu+0x62/0x80
[ 8070.223578] ? invoke_rcu_core+0xa0/0xa0
[ 8070.223580] zfs_znode_fini+0x16/0x50 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223682] zfs_fini+0x2e/0x40 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223784] zfs_kmod_fini+0x67/0xc0 [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.223889] openzfs_fini+0xa/0x5ee [zfs d67f7c9af4a662e93db1055a0ba77b4262911192]
[ 8070.224014] __do_sys_delete_module+0x1a5/0x250
[ 8070.224016] ? exit_to_user_mode_prepare+0x30/0x180
[ 8070.224017] do_syscall_64+0x3c/0x90
[ 8070.224019] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 8070.224022] RIP: 0033:0x7f7503091b17
[ 8070.224024] Code: 73 01 c3 48 8b 0d 71 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 c3 2b 00 f7 d8 64 89 01 48
[ 8070.224025] RSP: 002b:00007fff2ebf0978 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 8070.224027] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7503091b17
[ 8070.224028] RDX: 00007f75030f7c60 RSI: 0000000000000800 RDI: 00005647f46d33e0
[ 8070.224029] RBP: 00005647f46d3380 R08: 00007f750334ef20 R09: 00007fff2ebef8f1
[ 8070.224030] R10: 00007fff2ebf0740 R11: 0000000000000206 R12: 00007fff2ebf0ba0
[ 8070.224031] R13: 00007fff2ebf0e5d R14: 0000000000000000 R15: 00005647f46d3380
[ 8070.224032] </TASK>
[ 8070.232300] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 8070.232303] #PF: supervisor read access in kernel mode
[ 8070.232305] #PF: error_code(0x0000) - not-present page
[ 8070.232307] PGD 0 P4D 0
[ 8070.232310] Oops: 0000 [#1] SMP PTI
[ 8070.232312] CPU: 14 PID: 85 Comm: ksoftirqd/14 Kdump: loaded Tainted: P B O 5.15.189 #4 99eff10b7a15abd00a5f147b42deac89a210e1eb
[ 8070.232315] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 8070.232316] RIP: 0010:spl_kmem_cache_free+0xe/0x1e0 [spl]
[ 8070.232325] Code: 8b 43 70 48 8d 58 90 48 3d 90 3c 98 a0 75 e2 48 c7 c7 60 3c 98 a0 5b e9 10 f7 77 e0 0f 1f 44 00 00 41 55 41 54 55 48 89 f5 53 <48> 8b 47 28 48 89 fb 48 85 c0 74 0c 48 8b 77 30 48 89 ef e8 ea 3e
[ 8070.232326] RSP: 0018:ffff888c4389fe08 EFLAGS: 00010286
[ 8070.232329] RAX: ffffffffa0d062f0 RBX: 0000000000000003 RCX: ffffffff82942740
[ 8070.232331] RDX: ffff888cdfa71488 RSI: ffff888cdfa70d08 RDI: 0000000000000000
[ 8070.232332] RBP: ffff888cdfa70d08 R08: 0000000000000282 R09: 0000000000000000
[ 8070.232333] R10: 0000000000000015 R11: 0000000000000029 R12: 0000000000000002
[ 8070.232334] R13: 000000000000000a R14: ffff88941efabbb0 R15: 0000000000000000
[ 8070.232336] FS: 0000000000000000(0000) GS:ffff88941ef80000(0000) knlGS:0000000000000000
[ 8070.232337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8070.232339] CR2: 0000000000000028 CR3: 0000000002c0a004 CR4: 00000000000606e0
[ 8070.232340] Call Trace:
[ 8070.232361] <TASK>
[ 8070.232362] rcu_core+0x207/0x650
[ 8070.232365] handle_softirqs+0xe7/0x270
[ 8070.232371] ? smpboot_register_percpu_thread+0xd0/0xd0
[ 8070.232376] run_ksoftirqd+0x2f/0x40
[ 8070.232379] smpboot_thread_fn+0xaf/0x140
[ 8070.232381] kthread+0x118/0x140
[ 8070.232386] ? set_kthread_struct+0x50/0x50
[ 8070.232388] ret_from_fork+0x22/0x30
[ 8070.232394] </TASK>
[ 8070.232396] Modules linked in: 8021q zfs(PO-) qat_api(O) spl(O) intel_qat(O) uio iptable_filter target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod nvmet_rdma nvmet_tcp nvmet nvme_rdma nvme_tcp nvme_fabrics bonding ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate mlx5_core mlxfw pci_hyperv_intf igb(O) ptp pps_core be2net button nls_iso8859_1 nls_cp437 ses sg ipmi_si ipmi_devintf ipmi_msghandler mpt3sas(O) nvme raid_class scsi_transport_sas nvme_core megaraid_sas(O) vfat fat aufs scsi_transport_fc [last unloaded: scst]
[ 8070.232456] CR2: 0000000000000028
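Since SPL is GPL-licensed, it could call `rcu_barrier()` before destroying the Linux slab cache in `spl_kmem_cache_destroy()`, so that RCU callbacks still pending at unload time (the trace shows `spl_kmem_cache_free()` being reached from `rcu_core()`) complete before the cache goes away. Would this be considered an acceptable approach, or would it be preferable to skip cache destruction when objects remain?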