-
Notifications
You must be signed in to change notification settings - Fork 31
Description
We're trying to run Assise with leases enabled. We set the -DUSE_LEASE flags in the KernFS and LibFS Makefiles respectively, and set up our file system using emulated NVM as shown in the readme.
Unfortunately, it seems that there is a segmentation fault occuring in iotest when LibFS acquires a lease. This only occurs when leases are active: when they're disabled, the program runs normally. Here is the output of the KernFS and LibFS on the iotest from the readme:
KernFS:
sudo ./run.sh kernfs
initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 4096 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 25584
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
MLFS cluster initialized
[Local-Server] Listening on port 12345 for connections. interrupt (^C) to exit.
Adding connection with sockfd: 0
Adding connection with sockfd: 1
Adding connection with sockfd: 2
RECV <-- MSG_INIT [pid 0]
RECV <-- MSG_INIT [pid 1]
RECV <-- MSG_INIT [pid 2]
[add_peer_socket():80] Peer connected (ip: 127.0.0.1, pid: 25597)
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x7f8baa40f000
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x7f8baa40f000
SEND --> MSG_SHM [paths: /shm_recv_0|/shm_send_0]
start shmem_poll_loop for sockfd 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x7f8baa40f000
SEND --> MSG_SHM [paths: /shm_recv_2|/shm_send_2]
start shmem_poll_loop for sockfd 2
SEND --> MSG_SHM [paths: /shm_recv_1|/shm_send_1]
start shmem_poll_loop for sockfd 1
[discard_leases():933] discarding all leases for peer ID = 1
Connection terminated [sockfd:0]
Connection terminated [sockfd:1]
Exit server_thread
Exit server_thread
./run.sh: line 15: 25584 Segmentation fault LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@
LibFS
sudo ./run.sh iotest sw 2G 4K 1
[tid:25597][device_init():50] dev id 1
dev-dax engine is initialized: dev_path /dev/dax0.0 size 4096 MB
[tid:25597][device_init():50] dev id 2
[tid:25597][device_init():50] dev id 3
[tid:25597][cache_init():293] allocating 262144 blocks for DRAM read cache
[tid:25597][read_superblock():504] [dev 1] superblock: size 883712 nblocks 856598 ninodes 300000 inodestart 2 bmap start 27087 datablock_start 27114
fetching node's IP address..
Process pid is 25597
ip address on interface 'lo' is 127.0.0.1
cluster settings:
[tid:25597][register_peer_log():245] assigning peer (ip: 127.0.0.1 pid: 0) to log id 0
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:25597, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
In thread
[Local-Client] Creating connection (pid:25597, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
In thread
[Local-Client] Creating connection (pid:25597, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
[tid:25597][init_rpc():148] awaiting remote KernFS connections
In thread
SEND --> MSG_INIT [pid 0|25597]
SEND --> MSG_INIT [pid 2|25597]
SEND --> MSG_INIT [pid 1|25597]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
[tid:25599][add_peer_socket():63] found socket 1
[tid:25599][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 0)
[tid:25599][_find_peer():206] peer[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 1
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[tid:25598][add_peer_socket():63] found socket 0
[tid:25600][add_peer_socket():63] found socket 2
[tid:25598][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 1)
[tid:25598][_find_peer():191] sockfd[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 0
[tid:25600][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 2)
[tid:25600][_find_peer():191] sockfd[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 2
[tid:25597][rpc_bootstrap():909] peer send: |bootstrap |25597
[tid:25598][signal_callback():1357] received rpc with body: |bootstrap |1 on sockfd 0
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
[tid:25597][init_log():148] end of the log e7c00
init log dev 1 start_blk 916481 end 949248
[tid:25597][ialloc():647] get inode - inum 1
[tid:25597][init_fs():459] LibFS is initialized on dev 1
Total file size: 2147483648B
io size: 4096B
# of thread: 1
[tid:25597][mlfs_posix_mkdir():419] [POSIX] mkdir(/mlfs/)
[tid:25597][namex():274] namex: path /mlfs/, parent 1, name mlfs
[acquire_lease():430] LIBFS ID= 1 trying to acquire lease of type 2 for inum 1
./run.sh: line 5: 25597 Segmentation fault LD_PRELOAD=../build/libmlfs.so MLFS_PROFILE=1 ${@}
My best guess is that LibFS is having a segfault when it tries to access the lease cache in acquire_lease(). The function calls lcache_find() and looks inside a hash table to find the lease. I have a feeling there is something incorrect about our setup (strata_access@kitten3-lom on the UT cluster), but it might not be since the code works fine with leases off. Do you know where we could start to debug this? Thank you!