Lambda checkpoint not working from Docker on macOS #24

@tzvetkovg

I am following the Lambda example at https://github.com/CRaC/example-lambda to create a Lambda checkpoint, and I follow the steps exactly inside a Docker container (Ubuntu 20.04) running on macOS (M1, ARM chip). However, it doesn't work. My steps are:

  1. Using the Docker socket, I run `docker run --privileged --platform=linux/amd64 --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/$(pwd) -w $(pwd) teracy/ubuntu:20.04-dind-20.10.13 bash`
  2. Once I am in the container, I run `./crac-steps.sh s00_init`
  3. I download the CRaC JDK in the container as follows:

     ```bash
     CRAC_VERSION=17-crac+6
     curl -LO https://github.com/CRaC/openjdk-builds/releases/download/$CRAC_VERSION/openjdk-"$CRAC_VERSION"_linux-x64.tar.gz
     tar axf openjdk-"$CRAC_VERSION"_linux-x64.tar.gz
     ```

  4. Then I run `./crac-steps.sh dojlink openjdk-17-crac+6_linux-x64`, which creates the `jdk` folder without problems.
  5. `./crac-steps.sh s01_build` works fine.
  6. Starting the container via `./crac-steps.sh s02_start_checkpoint` works fine.
  7. But when I run the checkpoint step via `./crac-steps.sh s03_checkpoint`, it fails; the output is in the attached dump.log.
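
For reference, the steps above can be replayed as one script. This is a minimal sketch that only reuses the commands from the steps; it assumes it is run from the root of a CRaC/example-lambda checkout inside the dind container, and that each `crac-steps.sh` step returns once it finishes (if `s02_start_checkpoint` stays in the foreground, run `s03_checkpoint` from a second shell instead). The `run` and `replay_steps` helpers and the `DRY_RUN` variable are hypothetical; by default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: replay the steps from this issue in order.
# Assumption: run from the root of a CRaC/example-lambda checkout (where
# crac-steps.sh lives), inside the dind container started in step 1.
# Defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute.
set -euo pipefail

CRAC_VERSION=17-crac+6
JDK_DIR="openjdk-${CRAC_VERSION}_linux-x64"
JDK_URL="https://github.com/CRaC/openjdk-builds/releases/download/${CRAC_VERSION}/${JDK_DIR}.tar.gz"

run() {
  # Echo each command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

replay_steps() {
  run ./crac-steps.sh s00_init
  run curl -LO "$JDK_URL"
  run tar axf "${JDK_DIR}.tar.gz"
  run ./crac-steps.sh dojlink "$JDK_DIR"
  run ./crac-steps.sh s01_build
  run ./crac-steps.sh s02_start_checkpoint
  run ./crac-steps.sh s03_checkpoint   # the step that fails in this issue
}

replay_steps
```

Running it as-is prints the seven commands; `DRY_RUN=0 ./replay.sh` (file name hypothetical) would execute them and stop at the first failing step because of `set -e`.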

Running `./criu check --all` from `/tmp/sub/jdk/lib` produced:

```
root@2ee49f701218:/tmp/sub/jdk/lib# ./criu check --all
Warn (criu/kerndat.c:1349): Can't get pidfd
Warn (criu/kerndat.c:1466): CRIU was built without libnftables support
Error (criu/util.c:705): read: Success
Warn (criu/cr-check.c:813): Dirty tracking is OFF. Memory snapshot will not work.
Warn (criu/cr-check.c:1242): Do not have API to map vDSO - will use mremap() to restore vDSO
Error (criu/cr-check.c:1208): UFFD is not supported
Error (criu/cr-check.c:1208): UFFD is not supported
Warn (criu/cr-check.c:1231): clone3() with set_tid not supported
Error (criu/cr-check.c:1273): Time namespaces are not supported
Warn (criu/cr-check.c:1300): Pidfd store requires pidfd_open syscall which is not supported
Warn (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Error (criu/cr-check.c:996): failed to mount autofs: No such device
Warn (criu/cr-check.c:1160): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
```
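
To separate the hard errors from the warnings in the check output above, a small filter can help. This is a sketch that assumes only that CRIU prefixes its lines with `Warn (` and `Error (` as shown; `summarize_criu_check` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: summarize `criu check --all` output into warning/error counts
# and list the distinct error lines. Reads the check output on stdin.
# summarize_criu_check is a hypothetical helper name.
set -euo pipefail

summarize_criu_check() {
  local input warns errors
  input="$(cat)"
  # grep -c prints 0 (and exits 1) when nothing matches, so ignore its status.
  warns="$(printf '%s\n' "$input" | grep -c '^Warn (' || true)"
  errors="$(printf '%s\n' "$input" | grep -c '^Error (' || true)"
  echo "warnings: $warns"
  echo "errors: $errors"
  # Deduplicate repeated errors (e.g. "UFFD is not supported" appears twice).
  printf '%s\n' "$input" | grep '^Error (' | sort -u || true
}

# Usage inside the container (2>&1 captures both output streams):
#   cd /tmp/sub/jdk/lib && ./criu check --all 2>&1 | summarize_criu_check
```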

It looks like CRIU attempts to create the checkpoint, but right at the end of the log (see the attached dump.log) I get:

```
(00.247778) Parasite syscall_ip at 0x555555554000
(00.248048) Error (compel/arch/x86/src/lib/infect.c:518): Can't get CS register for 135: Input/output error
(00.248179) Error (compel/arch/x86/src/lib/infect.c:551): Can't dump task 135 with LDT descriptors
(00.248375) Error (criu/cr-dump.c:1566): Can't infect (pid: 135) with parasite
(00.249940) Unlock network
(00.250489) Unfreezing tasks into 1
(00.250597) Unseizing 135 into 1
(00.251079) Error (criu/cr-dump.c:2063): Dumping FAILED.
```
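
The same kind of filter can pull the failing lines, with a little context, out of the full dump log. A sketch: it assumes the `(seconds) Error (...)` line format shown above and defaults to `/cr/dump4.log`, the path named in the Lambda log; `criu_log_errors` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print the error lines from a CRIU dump log, with line numbers,
# plus two lines of context before each one.
# The default path /cr/dump4.log is the one named in the Lambda log;
# criu_log_errors is a hypothetical helper name.
set -euo pipefail

criu_log_errors() {
  local log="${1:-/cr/dump4.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # CRIU log lines start with a "(seconds.micros)" timestamp,
  # e.g. "(00.248048) Error (...)". Context lines are marked "N-", matches "N:".
  grep -n -B2 -E '^\([0-9]+\.[0-9]+\) Error' "$log" || echo "no errors found in $log"
}
```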

In my Lambda container, the exception log says:

```
INFO: /function/lib/netty-nio-client-2.10.72.jar is recorded as always available on restore
CR: Checkpoint ...
CRIU failed with exit code 1 - check /cr/dump4.log
Command: /tmp/sub/jdk/lib/criu dump -t 135 -D /cr --shell-job -v4 -o dump4.log
JVM: invalid info for restore provided: queued code -1
END RequestId: df693adf-960e-4912-a1de-244258825b98
REPORT RequestId: df693adf-960e-4912-a1de-244258825b98 Duration: 696.74 ms Billed Duration: 697 ms Memory Size: 3008 MB Max Memory Used: 3008 MB
org.crac.CheckpointException
	at org.crac.Core$Compat.checkpointRestore(Core.java:141)
	at org.crac.Core.checkpointRestore(Core.java:219)
	at example.Handler.lambda$handleRequest$0(Handler.java:36)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: java.lang.RuntimeException: Native checkpoint failed.
		at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:114)
		at java.base/jdk.crac.Core.checkpointRestore1(Core.java:192)
		at java.base/jdk.crac.Core.checkpointRestore(Core.java:299)
		at java.base/jdk.crac.Core.checkpointRestore(Core.java:278)
		at java.base/javax.crac.Core.checkpointRestore(Core.java:73)
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.base/java.lang.reflect.Method.invoke(Method.java:568)
		at org.crac.Core$Compat.checkpointRestore(Core.java:138)
		... 3 more
```

Any ideas what's wrong? At the start of the log I can see `File /run/criu.kdat does not exist`, which kind of suggests the CRIU libraries aren't on the classpath; but they should be, since I can see the extracted `jdk` folder. Is the Docker socket the issue, or the fact that I am on macOS rather than Linux?
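
To gather the facts behind these questions in one place, a report like the following could be run inside the dind container. It is only a sketch: the paths `/tmp/sub/jdk/lib/criu` and `/run/criu.kdat` come from the logs above, `env_report` is a hypothetical helper name, and it assumes the `docker` CLI in the container talks to the daemon through the mounted `/var/run/docker.sock`. Comparing the container's architecture (`uname -m`) with the daemon's (`docker info`) shows whether the `linux/amd64` container runs on a host of a different architecture; the report does not diagnose the failure by itself.

```bash
#!/usr/bin/env bash
# Sketch: collect environment facts relevant to this issue, run inside the
# dind container. Paths come from the logs in this issue; env_report is a
# hypothetical helper name. No `set -e`, so one failed probe does not abort.
set -uo pipefail

env_report() {
  local criu="${1:-/tmp/sub/jdk/lib/criu}"
  echo "container kernel: $(uname -r)"
  echo "container arch: $(uname -m)"
  # Architecture of the Docker daemon reached through /var/run/docker.sock.
  if command -v docker >/dev/null 2>&1; then
    echo "docker daemon arch: $(docker info --format '{{.Architecture}}' 2>/dev/null || echo unavailable)"
  else
    echo "docker daemon arch: docker CLI not found"
  fi
  if [ -x "$criu" ]; then
    echo "criu binary: $criu (present)"
  else
    echo "criu binary: $criu (missing or not executable)"
  fi
  if [ -e /run/criu.kdat ]; then
    echo "/run/criu.kdat: present"
  else
    echo "/run/criu.kdat: absent"
  fi
}

env_report "$@"
```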
