I tested this successfully with cgroup v1, but it fails on cgroup v2.
Here are the logs:
2025-01-13T03:41:56.696Z INFO GPUMounter-worker/main.go:15 Service Starting...
2025-01-13T03:41:56.696Z INFO gpu-mount/server.go:22 Creating gpu mounter
2025-01-13T03:41:56.696Z INFO allocator/allocator.go:28 Creating gpu allocator
2025-01-13T03:41:56.696Z INFO collector/collector.go:24 Creating gpu collector
2025-01-13T03:41:56.696Z INFO collector/collector.go:42 Start get gpu info
2025-01-13T03:41:56.704Z INFO collector/collector.go:53 GPU Num: 1
2025-01-13T03:41:56.710Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:56.711Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:56.711Z INFO collector/collector.go:36 Successfully update gpu status
2025-01-13T03:41:56.711Z INFO allocator/allocator.go:35 Successfully created gpu collector
2025-01-13T03:41:56.711Z INFO gpu-mount/server.go:29 Successfully created gpu allocator
2025-01-13T03:41:56.711Z INFO GPUMounter-worker/main.go:22 Successfully created gpu mounter
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:35 AddGPU Service Called
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:36 request: pod_name:"owner-pod" namespace:"default" gpu_num:1
2025-01-13T03:41:58.750Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
2025-01-13T03:41:58.750Z INFO allocator/allocator.go:159 Get pod default/owner-pod mount type
2025-01-13T03:41:58.750Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:58.750Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: owner-pod-slave-pod-40a529 for Owner Pod: owner-pod
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:239 Checking Pods: owner-pod-slave-pod-40a529 state
2025-01-13T03:41:58.760Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.762Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.763Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.765Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:278 Pods: owner-pod-slave-pod-40a529 are running
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:84 Successfully create Slave Pod: owner-pod-slave-pod-40a529, for Owner Pod: owner-pod
2025-01-13T03:42:00.142Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:42:00.143Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: owner-pod-slave-pod-40a529 in Namespace default
2025-01-13T03:42:00.143Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:42:00.143Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
2025-01-13T03:42:00.143Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod
2025-01-13T03:42:00.143Z INFO util/util.go:24 Pod :owner-pod container ID: a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740
2025-01-13T03:42:00.143Z INFO util/util.go:35 Successfully get cgroup path: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope for Pod: owner-pod
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:148 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow" failed
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:149 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow: Directory nonexistent
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:150 exit status 2
2025-01-13T03:42:00.145Z ERROR util/util.go:38 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"}failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod in Namespace: default failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:85 exit status 2
I checked the filesystem on the node: /sys/fs/cgroup/devices does not exist.
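For context: as far as I understand, cgroup v2 removes the devices controller's devices.allow/devices.deny files entirely, and device access is instead enforced by eBPF programs (BPF_CGROUP_DEVICE) attached to the cgroup. So the `echo 'c 195:0 rw' > .../devices.allow` write in cgroup/cgroup.go can never succeed on a v2 node. Below is a minimal sketch (not part of GPUMounter; `isCgroupV2` is a name I made up) of how the worker could detect a unified-hierarchy node before attempting the v1 write path, using golang.org/x/sys/unix:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// isCgroupV2 reports whether /sys/fs/cgroup is the unified cgroup v2
// hierarchy. On such nodes /sys/fs/cgroup/devices does not exist, so
// writing to devices.allow cannot work.
func isCgroupV2() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	v2, err := isCgroupV2()
	if err != nil {
		fmt.Fprintln(os.Stderr, "statfs /sys/fs/cgroup failed:", err)
		os.Exit(1)
	}
	fmt.Println("cgroup v2:", v2)
}
```

On a cgroup v2 node this returns true, which would at least let the worker fail fast with a clear message (or branch to an eBPF-based device-permission path) instead of surfacing the shell's "Directory nonexistent" / exit status 2.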