Will Cgroup v2 be supported in the future? #27

Open

lyon-v opened this issue Jan 13, 2025 · 1 comment

Comments


lyon-v commented Jan 13, 2025

I tested this successfully with cgroup v1, but it fails with cgroup v2.

Here are the logs:

2025-01-13T03:41:56.696Z INFO GPUMounter-worker/main.go:15 Service Starting...
2025-01-13T03:41:56.696Z INFO gpu-mount/server.go:22 Creating gpu mounter
2025-01-13T03:41:56.696Z INFO allocator/allocator.go:28 Creating gpu allocator
2025-01-13T03:41:56.696Z INFO collector/collector.go:24 Creating gpu collector
2025-01-13T03:41:56.696Z INFO collector/collector.go:42 Start get gpu info
2025-01-13T03:41:56.704Z INFO collector/collector.go:53 GPU Num: 1
2025-01-13T03:41:56.710Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:56.711Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:56.711Z INFO collector/collector.go:36 Successfully update gpu status
2025-01-13T03:41:56.711Z INFO allocator/allocator.go:35 Successfully created gpu collector
2025-01-13T03:41:56.711Z INFO gpu-mount/server.go:29 Successfully created gpu allocator
2025-01-13T03:41:56.711Z INFO GPUMounter-worker/main.go:22 Successfully created gpu mounter
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:35 AddGPU Service Called
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:36 request: pod_name:"owner-pod" namespace:"default" gpu_num:1
2025-01-13T03:41:58.750Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
2025-01-13T03:41:58.750Z INFO allocator/allocator.go:159 Get pod default/owner-pod mount type
2025-01-13T03:41:58.750Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:58.750Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: owner-pod-slave-pod-40a529 for Owner Pod: owner-pod
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:239 Checking Pods: owner-pod-slave-pod-40a529 state
2025-01-13T03:41:58.760Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.762Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.763Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.765Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:278 Pods: owner-pod-slave-pod-40a529 are running
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:84 Successfully create Slave Pod: owner-pod-slave-pod-40a529, for Owner Pod: owner-pod
2025-01-13T03:42:00.142Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:42:00.143Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: owner-pod-slave-pod-40a529 in Namespace default
2025-01-13T03:42:00.143Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:42:00.143Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
2025-01-13T03:42:00.143Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod
2025-01-13T03:42:00.143Z INFO util/util.go:24 Pod :owner-pod container ID: a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740
2025-01-13T03:42:00.143Z INFO util/util.go:35 Successfully get cgroup path: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope for Pod: owner-pod
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:148 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow" failed
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:149 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow: Directory nonexistent

2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:150 exit status 2
2025-01-13T03:42:00.145Z ERROR util/util.go:38 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"}failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod in Namespace: default failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:85 exit status 2


I checked the filesystem, and there is no /sys/fs/cgroup/devices directory.
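For context: on a node using the unified cgroup v2 hierarchy, /sys/fs/cgroup is a single cgroup2 mount with no per-controller devices/ subtree, so the devices.allow write above cannot work. A minimal sketch of how the worker could detect which hierarchy is in use (the helper name checkCgroupV2 is hypothetical, not part of GPUMounter):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// checkCgroupV2 reports whether /sys/fs/cgroup is mounted as the unified
// cgroup v2 hierarchy by inspecting the filesystem magic of the mount point.
func checkCgroupV2() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	v2, err := checkCgroupV2()
	if err != nil {
		panic(err)
	}
	fmt.Println("cgroup v2:", v2)
}
```

On cgroup v1 the same statfs returns TMPFS_MAGIC for /sys/fs/cgroup, so this check could be used to branch between the current devices.allow path and a cgroup v2 code path.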


lyon-v commented Jan 13, 2025

According to https://docs.kernel.org/admin-guide/cgroup-v2.html#device-controller, cgroup v2 has no devices controller interface files; device access is controlled by attaching eBPF (BPF_CGROUP_DEVICE) programs instead. But I don't know how to implement this for cgroup v2. If you know, please tell me and I'll try to fix it.
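For anyone picking this up, here is a minimal sketch of the cgroup v2 approach, assuming a recent version of the github.com/cilium/ebpf library (none of this exists in GPUMounter today, and the cgroup path is a placeholder). It builds a BPF_PROG_TYPE_CGROUP_DEVICE program that allows access to devices with major 195 (the NVIDIA character devices) and attaches it to the container's cgroup directory:

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/asm"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Hypothetical cgroup v2 path for the target container; in GPUMounter this
	// would be derived from the pod's cgroup path, e.g.
	// /sys/fs/cgroup/kubepods.slice/.../docker-<container-id>.scope
	cgroupPath := "/sys/fs/cgroup/kubepods.slice/example.scope"

	// BPF_PROG_TYPE_CGROUP_DEVICE program: r1 points at
	// struct bpf_cgroup_dev_ctx { access_type; major; minor; }.
	// Return 1 to allow the access, 0 to deny it.
	// This sketch allows any device with major 195 and denies the rest.
	prog, err := ebpf.NewProgram(&ebpf.ProgramSpec{
		Type: ebpf.CGroupDevice,
		Instructions: asm.Instructions{
			asm.LoadMem(asm.R2, asm.R1, 4, asm.Word), // r2 = ctx->major
			asm.JNE.Imm(asm.R2, 195, "deny"),         // if major != 195 -> deny
			asm.Mov.Imm(asm.R0, 1),                   // allow
			asm.Return(),
			asm.Mov.Imm(asm.R0, 0).WithSymbol("deny"), // deny
			asm.Return(),
		},
		License: "GPL",
	})
	if err != nil {
		log.Fatalf("load device program: %v", err)
	}
	defer prog.Close()

	// Attach the program to the container's cgroup. Closing the link detaches
	// the program again, so a real implementation would pin the link or keep
	// it alive for as long as the GPU stays mounted.
	l, err := link.AttachCgroup(link.CgroupOptions{
		Path:    cgroupPath,
		Attach:  ebpf.AttachCGroupDevice,
		Program: prog,
	})
	if err != nil {
		log.Fatalf("attach to cgroup: %v", err)
	}
	defer l.Close()
}
```

One important caveat: runc already attaches its own device-filter program to the container's cgroup (that is how it implements the devices controller on cgroup v2), and when several programs are attached, every one of them must allow the access. So in practice the worker would have to replace that program, or regenerate it with the GPU's major/minor added to the allowed rules, rather than simply attaching an extra program like this sketch does.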
