Background:
lxcfs assembles a container's /proc/diskstats in the proc_diskstats_read function by reading the blkio subsystem of the container's cgroup.
Problem:
However, the blkio cgroup subsystem can be inaccurate. For instance, if a container operates on the disk /dev/sdc, the host's /proc/diskstats may show many operations on /dev/sdc, while the container's blkio cgroup does not reflect them: blkio.io_serviced_recursive, blkio.io_service_time_recursive, and the other files related to these operations may be empty.
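The mismatch described above can be checked with a small script. This is a minimal sketch, not lxcfs code: the helper names blkio_ops_for_device and host_ops_for_device are hypothetical, and it assumes the usual cgroup v1 layout of blkio.io_serviced_recursive ("major:minor Operation count" lines followed by a final "Total" line) and the standard /proc/diskstats field order.

```python
def blkio_ops_for_device(blkio_text, major, minor):
    """Return {operation: count} for one device from a blkio.io_serviced-style file."""
    prefix = f"{major}:{minor}"
    ops = {}
    for line in blkio_text.splitlines():
        fields = line.split()
        # Per-device lines have three fields; the trailing "Total N" line has two.
        if len(fields) == 3 and fields[0] == prefix:
            ops[fields[1]] = int(fields[2])
    return ops


def host_ops_for_device(diskstats_text, name):
    """Return (reads_completed, writes_completed) for one device, or None if absent."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major, minor, name, reads completed, ..., writes completed (8th field).
        if len(fields) >= 8 and fields[2] == name:
            return int(fields[3]), int(fields[7])
    return None
```

If host_ops_for_device reports large counters for sdc while blkio_ops_for_device returns an empty dict for the same major:minor pair, the container's cgroup is missing the I/O that the host recorded.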
Proposed Solution:
To address this issue, one approach is to assemble the container's /proc/diskstats from the host's /proc/diskstats.
This works when each container is given its own disk, for example /dev/sdc as the data disk of container A and /dev/sdd as the data disk of container B. Dedicated disks are also a common way to isolate container resources.
To support this approach, lxcfs may need to isolate /proc/partitions (I don't know why the community doesn't support this so far) and change the diskstats assembly logic so that it reads the entries for the disks used by the container from the host's /proc/diskstats and reassembles them.
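The reassembly step could look roughly like the sketch below. It is only an illustration under assumptions: filter_diskstats and the container_disks parameter are hypothetical names, how lxcfs would learn which disks belong to a container is left open, and matching is by exact device name (partitions such as sdc1 would have to be listed explicitly).

```python
def filter_diskstats(host_diskstats, container_disks):
    """Keep only the host /proc/diskstats lines for the container's devices."""
    kept = []
    for line in host_diskstats.splitlines():
        fields = line.split()
        # Each line starts with: major, minor, device name, then the I/O counters.
        if len(fields) >= 3 and fields[2] in container_disks:
            kept.append(line)
    # Preserve the trailing newline that /proc files normally end with.
    return "\n".join(kept) + ("\n" if kept else "")
```

Called with the host's /proc/diskstats content and {"sdc"}, it returns only the sdc line unchanged, so the container sees the same counters the host recorded for its disk.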
To keep compatibility, it may be better to add an option (--enable-host-diskstats) that switches between the old and new behavior.
Furthermore, given the limited accuracy of blkio cgroups in most cases, using them to assemble /proc/diskstats may be redundant.
Thank you for your attention!