Is it possible to replace all privileged components? #229

Open
Lanius-collaris opened this issue May 8, 2024 · 3 comments

Lanius-collaris commented May 8, 2024

Which components require CAP_NET_RAW? mtr and traceroute? Is it possible to replace them?

Q & A about TCP traceroute

1. How to send a specific number of SYNs?

Set the TCP_SYNCNT socket option, which caps the number of SYN retransmits for a connection attempt.

2. How to reuse a TCP socket?

TCP sockets can dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.
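
A minimal sketch of that reuse step, assuming Linux on an architecture where `syscall.SYS_CONNECT` is defined (for example amd64 or arm64). Go's `syscall.Connect` has no Sockaddr type for AF_UNSPEC, so the hypothetical helper `dissolve` issues the raw system call. `reuseDemo` and the loopback listener are only there to make the example self-contained; a real traceroute would change the hop limit between probes instead.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"unsafe"
)

// dissolve resets a connected (or connecting) TCP socket by calling
// connect(2) with sa_family set to AF_UNSPEC.
// Assumption: Linux with syscall.SYS_CONNECT available.
func dissolve(fd int) error {
	sa := syscall.RawSockaddr{Family: syscall.AF_UNSPEC}
	_, _, errno := syscall.Syscall(syscall.SYS_CONNECT,
		uintptr(fd), uintptr(unsafe.Pointer(&sa)), unsafe.Sizeof(sa))
	if errno != 0 {
		return errno
	}
	return nil
}

// reuseDemo connects one descriptor to a loopback listener twice,
// dissolving the association after each connection.
func reuseDemo() error {
	ln, err := net.Listen("tcp4", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	dst := &syscall.SockaddrInet4{
		Port: ln.Addr().(*net.TCPAddr).Port,
		Addr: [4]byte{127, 0, 0, 1},
	}

	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
	if err != nil {
		return err
	}
	defer syscall.Close(fd)

	for probe := 1; probe <= 2; probe++ {
		if err := syscall.Connect(fd, dst); err != nil {
			return fmt.Errorf("probe %d: connect: %w", probe, err)
		}
		if err := dissolve(fd); err != nil {
			return fmt.Errorf("probe %d: dissolve: %w", probe, err)
		}
	}
	return nil
}

func main() {
	if err := reuseDemo(); err != nil {
		fmt.Println("reuse failed:", err)
		return
	}
	fmt.Println("same socket connected twice")
}
```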

3. How to know which packet triggered an ICMP packet?

Use a TCP option you can control, for example, maximum segment size (using TCP_MAXSEG socket option).
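
A rough sketch of how the MSS marker could be read back, under the assumption that the payload returned by recvmsg with MSG_ERRQUEUE starts at the quoted TCP header of the original SYN (this appears to be how Linux delivers ICMPv6 errors for TCP sockets, but treat it as unverified). The helper name `mssFromQuotedTCP` is hypothetical; it walks the TCP options and returns the value of option kind 2 (maximum segment size).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mssFromQuotedTCP extracts the MSS option from a quoted TCP header.
// seg is assumed to begin at the TCP header; it may be truncated.
func mssFromQuotedTCP(seg []byte) (uint16, bool) {
	if len(seg) < 20 {
		return 0, false
	}
	hdrLen := int(seg[12]>>4) * 4 // data offset is counted in 32-bit words
	if hdrLen < 20 {
		return 0, false
	}
	if hdrLen > len(seg) {
		hdrLen = len(seg) // the router may not have quoted every option
	}
	opts := seg[20:hdrLen]
	for len(opts) > 0 {
		switch opts[0] {
		case 0: // end of option list
			return 0, false
		case 1: // NOP, a single byte
			opts = opts[1:]
			continue
		}
		if len(opts) < 2 {
			return 0, false
		}
		l := int(opts[1])
		if l < 2 || l > len(opts) {
			return 0, false
		}
		if opts[0] == 2 && l == 4 { // kind 2: maximum segment size
			return binary.BigEndian.Uint16(opts[2:4]), true
		}
		opts = opts[l:]
	}
	return 0, false
}

func main() {
	// Hand-built SYN header: data offset 6 words, SYN flag, MSS option 501.
	syn := make([]byte, 24)
	binary.BigEndian.PutUint16(syn[2:4], 443) // destination port
	syn[12] = 6 << 4
	syn[13] = 0x02
	copy(syn[20:], []byte{2, 4, 0x01, 0xf5})

	mss, ok := mssFromQuotedTCP(syn)
	fmt.Println(mss, ok) // prints "501 true"
}
```

Giving each probe a different TCP_MAXSEG value would then let the sender map a returned error to the probe that caused it.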

PoC

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "net/netip"
    "os"
    "syscall"
    "time"
)

const SO_EE_ORIGIN_ICMP6 = 3

func main() {
    //fastly.jsdelivr.net
    dst := syscall.SockaddrInet6{
        Port:   443,
        ZoneId: 0,
        Addr: [16]byte{
            0x2a, 0x04, 0x4e, 0x42,
            0, 0, 0, 0,
            0, 0, 0, 0,
            0, 0, 0x04, 0x85,
        },
    }
    sock, err := syscall.Socket(syscall.AF_INET6, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
    if err != nil {
        panic(err)
    }
    defer syscall.Close(sock)
    err = syscall.SetNonblock(sock, true)
    if err != nil {
        panic(err)
    }

    err = syscall.SetsockoptInt(sock, syscall.SOL_SOCKET, syscall.SO_TIMESTAMP, 1)
    if err != nil {
        panic(err)
    }
    err = syscall.SetsockoptInt(sock, syscall.IPPROTO_IPV6, syscall.IPV6_RECVHOPLIMIT, 1)
    if err != nil {
        panic(err)
    }
    err = syscall.SetsockoptInt(sock, syscall.IPPROTO_IPV6, syscall.IPV6_UNICAST_HOPS, 1)
    if err != nil {
        panic(err)
    }
    err = syscall.SetsockoptInt(sock, syscall.IPPROTO_IPV6, syscall.IPV6_RECVERR, 1)
    if err != nil {
        panic(err)
    }
    err = syscall.SetsockoptInt(sock, syscall.SOL_TCP, syscall.TCP_SYNCNT, 1)
    if err != nil {
        panic(err)
    }
    err = syscall.SetsockoptInt(sock, syscall.SOL_TCP, syscall.TCP_MAXSEG, 501)
    if err != nil {
        panic(err)
    }

    start := time.Now()
    err = syscall.Connect(sock, &dst)
    if err != nil {
        fmt.Printf("connecting... %v\n", err)
    }
    // Wait briefly for the first hop to answer before polling the error queue.
    time.Sleep(5 * time.Millisecond)
    var buf [2048]byte
    var cmsgBuf [1024]byte
    n, cmsgN, flags, from, err := syscall.Recvmsg(sock, buf[:], cmsgBuf[:], syscall.MSG_ERRQUEUE)
    if err != nil {
        // On failure n is -1, so slicing buf[:n] below would panic; stop here.
        fmt.Printf("recvmsg error: %v\n", err)
        return
    }
    fmt.Printf("n: %d\ncmsgN: %d\nflags: 0x%x\nfrom: %v\nmsg: %v\n\n", n, cmsgN, flags, from, buf[:n])
    fmt.Printf("start time\nreadable: %v\nUnixMicro: %d\n\n", start, start.UnixMicro())
    cmsgArr, err := syscall.ParseSocketControlMessage(cmsgBuf[:cmsgN])
    if err != nil {
        panic(err)
    }
    fmt.Printf("cmsgArr: %v\n\n", cmsgArr)
    for _, cmsg := range cmsgArr {
        if cmsg.Header.Level == syscall.SOL_SOCKET && cmsg.Header.Type == syscall.SO_TIMESTAMP {
            if len(cmsg.Data) >= 16 {
                reader := bytes.NewReader(cmsg.Data[:16])
                var sec int64
                var usec int64
                binary.Read(reader, binary.NativeEndian, &sec)
                binary.Read(reader, binary.NativeEndian, &usec)
                fmt.Printf("sec: %d\nusec: %d\n", sec, usec)
                fmt.Printf("cmsg timestamp: %v\n\n", time.Unix(sec, usec*1000))
            }
        }
        if cmsg.Header.Level == syscall.IPPROTO_IPV6 && cmsg.Header.Type == syscall.IPV6_HOPLIMIT {
            if len(cmsg.Data) >= 4 {
                reader := bytes.NewReader(cmsg.Data[:4])
                var hopLimit uint32
                binary.Read(reader, binary.NativeEndian, &hopLimit)
                fmt.Printf("cmsg hop limit: %d\n\n", hopLimit)
            }
        }
        if cmsg.Header.Level == syscall.IPPROTO_IPV6 && cmsg.Header.Type == syscall.IPV6_RECVERR {
            if len(cmsg.Data) >= 16 {
                eeOrigin := cmsg.Data[4]
                eeType := cmsg.Data[5]
                eeCode := cmsg.Data[6]
                fmt.Printf("eeOrigin: %d\neeType: %d\neeCode: %d\n\n", eeOrigin, eeType, eeCode)
                if eeOrigin == SO_EE_ORIGIN_ICMP6 && eeType == 3 && eeCode == 0 {
                    if len(cmsg.Data) >= 44 {
                        addr, _ := netip.AddrFromSlice(cmsg.Data[24:40])
                        fmt.Printf("ICMPv6 Time Exceeded Message from: %v\n", addr)
                    }
                }
            }
        }
    }
    if err := os.WriteFile("cmsg.bin", cmsgBuf[:cmsgN], 0o600); err != nil {
        fmt.Printf("write cmsg.bin: %v\n", err)
    }
}
@jimaek
Member

jimaek commented May 8, 2024

Hey, CAP_NET_RAW is required for ping tests due to ICMP. We haven't tested a way to bypass that in code because we're currently using the native Linux ping tool. An alternative seems to be sysctl options: https://www.antitree.com/2019/01/containers-using-ping-without-cap_net_raw/

For now we don't really plan to write our own binaries to run the tests, so the default requirement will have to stay, at least until we're ready to focus more on how the tests run.

Power users can of course try to bypass that with sysctl or even by running different Docker runtimes.
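
For anyone trying the sysctl route, here is a small sketch that checks whether a group ID falls inside net.ipv4.ping_group_range, the sysctl that gates unprivileged ICMP datagram sockets on Linux. The helper name `gidInPingRange` is hypothetical, and the check is simplified: it only looks at the effective GID, while the kernel also considers supplementary groups.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// gidInPingRange reports whether gid lies inside a ping_group_range value
// such as "0 2147483647". The kernel default "1 0" is an empty range.
func gidInPingRange(raw string, gid int) (bool, error) {
	fields := strings.Fields(raw)
	if len(fields) != 2 {
		return false, fmt.Errorf("unexpected ping_group_range %q", raw)
	}
	lo, err := strconv.ParseUint(fields[0], 10, 32)
	if err != nil {
		return false, err
	}
	hi, err := strconv.ParseUint(fields[1], 10, 32)
	if err != nil {
		return false, err
	}
	if gid < 0 {
		return false, nil
	}
	g := uint64(gid)
	return lo <= g && g <= hi, nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("cannot read ping_group_range:", err)
		return
	}
	ok, err := gidInPingRange(string(raw), os.Getegid())
	if err != nil {
		fmt.Println("cannot parse ping_group_range:", err)
		return
	}
	fmt.Printf("egid %d inside ping_group_range: %v\n", os.Getegid(), ok)
}
```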

@Lanius-collaris
Author

Using net.* sysctls is not allowed when I use --network=host, but ping still works. So I think it's due to mtr or traceroute.
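
To see whether a process inside the container actually holds CAP_NET_RAW, one option is to read the CapEff mask from /proc/self/status. The sketch below assumes Linux; `hasCapNetRaw` is a hypothetical helper, and bit 13 is the CAP_NET_RAW number from linux/capability.h.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capNetRaw is the bit number of CAP_NET_RAW in the capability masks.
const capNetRaw = 13

// hasCapNetRaw parses the text of /proc/<pid>/status and reports whether
// the effective capability set (CapEff) includes CAP_NET_RAW.
func hasCapNetRaw(status string) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		hexMask := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(hexMask, 16, 64)
		if err != nil {
			return false, err
		}
		return mask&(1<<capNetRaw) != 0, nil
	}
	return false, fmt.Errorf("no CapEff line found")
}

func main() {
	raw, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	ok, err := hasCapNetRaw(string(raw))
	if err != nil {
		fmt.Println("cannot parse capabilities:", err)
		return
	}
	fmt.Println("CAP_NET_RAW in effective set:", ok)
}
```

Running it once with the capability and once with it dropped, next to ping, mtr, and traceroute, would show which tool depends on it.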

@jimaek
Member

jimaek commented May 8, 2024

For now, unfortunately, I can't offer a solution. The capability is a requirement for podman and probably rootless Docker for our tests to work. Our original tests showed that there are too many problems without it.

When we start working on the binaries we will consider this task :)
