Error while running under kubernetes #21


Closed
ramukima opened this issue Nov 30, 2016 · 7 comments

@ramukima

RTNETLINK answers: File exists
Cannot find device "macvtapa85bcf"
Cannot find device "macvtapa85bcf"
cat: /sys/devices/virtual/net/macvtapa85bcf/tap*/dev: No such file or directory
mknod: missing operand after 'c'
Special files require major and minor device numbers.
Try 'mknod --help' for more information.
INFO: DHCP configured to serve IP 192.168.75.19/32 via macvtapa85bcf (attached to eth0)
INFO: Lauching dnsmasq ......
INFO: Launching /usr/libexec/qemu-kvm .......
2016-11-30T17:38:19.804224Z qemu-kvm: -netdev tap,id=net0,vhost=on,fd=3: TUNGETIFF ioctl() failed: Inappropriate ioctl for device
TUNSETOFFLOAD ioctl() failed: Inappropriate ioctl for device


ramukima commented Nov 30, 2016

Looks like a timing issue. Should the startvm script wait for the container interface to come up before programming it? Or is it due to #20?


ramukima commented Dec 1, 2016

I see it happening almost all the time, even after I created a Docker image with the qcow2 image embedded. I do not see this issue when I run directly with Docker in debug mode ("bash") and then run the 'startvm' script manually. @methadata, any guidance is much appreciated.

This is where it fails with the RTNETLINK error. Could it be a conflicting default gateway?

# macvtap device creation for the VM
    ip link add link $iface name $vtapdev type macvtap mode bridge
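If this really is a timing issue, the creation step could be guarded by waiting for the container interface first. A minimal sketch of such a guard, assuming that the interface appearing under `/sys/class/net` is a sufficient readiness signal (this helper is my illustration, not code from the repo):

```shell
# Hypothetical guard (an assumption, not the shipped fix): wait until the
# container's network interface shows up in sysfs before programming it.
wait_for_iface() {
  iface="$1"
  tries="${2:-30}"
  n=0
  while [ "$n" -lt "$tries" ]; do
    if [ -d "/sys/class/net/$iface" ]; then
      return 0
    fi
    n=$((n + 1))
    sleep 1
  done
  return 1
}

# e.g. wait_for_iface eth0 && \
#   ip link add link "$iface" name "$vtapdev" type macvtap mode bridge
```

Note this only waits for the device to exist; it would not help if, as suspected below, a stale attachment on eth0 is the real cause.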

methadata pushed a commit that referenced this issue Dec 12, 2016
@methadata (Collaborator)

Sorry for the delayed reply. Unfortunately, I'm not able to reproduce this error, even with Kubernetes.

The error described above seems to be related to the creation of the macvtap device, as also appeared in #12. Long story short: the name of the new macvtap device clashed if another kvm container was running on the same Docker host, which is why I randomised the device name.
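The randomisation idea can be sketched as follows; this is my illustration of the approach, and the actual generator in the startvm script may differ:

```shell
# Illustrative name generator (an assumption, not the repo's exact code):
# a short random hex suffix keeps the macvtap/macvlan device names unique
# across kvm containers sharing one Docker host.
gen_dev_names() {
  # 3 random bytes -> 6 hex characters
  suffix=$(head -c 3 /dev/urandom | od -An -tx1 | tr -d ' \n')
  echo "macvtap${suffix} macvlan${suffix}"
}
```

This matches the name shapes seen in the logs below (e.g. macvtap930997, macvtap55c7bc).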

In this particular case, I do not know why the macvtap device creation fails, nor why it works when the startvm script is launched manually.

To investigate this issue further, I have created a new branch in this repo, so there is a new tag on Docker Hub called bugfix_21. Please run the container again using this new image, bbvainnotech/kvm:bugfix_21, and reply to this issue with the debug output. It should halt after the first error.


ramukima commented Dec 12, 2016

Thanks for your reply. Here is the latest test result with the debug image:

First Run

~# kubectl logs ams-gw-932307302-yadew -n bmw
DEBUG: New device names generated: macvtap930997 macvlan930997
DEBUG: vlan/vtap devices: *NONE*
+ ip link add link eth0 name macvtap930997 type macvtap mode bridge
+ ip link set macvtap930997 address 12:47:0b:a8:25:ae
+ ip link set macvtap930997 up
+ ip link add link eth0 name macvlan930997 type macvlan mode bridge
+ ip link set macvlan930997 up
+ set +xe
DEBUG: vlan/vtap devices: macvtap930997 macvlan930997
INFO: DHCP configured to serve IP 192.168.75.66/32 via macvtap930997 (attached to eth0)
INFO: Lauching dnsmasq                                --dhcp-range=192.168.75.66,192.168.75.66                                    --dhcp-host=12:47:0b:a8:25:ae,,192.168.75.66,ams-gw-932307302-yadew,infinite           --dhcp-option=option:netmask,255.255.255.255                                      --dhcp-option=option:dns-server,10.96.0.10          --dhcp-option=option:router,169.254.1.1            --dhcp-option=option:domain-search,bmw.svc.cluster.local,svc.cluster.local,cluster.local     --dhcp-option=option:domain-name,bmw.svc.cluster.local
INFO: Launching /usr/libexec/qemu-kvm -enable-kvm -drive if=virtio,file=/image/image   -machine accel=kvm,usb=off   -nodefaults   -device virtio-balloon-pci,id=balloon0   -realtime mlock=off   -msg timestamp=on   -chardev pty,id=charserial0   -device isa-serial,chardev=charserial0,id=serial0   -serial stdio    -vga qxl -display none -m 1024 -smp 4,sockets=4,cores=1,threads=1    -device virtio-net-pci,netdev=net0,mac=12:47:0b:a8:25:ae -netdev tap,id=net0,vhost=on,fd=3 3<>/dev/macvtap930997

All good here; the VM comes up perfectly fine. I deleted the above deployment and waited for a minute before deploying it again.

Second Run

~# kubectl logs ams-gw-932307302-0jufi -n bmw
DEBUG: New device names generated: macvtap55c7bc macvlan55c7bc
DEBUG: vlan/vtap devices: *NONE*
+ ip link add link eth0 name macvtap55c7bc type macvtap mode bridge
RTNETLINK answers: File exists
~#

For information: the IP addresses for the containers are handed out by Flannel in my environment. With the debug image, the deployment goes into CrashLoopBackOff (which is what you expected, I guess).

Note that I noticed the size of my qcow2 image change after the first run. The next few runs throw the same error as the 'Second Run'. However, if I wait a while after deleting the second run's deployment (after confirming that the related deployment/pod/containers were all deleted), I see logs similar to the 'First Run', e.g.

~# kubectl logs ams-gw-932307302-d1cw9 -n bmw
DEBUG: New device names generated: macvtap57a8e6 macvlan57a8e6
DEBUG: vlan/vtap devices: *NONE*
+ ip link add link eth0 name macvtap57a8e6 type macvtap mode bridge
+ ip link set macvtap57a8e6 address ea:e7:53:58:f4:17
+ ip link set macvtap57a8e6 up
+ ip link add link eth0 name macvlan57a8e6 type macvlan mode bridge
+ ip link set macvlan57a8e6 up
+ set +xe
DEBUG: vlan/vtap devices: macvtap57a8e6 macvlan57a8e6
INFO: DHCP configured to serve IP 192.168.75.72/32 via macvtap57a8e6 (attached to eth0)
INFO: Lauching dnsmasq                                --dhcp-range=192.168.75.72,192.168.75.72                                    --dhcp-host=ea:e7:53:58:f4:17,,192.168.75.72,ams-gw-932307302-d1cw9,infinite           --dhcp-option=option:netmask,255.255.255.255                                      --dhcp-option=option:dns-server,10.96.0.10          --dhcp-option=option:router,169.254.1.1            --dhcp-option=option:domain-search,bmw.svc.cluster.local,svc.cluster.local,cluster.local     --dhcp-option=option:domain-name,bmw.svc.cluster.local
INFO: Launching /usr/libexec/qemu-kvm -enable-kvm -drive if=virtio,file=/image/image   -machine accel=kvm,usb=off   -nodefaults   -device virtio-balloon-pci,id=balloon0   -realtime mlock=off   -msg timestamp=on   -chardev pty,id=charserial0   -device isa-serial,chardev=charserial0,id=serial0   -serial stdio    -vga qxl -display none -m 1024 -smp 4,sockets=4,cores=1,threads=1    -device virtio-net-pci,netdev=net0,mac=ea:e7:53:58:f4:17 -netdev tap,id=net0,vhost=on,fd=3 3<>/dev/macvtap57a8e6
~#


ramukima commented Dec 12, 2016

Also note that when I kill the container spawned on behalf of my deployment/pod as part of the First Run, Kubernetes attempts to re-spawn the container (as per the deployment policy), and the logs start to show the following:

DEBUG: New device names generated: macvtapa349a5 macvlana349a5
DEBUG: vlan/vtap devices: *NONE*
+ ip link add link eth0 name macvtapa349a5 type macvtap mode bridge
RTNETLINK answers: File exists

methadata pushed a commit that referenced this issue Dec 13, 2016

methadata commented Dec 13, 2016

As you pointed out, the issue seems to be related to the respawning of the container that k8s performs. In this context, the ethX device present in the container already has something attached (the old macvtap?) that prevents the creation of a new macvtap device attached to it.
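One way to confirm that diagnosis from inside the respawned container would be a quick sysfs check for leftover devices. This is a hypothetical diagnostic I am suggesting, not something shipped in the image:

```shell
# Hypothetical check: list any macvtap devices still present in the
# container's network namespace; a freshly spawned container should
# normally start with none.
list_stale_macvtaps() {
  for d in /sys/class/net/macvtap*; do
    # the glob pattern is returned literally when nothing matches,
    # so test that the entry actually exists
    if [ -e "$d" ]; then
      basename "$d"
    fi
  done
}
```

If this prints a device name right after the respawn, it would explain the "File exists" answer from RTNETLINK.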

I have committed a new release (7c64a96) of the bbvainnotech/kvm:bugfix_21 image that retries the failing ip link command indefinitely, just to check whether the problem persists for some time after the pod is launched. @ramukima, please try again with this new release and post the output so we can see whether it's a timing issue.
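Judging from the debug output posted later in this thread, the retry behaves roughly like the sketch below. This is my reconstruction based on that log, not the actual committed code:

```shell
# Reconstruction of the retry behaviour (an assumption, not commit
# 7c64a96 itself): re-run a command until it succeeds, printing a dot
# per failed attempt, and report how many attempts it took.
retry_forever() {
  i=1
  until "$@" 2>/dev/null; do
    printf '.' >&2
    i=$((i + 1))
    sleep 1
  done
  echo "$i"
}

# In the startvm context this would wrap the failing command, e.g.:
# attempts=$(retry_forever ip link add link "$iface" name "$vtapdev" \
#   type macvtap mode bridge)
```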

Meanwhile I will try to reproduce the issue myself with kubernetes.

@ramukima (Author)

These changes work great; I confirmed that the container re-spawn now works correctly. Here is the log:

DEBUG: New device names generated: macvtapfd059a macvlanfd059a
DEBUG: vlan/vtap devices: *NONE*
+ local i=1
++ ip link add link eth0 name macvtapfd059a type macvtap mode bridge
RTNETLINK answers: File exists
+ echo -n .
+ let i++
+ sleep 1
++ ip link add link eth0 name macvtapfd059a type macvtap mode bridge
RTNETLINK answers: File exists
+ echo -n .
+ let i++
+ sleep 1
++ ip link add link eth0 name macvtapfd059a type macvtap mode bridge
..DEBUG: device creation succeed after 3 attemps
+ echo 'DEBUG: device creation succeed after 3 attemps'
+ ip link set macvtapfd059a address 62:29:59:f6:8c:c2
+ ip link set macvtapfd059a up
+ ip link add link eth0 name macvlanfd059a type macvlan mode bridge
+ ip link set macvlanfd059a up
+ set +xe
DEBUG: vlan/vtap devices: macvtapfd059a macvlanfd059a
INFO: DHCP configured to serve IP 192.168.75.100/32 via macvtapfd059a (attached to eth0)
INFO: Lauching dnsmasq                                --dhcp-range=192.168.75.100,192.168.75.100                                    --dhcp-host=62:29:59:f6:8c:c2,,192.168.75.100,ams-gw-4040377064-mest1,infinite           --dhcp-option=option:netmask,255.255.255.255                                      --dhcp-option=option:dns-server,10.96.0.10          --dhcp-option=option:router,169.254.1.1            --dhcp-option=option:domain-search,bmw.svc.cluster.local,svc.cluster.local,cluster.local     --dhcp-option=option:domain-name,bmw.svc.cluster.local
INFO: Launching /usr/libexec/qemu-kvm -enable-kvm -drive if=virtio,file=/image/image   -machine accel=kvm,usb=off   -nodefaults   -device virtio-balloon-pci,id=balloon0   -realtime mlock=off   -msg timestamp=on   -chardev pty,id=charserial0   -device isa-serial,chardev=charserial0,id=serial0   -serial stdio    -vga qxl -display none -m 1024 -smp 4,sockets=4,cores=1,threads=1    -device virtio-net-pci,netdev=net3,mac=62:29:59:f6:8c:c2 -netdev tap,id=net3,vhost=on,fd=6 6<>/dev/macvtapfd059a

Thank you for fixing this. Can this be merged into latest so that I can confirm on the latest build as well?
