Verifying your Calico on OpenStack deployment

This document takes you through the steps you can perform to verify that a Calico-based OpenStack deployment is running correctly.

Prerequisites

To follow this document you will need:

  • SSH access to the nodes in your Calico-based OpenStack deployment.
  • Access to an administrator account on your Calico-based OpenStack deployment.

Procedure

Begin by creating several instances on your OpenStack deployment using your administrator account. Confirm that these instances all launch and correctly obtain IP addresses.

You’ll want to make sure that your new instances are evenly striped across your hypervisors. On your control node, run:

nova list --fields host

Confirm that there is an even spread across your compute nodes. If there isn’t, it’s likely that an error has occurred in either nova or Calico on the affected compute nodes. Check the logs on those nodes for more detail, and report the problem on the mailing list.
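With more than a handful of instances, the spread is easier to judge from a per-host count. The following is a minimal sketch, not part of Calico or nova: count_per_host is a hypothetical helper, and it assumes nova prints its usual bordered table with ID and Host columns.

```shell
# Hypothetical helper: tally instances per compute host.
# Reads `nova list --fields host` output on stdin and assumes the usual
# table layout: "| ID | Host |" rows separated by "+----+" border rows.
count_per_host() {
    awk -F'|' '
        NF >= 4 {                               # header and data rows only
            host = $3
            gsub(/^[ \t]+|[ \t]+$/, "", host)   # trim cell padding
            if (host != "" && host != "Host") count[host]++
        }
        END { for (h in count) print count[h], h }
    ' | sort -k2
}
```

On the control node this could be run as: nova list --fields host | count_per_host. Each output line is a count followed by a host name; a host that is missing or has a much lower count than the others is the one to investigate.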

Now, SSH into one of your compute nodes. We’re going to verify that the FIB on the compute node has been correctly populated by Calico. To do that, run the route command. You’ll get output that looks something like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         net-vl401-hsrp- 0.0.0.0         UG    0      0        0 eth0
10.65.0.0       *               255.255.255.0   U     0      0        0 ns-b1163e65-42
10.65.0.103     npt06.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.104     npt09.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.105     *               255.255.255.255 UH    0      0        0 tap242f8163-08
10.65.0.106     npt09.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.107     npt07.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.108     npt08.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.109     npt07.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.110     npt06.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.111     npt08.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.112     *               255.255.255.255 UH    0      0        0 tap3b561211-dd
link-local      *               255.255.0.0     U     1000   0        0 eth0
172.18.192.0    *               255.255.255.0   U     0      0        0 eth0

You should see one route for each of the VM IP addresses in this table. For VMs on other compute nodes, you should see that compute node’s IP address (or domain name) as the gateway. For VMs on this compute node, you should see * as the gateway, and the tap interface for that VM in the Iface field. As long as routes are present to all VMs, the FIB has been configured correctly. If any VMs are missing from the routing table, verify the state of the BGP connection(s) from the compute node hosting those VMs.
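Checking a long routing table by eye is error-prone, so the comparison can be scripted. This is a sketch under stated assumptions: missing_routes is a hypothetical helper, the VM IP addresses are supplied by you as arguments, and the table is produced with route -n so that destinations are printed numerically.

```shell
# Hypothetical helper: print each VM IP (given as an argument) that has no
# host route (Genmask 255.255.255.255) in the `route -n` output on stdin.
missing_routes() {
    table=$(cat)
    for ip in "$@"; do
        printf '%s\n' "$table" |
            awk -v ip="$ip" '
                $1 == ip && $3 == "255.255.255.255" { found = 1 }
                END { exit !found }
            ' || printf '%s\n' "$ip"
    done
}
```

For example: route -n | missing_routes 10.65.0.103 10.65.0.105. No output means every listed VM has a route; any address printed is missing from the FIB.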

Having confirmed the FIB is present and correct, open the console for one of the VM instances you just created. Confirm that the machine has external connectivity by pinging google.com (or any other host you are confident is routable and that will respond to pings). Additionally, confirm it has internal connectivity by pinging the other instances you’ve created (by IP).
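If you have created many instances, the connectivity checks can be looped from the VM console. The sketch below assumes the guest image provides a POSIX shell and a ping that accepts -c; check_reachable and the PING override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: ping each address given as an argument and report
# which ones respond. Set PING to override the command used (the default
# sends three echo requests). Returns non-zero if any target fails.
check_reachable() {
    failed=0
    for target in "$@"; do
        if ${PING:-ping -c 3} "$target" >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: check_reachable google.com 10.65.0.103 10.65.0.105, substituting the IP addresses of your own instances.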

If all of these tests behave correctly, your Calico-based OpenStack deployment is in good shape.

Troubleshooting

If you find that none of the advice below solves your problems, please use our diagnostics gathering script to generate diagnostics, and then raise a GitHub issue against our repository. To generate the diags, run:

/usr/bin/calico-diags

VMs cannot DHCP

This can happen if your iptables is configured to have a default DROP behaviour on the INPUT or FORWARD chains. You can test this by running iptables -L -t filter and checking the output. You should see something that looks a bit like this:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

The important sections are Chain INPUT and Chain FORWARD. Each of those needs to have a policy of ACCEPT. In some systems, this policy may be set to DROP. To change it, run iptables -P <chain> ACCEPT, replacing <chain> with either INPUT or FORWARD.

Note that doing this may be considered a security risk in some networks. A future Calico enhancement will remove the requirement to perform this step.
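The policy check can also be done mechanically. This is a minimal sketch, assuming the "Chain NAME (policy X)" header lines shown in the output above; non_accept_chains is a hypothetical helper, not an iptables feature.

```shell
# Hypothetical helper: read `iptables -L -t filter` output on stdin and
# print the INPUT or FORWARD chains whose default policy is not ACCEPT.
non_accept_chains() {
    awk '$1 == "Chain" && ($2 == "INPUT" || $2 == "FORWARD") {
        policy = $4
        sub(/\)$/, "", policy)      # "ACCEPT)" becomes "ACCEPT"
        if (policy != "ACCEPT") print $2, policy
    }'
}
```

Run it as: iptables -L -t filter | non_accept_chains. Each line of output names a chain and its current policy; no output means both chains already have a policy of ACCEPT.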

Routes are missing in the FIB

If routes to some VMs aren’t present when you run route, this suggests that your BGP sessions are not functioning correctly. Your BGP daemon should have either an interactive console or a log. Open the relevant one and check that all of your BGP sessions have come up appropriately and are replicating routes. If you’re using a full mesh configuration, confirm that you have configured BGP sessions with all other Calico nodes.

VMs Cannot Ping Non-VM IPs

Assuming all the routes are present in the FIB (see above), this most commonly happens because the gateway is not configured with routes to the VM IP addresses. To get full Calico functionality the gateway should also be a BGP peer of the compute nodes (or the route reflector).

Confirm that your gateway has routes to the VMs. Assuming it does, make sure that your gateway is also advertising those routes to its external peers. It may do this using eBGP, but it may also be using some other routing protocol.

VMs Cannot Ping Other VMs

Before continuing, confirm that the two VMs are in security groups that allow inbound traffic from each other (or are both in the same security group which allows inbound traffic from itself). Traffic will not be routed between VMs that do not allow inbound traffic from each other.

Assuming that the security group configuration is correct, confirm that the machines hosting each of the VMs (potentially the same machine) have routes to both VMs. If they do not, see the section above on routes missing in the FIB.

Web UI Shows Error Boxes Saying “Error: Unable to get quota info” and/or “Error: Unable to get volume limit”

This is likely a problem with device mapping in cinder, OpenStack’s block storage component. Many such problems can be resolved by restarting the cinder services:

service cinder-volume restart
service cinder-scheduler restart
service cinder-api restart

Cannot create instances, error log says “could not open /dev/net/tun: Operation not permitted”

This happens when libvirt has not been restarted after lines were added to the end of /etc/libvirt/qemu.conf. To fix it, either reboot the system or run:

service libvirt-bin restart