Verifying your Calico on OpenStack deployment
This document takes you through the steps you can perform to verify that a Calico-based OpenStack deployment is running correctly.
To follow this document, you need:
- SSH access to the nodes in your Calico-based OpenStack deployment.
- Access to an administrator account on your Calico-based OpenStack deployment.
Begin by creating several instances on your OpenStack deployment using your administrator account. Confirm that these instances all launch and correctly obtain IP addresses.
You’ll want to make sure that your new instances are evenly striped across your hypervisors. On your control node, run:
nova list --fields host
Confirm that there is an even spread across your compute nodes. If there isn’t, it’s likely that an error has occurred in either nova or Calico on the affected compute nodes. Check the logs on those nodes for details, and report the problem on the mailing list.
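As a rough aid for checking the spread, the sketch below counts instances per host. It assumes that nova list --fields host prints its usual two-column table (ID and Host); the function name count_per_host is made up for this example.

```shell
# Count instances per hypervisor from `nova list --fields host` output.
# Assumes a table of the form "| <id> | <host> |" with "+---+" border lines.
count_per_host() {
    awk -F'|' '
        NF >= 4 {
            host = $3
            gsub(/^[ \t]+|[ \t]+$/, "", host)   # trim padding around the host name
            if (host != "" && host != "Host") count[host]++
        }
        END { for (h in count) print count[h], h }
    ' | sort -k2
}

# Usage on the control node:
#   nova list --fields host | count_per_host
```

Each output line is a count followed by a host name, so a host with a much lower count than the others (or one that is missing entirely) is the place to start looking.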
Now, SSH into one of your compute nodes. We’re going to verify that the FIB on the compute node has been correctly populated by Calico. To do that, run the route command. You’ll get output something like this:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         net-vl401-hsrp- 0.0.0.0         UG    0      0        0 eth0
10.65.0.0       *               255.255.255.0   U     0      0        0 ns-b1163e65-42
10.65.0.103     npt06.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.104     npt09.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.105     *               255.255.255.255 UH    0      0        0 tap242f8163-08
10.65.0.106     npt09.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.107     npt07.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.108     npt08.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.109     npt07.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.110     npt06.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.111     npt08.datcon.co 255.255.255.255 UGH   0      0        0 eth0
10.65.0.112     *               255.255.255.255 UH    0      0        0 tap3b561211-dd
link-local      *               255.255.0.0     U     1000   0        0 eth0
172.18.192.0    *               255.255.255.0   U     0      0        0 eth0
You should see one route for each of the VM IP addresses in this table. For VMs on other compute nodes, you should see that compute node’s IP address (or domain name) as the gateway. For VMs on this compute node, you should see * as the gateway, and the tap interface for that VM in the Iface field. As long as routes are present to all VMs, the FIB has been configured correctly. If any VMs are missing from the routing table, you’ll want to verify the state of the BGP connection(s) from the compute node hosting those VMs.
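With more than a handful of VMs, comparing the table by eye gets tedious. The following is a minimal sketch that lists VM IPs with no host route. It assumes you pass the VM IP addresses yourself and feed it the output of route -n (the -n flag prints numeric gateways, so local VMs show 0.0.0.0 rather than *). The function name missing_routes is hypothetical.

```shell
# Print each VM IP that has no host route (genmask 255.255.255.255)
# in the routing table. Reads `route -n` output on stdin; the VM IPs
# to look for are passed as arguments.
missing_routes() {
    table=$(cat)
    for ip in "$@"; do
        printf '%s\n' "$table" |
            awk -v ip="$ip" '
                $1 == ip && $3 == "255.255.255.255" { found = 1 }
                END { exit !found }
            ' || printf '%s\n' "$ip"
    done
}

# Usage on a compute node:
#   route -n | missing_routes 10.65.0.103 10.65.0.105 10.65.0.112
```

No output means every listed IP has a route. Any IP that is printed points at the compute node whose BGP connection(s) you should inspect.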
Having confirmed the FIB is present and correct, open the console for one of the VM instances you just created. Confirm that the machine has external connectivity by pinging google.com (or any other host you are confident is routable and that will respond to pings). Additionally, confirm it has internal connectivity by pinging the other instances you’ve created (by IP).
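If you have many instances to check, the pings can be wrapped in a small loop. This is only a sketch: it assumes the guest image provides a ping that accepts -c (count) and -W (timeout in seconds), as the common Linux implementations do, and the function name check_ping is made up for this example.

```shell
# Ping each target once and report which ones fail.
# Returns non-zero if any target is unreachable.
check_ping() {
    failed=0
    for target in "$@"; do
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}

# Usage from a VM console (addresses are examples only):
#   check_ping google.com 10.65.0.105 10.65.0.112
```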
If all of these tests behave correctly, your Calico-based OpenStack deployment is in good shape.
If you find that none of the advice below solves your problems, please use our diagnostics gathering script to generate diagnostics, and then raise a GitHub issue against our repository. To generate the diags, run
VMs cannot DHCP
This can happen if your iptables is configured to have a default DROP behaviour on the INPUT or FORWARD chains. You can test this by running iptables -L -t filter and checking the output. You should see something that looks a bit like this:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
The important sections are Chain INPUT and Chain FORWARD. Each of those needs to have a policy of ACCEPT. In some systems, this policy may be set to DROP. To change it, run iptables -P <chain> ACCEPT, replacing <chain> with either INPUT or FORWARD.
Note that doing this may be considered a security risk in some networks. A future Calico enhancement will remove the requirement to perform this step.
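The policy check can also be scripted. The sketch below extracts a chain's default policy from the "Chain <name> (policy ...)" header line that iptables -L prints; the function name chain_policy is hypothetical.

```shell
# Print the default policy of one chain.
# Reads `iptables -L -t filter` output on stdin; the chain name is $1.
chain_policy() {
    awk -v chain="$1" '
        $1 == "Chain" && $2 == chain {
            gsub(/[()]/, "")   # strip the parentheses around "policy ACCEPT"
            print $4
        }
    '
}

# Usage on a compute node (iptables usually needs root):
#   iptables -L -t filter | chain_policy INPUT
#   iptables -L -t filter | chain_policy FORWARD
```

If either command prints anything other than ACCEPT, that chain is a candidate cause of the DHCP failure.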
Routes Are Missing in the FIB
If routes to some VMs aren’t present when you run route, this suggests that your BGP sessions are not functioning correctly. Your BGP daemon should have either an interactive console or a log. Open the relevant one and check that all of your BGP sessions have come up appropriately and are replicating routes. If you’re using a full mesh configuration, confirm that you have configured BGP sessions with all other Calico compute nodes.
VMs Cannot Ping Non-VM IPs
Assuming all the routes are present in the FIB (see above), this most commonly happens because the gateway is not configured with routes to the VM IP addresses. To get full Calico functionality the gateway should also be a BGP peer of the compute nodes (or the route reflector).
Confirm that your gateway has routes to the VMs. Assuming it does, make sure that your gateway is also advertising those routes to its external peers. It may do this using eBGP, but it may also be using some other routing protocol.
VMs Cannot Ping Other VMs
Before continuing, confirm that the two VMs are in security groups that allow inbound traffic from each other (or are both in the same security group which allows inbound traffic from itself). Traffic will not be routed between VMs that do not allow inbound traffic from each other.
Assuming that the security group configuration is correct, confirm that the machines hosting each of the VMs (potentially the same machine) have routes to both VMs. If they do not, see the section on missing FIB routes above.
Web UI Shows Error Boxes Saying “Error: Unable to get quota info” and/or “Error: Unable to get volume limit”
This is likely a problem encountered with mapping devices in OpenStack’s logical volume management component. Many of these can be resolved by restarting cinder:
service cinder-volume restart
service cinder-scheduler restart
service cinder-api restart
Cannot create instances, error log says “could not open /dev/net/tun: Operation not permitted”
This is caused by not restarting libvirt after adding lines to the end of /etc/libvirt/qemu.conf. To fix it, either reboot your entire system or run:
service libvirt-bin restart