Diagnosing nf_conntrack/nf_conntrack_count issues on CentOS mirrorlist nodes
Yesterday, I got alerts from our monitoring system for some nodes in the CentOS infra, and the problem was confirmed by some folks reporting errors directly in our #centos-devel IRC channel on Freenode.
The impacted nodes were the ones we use for the mirrorlist service. For those who don't know what they are used for, here is a quick overview of what happens when you run "yum update" on your CentOS node:
- yum analyzes the .repo files contained under /etc/yum.repos.d/
- for CentOS repositories, it knows that it has to use a list of mirrors provided by a server hosted within the CentOS infra (mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra)
- yum then contacts one of the servers behind "mirrorlist.centos.org" (we have 4 nodes so far: two in Europe and two in the USA, all available over IPv4 and IPv6)
- mirrorlist checks the source IP and sends back a list of current, up-to-date mirrors in the client's country (some GeoIP checks are done)
- yum then opens connections to those validated mirrors
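To make the client side of those steps more concrete, here is a minimal Python sketch (not yum's actual code) of how the mirrorlist URL gets its variables expanded and how a reply could be turned into a list of mirrors. The helper names `expand_yum_vars` and `parse_mirrorlist`, the variable values ("7", "x86_64", "stock") and the example.org mirror URLs are all made up for illustration, and the reply format (one base URL per line) is an assumption.

```python
from string import Template

# The mirrorlist= value from the .repo file, with yum variables still unexpanded.
MIRRORLIST_TEMPLATE = (
    "http://mirrorlist.centos.org/"
    "?release=$releasever&arch=$basearch&repo=updates&infra=$infra"
)


def expand_yum_vars(template, variables):
    """Substitute $releasever, $basearch and $infra into the URL (hypothetical helper)."""
    # Template.substitute raises KeyError if a variable is missing,
    # which is preferable to silently sending a half-expanded URL.
    return Template(template).substitute(variables)


def parse_mirrorlist(body):
    """Return the mirror base URLs from a reply, assuming one URL per line."""
    mirrors = []
    for line in body.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


# Example values only: they are assumptions, not taken from a real node.
url = expand_yum_vars(
    MIRRORLIST_TEMPLATE,
    {"releasever": "7", "basearch": "x86_64", "infra": "stock"},
)

# A made-up reply, standing in for what a mirrorlist node would send back.
sample_reply = (
    "http://mirror1.example.org/centos/7/updates/x86_64/\n"
    "http://mirror2.example.org/centos/7/updates/x86_64/\n"
)
mirrors = parse_mirrorlist(sample_reply)
```

The sketch deliberately stays offline: the GeoIP lookup and mirror validation happen on the mirrorlist nodes themselves, so the client only ever sees the final list.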
We monitor the response time for those services, and average response time is usually < 1sec (with some exceptions, mostly due to network latency …