Recently I had to update the existing code running behind mirrorlist.centos.org (the service that returns you a list of validated mirrors for yum, see the /etc/yum.repos.d/CentOS*.repo file) as it was still using the Maxmind GeoIP Legacy country database. As you can probably know, Maxmind announced that they're discontinuing the Legacy DB, so that was one reason to update the code. Switching to GeoLite2 , with python2-geoip2 package was really easy to do and so was done already and pushed last month.

But that's when I discussed with Anssi (if you don't know him, he's maintaining the CentOS external mirrors DB up2date, including through the centos-mirror list ) that we thought about not only doing that change there, but in the whole chain (so on our "mirror crawler" node, and also for the isoredirect.centos.org service), and random chat like these are good because suddenly we don't only want to "fix" one thing, but also take time on enhancing it and so adding more new features.

The previous code was already supporting both IPv4 and IPv6, but it was consuming different data sources (as external mirrors were validated differently for ipv4 vs ipv6 connnectivity). So the first thing was to rewrite/combine the new code on the "mirror crawler" process for dual-stack tests, and also reflect that change o nthe frontend (aka mirrorlist.centos.org) nodes.

While we were working on this, Anssi proposed to also not adapt the isoredirect.centos.org code, but convert it in the same python format as the mirrorlist.centos.org, which he did.

Last big change also that was added is the following : only some repositories/architectures were checked/validated in the past but not all the other ones (so nothing from the SIGs and nothing from AltArch, so no mirrorlist support for i386/armhfp/aarch64/ppc64/ppc64le).

While it wasn't a real problem in the past when we launched the SIGs concept, and that we added after that the other architectures (AltArch), we suddenly started suffering from some side-effects :

  • More and more users "using" RPM content from mirror.centos.org (mainly through SIGs - which is a good indicator that those are successful, which is a good "problem to solve")
  • We are currently losing some nodes in that mirror.centos.org network (it's still entirely based on free dedicated servers donated to the project)

To address first point, offloading more content to the 600+ external mirrors we have right now would be really good, as those nodes have better connectivity than we do, and with more presence around the globe too, so slowly pointing SIGs and AltArch to those external mirrors will help.

The other good point is that , as we switched to the GeoLite2 City DB, it gives us more granularity and also for example, instead of "just" returning you a list of 10 validated mirrors for USA (if your request was identified as coming from that country of course), you now get a list of validated mirrors in your state/region instead. That means that then for such big countries having a lot of mirrors, we also better distribute the load amongst all of those, which is a big win for everybody - users and mirrors admins - )

For people interested in the code, you'll see that we just run several instances of the python code, behind Apache running with mod_proxy_balancer. That means that if we just need to increase the number of "instances", it's easy to do but so far it's running great with 5 running instances per node (and we have 4 nodes behind mirrorlist.centos.org). Worth noting that on average, each of those nodes gets 36+ millions requests per week for the mirrorlist service (so 144+ millions in total per week)

So in (very) short summary :

  • mirrorlist.centos.org code now supports SIGs/AltArch repositories (we'll sync with SIGs to update their .repo file to use mirrorlist= instead of baseurl= soon)
  • we have better accuracy for large countries, so we redirect you to a 'closer' validated mirror

One reminder btw : you know that you can verify which nodes are returned to you with some simple requests :

# to force ipv4
curl 'http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=updates' -4
# to force ipv6
curl 'http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=updates' -6

Last thing I wanted to mention was a potential way to fix point #2 from the list : when I checked in our "donated nodes" inventory, we still are running CentOS on nodes from ~2003 (yes, you read that correctly), so if you want to help/sponsor the CentOS Project, feel free to reach out !