CentOS Infra public service dashboard

As soon as you're running some IT services, there is one thing you already know: you'll have downtime, despite all your efforts to avoid it...

As the old joke goes: "What's up?" asked the Boss. "Hopefully everything!" answered the SysAdmin guy...

You probably know that the CentOS infra is itself spread across the world, and that it changes quickly too. Recently we had to announce an important DC relocation that impacts some of our crucial, publicly facing services. That one falls into the "scheduled and known outages" category, and can be prepared for. For such downtime we always make announcements through several channels, like sending a mail to the centos-announce and centos-devel mailing lists (and in this case, also to the ci-users list). But even when we announce it in advance, some people forget about it, or people using the affected service (sometimes "indirectly") are surprised and then ask about it (usually in #centos or #centos-devel on irc.freenode.net).

Alongside those "scheduled outages", we also have the worst ones: the unscheduled ones. For those, depending on the impact and criticality of the affected service, and also on the estimated RTO (Recovery Time Objective), we may or may not send a mail to the relevant mailing lists.

So we decided to publish a very simple, public dashboard for the CentOS Infra, covering only the publicly facing services, to give a quick overview of that part of the Infra. It's now live at https://status.centos.org.

We use Zabbix to monitor our Infra (which is why we build it for multiple arches: x86_64, i386, ppc64, ppc64le, aarch64 and armhfp), including through remote Zabbix proxies (because of our current "distributed" network setup, with machines all around the world). For some of the services listed on status.centos.org, we can "manually" announce a downtime/maintenance period, but Zabbix also updates that dashboard on its own. The simple way to link the two was to use Zabbix custom alertscripts: you can customize the action message so that it sends specific macros, and have the alertscript simply parse them and then update the dashboard.
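To give a rough idea of that approach, here is a minimal Python sketch of such an alertscript. It is not the script we actually run, and the dashboard format is an assumption: it supposes the dashboard page reads a JSON file at a hypothetical path (/var/www/status/statuses.json), that the Zabbix "Script" media type passes {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} as its three parameters, and that the action message was customized to contain lines like "service=mirrorlist" and "state={TRIGGER.STATUS}". The state names "major" and "good" are also illustrative.

```python
#!/usr/bin/env python3
# Hypothetical Zabbix alertscript sketch: NOT the script used for status.centos.org.
import json
import os
import sys
import tempfile

# Assumed location of the JSON file the dashboard page reads (hypothetical).
STATE_FILE = "/var/www/status/statuses.json"

# Map Zabbix {TRIGGER.STATUS} values (PROBLEM/OK) to assumed dashboard states.
STATE_MAP = {"PROBLEM": "major", "OK": "good"}


def parse_message(message):
    """Parse 'key=value' lines of the alert body into a dict."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def update_dashboard(state, fields):
    """Return a copy of the dashboard state with one service's status set."""
    service = fields.get("service")
    status = STATE_MAP.get(fields.get("state", "").upper())
    if not service or status is None:
        return state  # unrecognized alert: leave the dashboard untouched
    new_state = dict(state)
    new_state[service] = dict(new_state.get(service, {}), status=status)
    return new_state


def apply_alert(message, state_file=STATE_FILE):
    """Load the state file, apply one alert, and write it back atomically."""
    try:
        with open(state_file) as fh:
            state = json.load(fh)
    except FileNotFoundError:
        state = {}
    state = update_dashboard(state, parse_message(message))
    # Write to a temp file in the same directory, then rename it over the
    # original, so the web server never serves a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(state_file))
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh, indent=2)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp_path, state_file)


if __name__ == "__main__":
    # Zabbix passes the configured script parameters as positional arguments,
    # assumed here to be {ALERT.SENDTO}, {ALERT.SUBJECT}, {ALERT.MESSAGE}.
    if len(sys.argv) != 4:
        print("usage: %s SENDTO SUBJECT MESSAGE" % sys.argv[0], file=sys.stderr)
    else:
        apply_alert(sys.argv[3])
```

Such a script would live in the directory set by AlertScriptsPath in zabbix_server.conf. Keeping the parsing in plain "key=value" lines means the same script can serve any trigger: only the action message template changes per service.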

We hope to enhance that dashboard in the future, but it's a good start, and I want to thank Patrick Uiterwijk again: he initially wrote that tool for Fedora, and we adapted it to our needs.

Arrfab's blog: Some tips and tricks, mostly around CentOS
