(ab)using Alias for Zabbix

It's not a secret that we use Zabbix to monitor the infra. That's even a reason why we (re)build it for some other architectures, including aarch64,ppc64,ppc64le on CBS and also armhfp

There are really cool things in Zabbix, including Low-Level Discovery. With such discovery, you can create items/prototypes/triggers that will be applied "automagically" for each discovered network interface, or mounted filesystem. For example, the default template (if you still use it) has such item prototypes and also graph for each discovered network interface and show you the bandwidth usage on those network interfaces.

But what happens if you suddenly want to for example to create some calculated item on top of those ? Well, the issue is that from one node to the other, interface name can be eth0, or sometimes eth1, and with CentOS 7 things started to also move to the new naming scheme, so you can have something like enp4s0f0. I wanted to create a template that would fit-them-all, so I had a look at calculated item and thought "well, easy : let's have that calculated item use a user macro that would define the name of the interface ...

CentOS Infra public service dashboard

As soon as you're running some IT services, there is one thing that you already know : you'll have downtimes, despite all your efforts to avoid those...

As the old joke says : "What's up ?" asked the Boss. "Hopefully everything !" answered the SysAdmin guy ....

You probably know that the CentOS infra is itself widespread, and subject to quick move too. Recently we had to announce an important DC relocation that impacts some of our crucial and publicly facing services. That one falls in the "scheduled and known outages" category, and can be prepared. For such "downtime" we always announced that through several mediums, like sending a mail to the centos-announce, centos-devel (and in this case , also to the ci-users) mailing lists. But even when we announce that in advance, some people forget about it, or people using (sometimes "indirectly") the concerned service are surprized and then ask about it (usually in #centos or #centos-devel on

In parallel to those "scheduled outages", we have also the worst ones : the unscheduled ones. For those ones, depending on the impact/criticity of the impacted service, and also the estimated RTO, we also send a mail to the concerned mailing ...

Generating multiple certificates with Letsencrypt from a single instance

Recently I was discussing with some people about TLS everywhere, and we then started to discuss about the Letsencrypt initiative. I had to admit that I just tested it some time ago (just for "fun") but I suddenly looked at it from a different angle : while the most used case is when you install/run the letsencrypt client on your node to directly configure it, I have to admit that it's something I didn't want to have to deal with. I still think that proper web server configuration has to happen through cfgmgmt, and not through another process. (and same for the key/cert distribution, something for a different blog post maybe).

If so you're (pushing|pulling) automatically your web servers configuration from $cfgmgmt, but that you want to use/deploy TLS certificates signed by letsencrypt, what can you do ? Well, the good news is that you don't have to be forced to let the letsencrypt client touch your configuration at all : you can use the "certonly" option to just generate the private key locally, send the csr and get the signed cert back (and the whole chain too) One thing to know about letsencrypt is ...

IPv6 connectivity status within the infra

Recently, some people started to ask proper IPv6/AAAA record for some of our public mirror infrastructure, like, and also

Reason is that a lot of people are now using IPv6 wherever possible and from a CentOS point of view, we should ensure that everybody can have content over (legacy) ipv4 and ipv6. Funny that I call ipv4 "legacy" as we still have to admit that it's still the default everywhere, even in 2016 with the available pools now exhausted.

While we had already some AAAA records for some of our public nodes (like as an example), I started to "chase" after proper and native ipv6 connectivity for our nodes. That's where I had to take contact with all our valuable sponsors. First thing to say is that we'd like to thank them all for their support for the CentOS Project over the years : it wouldn't have been possible to deliver multiple terrabytes of data per month without their sponsorship !

WRT ipv6 connectivity that's where the results of my quest where really different : while some DCs support ipv6 natively, and even answer you in 5 minutes ...

Kernel 3.10.0-327 issue on AMD Neo processor

As CentOS 7 (1511) was released, I thought it would be a good idea to update several of my home machines (including kids' workstations) with that version, and also newer kernel. Usually that's just a smooth operation, but sometimes some backported features/new features, especially in the kernel, can lead to some strange issues. That's what happened for my older Thinkpad Edge : That's a cheap/small thinkpad that Lenovo did several years ago ( circa 2011 ), and that I used a lot just when travelling, as it only has a AMD Athlon(tm) II Neo K345 Dual-Core Processor. So basically not a lot of horse power, but still something convenient just to read your mails, remotely connect through ssh, or browse the web. When rebooting on the newer kernel, it panics directly.

Two bug reports are open for this, one on the CentOS Bug tracker, linked also to the upstream one. Current status is that there is no kernel update that will fix this, but there is a easy to implement workaround :

  • boot with the initcall_blacklist=clocksource_done_booting kernel parameter added (or reboot on previous kernel)
  • once booted, add the same parameter at the end of the GRUB_CMDLINE_LINUX=" .." line ...

Kernel IO wait and megaraid controller

Last friday, while working on something else (working on "CentOS 7 userland" release for Armv7hl boards), I got notifications coming from our Zabbix monitoring instance complaining about web scenarios failing (errors due to time outs) , and also then also about "Disk I/O is overloaded" triggers (checking the cpu iowait time). Usually you'd verify what happens in the Virtual Machine itself, but even connecting to the VM was difficult and slow. But once connected, nothing strange, and no real activity , not even on the disk (Plenty of tools for this, but iotop is helpful to see which process is reading/writing to the disk in that case), but iowait was almost at 100%).

As said, it was happening suddenly for all Virtual Machines on the same hypervisor (CentOS 6 x86_64 KVM host), and even the hypervisor was suddenly complaining (but less in comparison with the VMs) about iowait too. So obviously, it wasn't really something not being optimized at the hypervisor/VMS, but something else. That rang a bell, as if you have a raid controller, and that battery for example is to be replaced, the controller can decide to stop all read/write cache, so slowing down ...

CentOS AltArch SIG status

Recently I had (from an Infra side) to start deploying KVM guests for the ppc64 and ppc64le arches, so that AltArch SIGs contributors could start bootstrapping CentOS 7 rebuild for those arches. I'll probably write a tech review about Power8 and the fact you can just use libvirt/virt-install to quickly provision new VMs on PowerKVM , but I'll do that in a separate post.

Parallel to ppc64/ppc64le, armv7hl interested some Community members, and the discussion/activity about that arch is discussed on the dedicated mailing list. It's slowly coming and some users already reported having used that on some boards (but still unsigned and no updates packages -yet- )

Last (but not least) in this AltArch list is i686 : Johnny built all packages and are already publicly available on , each time in parallel to the x86_64 version. It seems that respinning the ISO for that arch and last tests would be the only things to do.

If you're interested in participating in AltArch (and have special interesting a specific arch/platform), feel free to discuss that on the centos-devel list !

CentOS Dojo in Barcelona

So, thanks to the folks from Opennebula, we'll have another CentOS Dojo in Barcelona on Tuesday 20th October 2015. That even will be colocated with the Opennebulaconf happening the days after that Dojo. If you're attending the OpennebulaConf, or if you're just in the area and would like to attend the CentOS Dojo, feel free to register

Regarding the Dojo content, I'll be myself giving a presentation about Selinux : covering a little bit of intro (still needed for some folks afraid of using it , don't know why but we'll change that ...) about selinux itself, how to run it on bare-metal, virtual machines and there will be some slides for the mandatory container hype thing. But we'll also cover managing selinux booleans/contexts, etc through your config management solution. (We'll cover puppet and ansible as those are the two I'm using on a daily basis) and also how to build and deploy custom selinux policies with your config management solution.

On the other hand, if you're a CentOS user and would like yourself to give a talk during that Dojo, feel free to submit a talk ! More informations about the Dojo ...

Ext4 limitation with GDT blocks number

In the last days, I encountered a strange issue^Wlimitation with Ext4 that I wouldn't have thought of. I've used ext2/ext3/ext4 for quite some time and so I've been used to resize the filesystem "online" (while "mounted"). In the past you had to use ext2online for that, then it was integrated into resize2fs itself.

The logic is simple and always the same : extend your underlaying block device (or add another one), then modify the LVM Volume Group (if needed), then the Logical Volume and finally the resize2fs operation, so something like

lvextend -L +${added_size}G /dev/mapper/${name_of_your_logical_volume} 
resize2fs /dev/mapper/${name_of_your_logical_volume}

I don't know how much times I've used that, but this time resize2fs wasn't happy :

resize2fs: Operation not permitted While trying to add group #16384

I remember having had in the past an issue because of the journal size not being big enough. But this wasn't the case here.

FWIW, you can always verify your journal size with dumpe2fs /dev/mapper/${name_of_your_logical_volume} |grep "Journal Size"

Small note : if you need to increase the journal size, you have to do it "offline" as you have to remove the journal and ...

Implementing TLS for postfix

As some initiatives (like Let's Encrypt as one example) try to force TLS usage everywhere. We thought about doing the same for the infra. Obviously we already had some x509 certificates, but not for every httpd server that was serving content for CentOS users. So we decided to enforce TLS usage on those servers. But TLS can be used obviously on other things than a web server.

That's why we considered implementing something for our Postfix nodes. The interesting part is that it's really easy (depending of course at the security level one may want to reach/use). There are two parts in the postfix that can be configured :

  • outgoing mails (aka your server sends mail to other SMTPD servers)
  • incoming mails (aka remote clients/servers send mail to your postfix/smtpd server)

Let's start with the client/outgoing part : just adding those lines in your will automatically configure it to use TLS when possible, but otherwise fall back on clear if remote server doesn't support TLS :

# TLS - client part
smtp_tls_security_level = may
smtp_tls_loglevel = 1
smtp_tls_session_cache_database = btree:/var/lib/postfix/smtp_scache 

The interesting ...