Skip to main content

Posts

Showing posts from October, 2017

OpenShift logging issues

I have been digging into some logging issues in an OpenShift production system. The first  problem what we noticed was that the pod logs viewed from the web console were clearly missing some lines. Initially, we thought that this was due to some rate limiting for the web console itself but it turned out to be an issue at the OS level. Another issue what we initially thought was related to the first one was that the Elasticsearch cluster which contains the aggregated logs from all nodes was missing some logs as well and we even had the Elasticsearch cluster members crashing a couple of times without being able to recover the cluster health. It turned out that we had two separate issues with similar symptoms First thing was to check why the web console was missing logs. Openshift (kubernetes) is logging the container logs to journald. After tailing the journald logs a while, it seemed fine. Upon closer inspection, I saw something strange though. It seems that the containers that prod