

Showing posts from 2017

OpenShift logging issues

I have been digging into some logging issues in an OpenShift production system. The first problem we noticed was that the pod logs viewed from the web console were clearly missing some lines. Initially, we thought that this was due to rate limiting in the web console itself, but it turned out to be an issue at the OS level. Another issue, which we initially thought was related to the first one, was that the Elasticsearch cluster containing the aggregated logs from all nodes was missing some logs as well. We even had the Elasticsearch cluster members crash a couple of times without the cluster being able to recover its health. It turned out that we had two separate issues with similar symptoms. The first thing was to check why the web console was missing logs. OpenShift (Kubernetes) writes the container logs to journald. After tailing the journald logs for a while, everything seemed fine. Upon closer inspection, I saw something strange though. It seems that the containers that prod

Elixir - first impressions

As any developer heading towards a burnout, I spent my summer vacation learning a new programming language. I chose Elixir because it is a functional language with an actor-like programming style. I thought that at least the latter feature would help me get started. Actually, the first thing I ended up figuring out was how the Erlang runtime works. I was interested in how Erlang processes relate to the operating system. It turned out that the concurrency model is not based on spawning multiple user-level threads but rather on Erlang runtime abstractions that isolate the running code into Erlang processes, which communicate with each other via message passing. The Erlang runtime has a scheduler that can run many processes concurrently on a limited set of OS user-level threads (the number of available CPU threads). This is hardly surprising after reading about high-performing applications written in Erlang. Threads can be expensive especially in the ca
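
To make the idea of many isolated processes sharing few OS threads concrete, here is a toy sketch in Python. It is only an analogy and not how the Erlang runtime is implemented: the `Scheduler` class, its `spawn` and `send` methods, and the `counter` and `producer` functions are all hypothetical names made up for this illustration. Each "process" is a generator with its own mailbox, and a single OS thread runs them round-robin.

```python
from collections import deque


class Scheduler:
    """Toy round-robin scheduler: many lightweight 'processes' on one OS thread."""

    def __init__(self):
        self.ready = deque()   # pids waiting for their next turn
        self.procs = {}        # pid -> generator (the process body)
        self.mailboxes = {}    # pid -> queue of received messages
        self.next_pid = 0

    def spawn(self, fn, *args):
        """Create a new process and return its pid."""
        pid = self.next_pid
        self.next_pid += 1
        self.mailboxes[pid] = deque()
        self.procs[pid] = fn(self, pid, *args)
        self.ready.append(pid)
        return pid

    def send(self, pid, msg):
        """Deliver a message; processes share nothing except messages."""
        self.mailboxes[pid].append(msg)

    def run(self):
        """Give each process one step at a time until all have finished."""
        while self.ready:
            pid = self.ready.popleft()
            try:
                next(self.procs[pid])      # run until the process yields
                self.ready.append(pid)
            except StopIteration:
                del self.procs[pid]        # process finished


def counter(sched, pid, log):
    """Sums the numbers it receives until it is told to stop."""
    total = 0
    while True:
        box = sched.mailboxes[pid]
        if not box:
            yield                          # nothing to receive yet
            continue
        msg = box.popleft()
        if msg == "stop":
            log.append(("total", total))
            return
        total += msg


def producer(sched, pid, target, n):
    """Sends 1..n to the target process, then a stop message."""
    for i in range(1, n + 1):
        sched.send(target, i)
        yield
    sched.send(target, "stop")


def run_demo():
    log = []
    sched = Scheduler()
    counter_pid = sched.spawn(counter, log)
    sched.spawn(producer, counter_pid, 3)
    sched.run()
    return log
```

Calling `run_demo()` runs both processes to completion on the calling thread. The real runtime differs in important ways, for example it preempts processes rather than relying on them to yield.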

Embedded OrientDB on an OpenShift / Kubernetes cluster

A few tips on setting up an embedded OrientDB to run on an OpenShift / Kubernetes cluster.

- Set the ORIENTDB_NODE_NAME system property or environment variable. If your database volumes are host volumes, you can use the downward API field spec.nodeName. If the node name contains dots, replace them with dashes, for example.
- If you use something like OpenShift persistent volumes, make sure that the running pod's ORIENTDB_NODE_NAME matches the node name values it reads from the DB.
- Use at least 3 replicas, and don't use an even number, due to split-brain clustering issues.
- If you use the rolling upgrade strategy, give the pods some time to start up the DBs so that no more than one pod is unavailable at a time. This way syncing up the cluster status becomes smoother.
- Use the newNodeStrategy dynamic OrientDB distribution configuration parameter so unreachable nodes don't break up the write quorum so easily.
- Use Hazelcast to discover the cluster members. There is a library for that
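
Two of the tips above are easy to express as small checks. The following is a minimal sketch, assuming the Kubernetes node name is available as a plain string; the function names `orientdb_node_name` and `is_sensible_replica_count` are hypothetical helpers, not part of OrientDB or Kubernetes.

```python
def orientdb_node_name(k8s_node_name: str) -> str:
    """Derive a value for ORIENTDB_NODE_NAME from a Kubernetes node name.

    Dots are replaced with dashes, as suggested in the tips above.
    """
    return k8s_node_name.replace(".", "-")


def is_sensible_replica_count(replicas: int) -> bool:
    """At least 3 replicas and an odd number, to reduce split-brain risk."""
    return replicas >= 3 and replicas % 2 == 1
```

For example, `orientdb_node_name("node-1.example.com")` gives `"node-1-example-com"`, and `is_sensible_replica_count(4)` is `False`.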

Flame graph from a Scala app

Apologies for the large SVG. I got inspired by a Devoxx talk about flame graphs and how they can visualize what is happening in a JVM process. Getting a graph is actually quite simple. You only need a recent enough Java 8 JDK, a subject JVM process running on Linux, perf (which is part of the kernel utilities in most distributions), and a couple of simple open source profiling tools. Detailed information can be found in a blog post by Nitsan Wakart here. So what is in that SVG? It illustrates what was happening in a Scala app I'm running on a DigitalOcean pod, sampled 100 times a second over a 40-second period. The bars describe call stacks, and the topmost item is always the one running on the CPU. More details on how to read the graph can be found here. In this case the stacks are divided by thread. The leftmost stuff (thread) contains a

A few random things that I have been dealing with

My DigitalOcean droplet recently suffered a brute-force SSH attack. Unfortunately, I noticed it a couple of days after the attack had happened, but luckily it caused little harm apart from very high CPU usage from the sshd process for a few days. I'm not sure how to really protect against such attacks (cheaply), but I decided to try out fail2ban. With fail2ban I could also protect the server against attacks on nginx. Installing it was simple enough, and I saw it was working rather well. There were some 3K ssh login attempts per day, and the iptables-based port blocking reduced the number to a few hundred. After a while, though, I noticed that fail2ban had stopped blocking unauthorized IPs. I took a look at the fail2ban GitHub and saw some issues with the ssh regex filters (fail2ban works by monitoring logs and matching them against predefined regexes). I made some small adjustments but still had no luck: it did not ban anything. I turned on d
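
The log-matching idea mentioned above can be sketched in a few lines of Python. This is not fail2ban's code, and the pattern is a simplified stand-in rather than one of its shipped filters; `FAILED_LOGIN`, `ips_to_ban` and `max_retries` are hypothetical names used only for this illustration.

```python
import re
from collections import Counter

# Simplified, assumed pattern for sshd "Failed password" lines.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def ips_to_ban(log_lines, max_retries=3):
    """Return the sorted IPs with at least max_retries failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_retries)
```

A sketch like this also shows why a filter can silently stop working: if the log format changes even slightly, the regex no longer matches and nothing is counted.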

Unstable Vaadin UI tests

I have experienced a lot of issues with UI testing with Vaadin and Selenium. The test cases might seem to run OK initially but very easily turn out to be unstable, especially if the machine running the tests is much faster or slower than the machine on which the tests were written. The setups I have used contain some version of Selenium, Robot Framework, JBehave or CasperJS, and PhantomJS, Chrome or Firefox. I haven't used Vaadin TestBench. Here are my five cents on creating stable Vaadin UI tests.

- Don't use the built-in "is page loaded" methods; those rarely apply to single-page JS apps. Use the JS object "vaadin" and its properties isActive() and initialized to determine when it is OK to modify the UI.
- Run in production mode; I have had a lot fewer issues that way.
- Capture the browser logs, filter out all info-category messages, and see if there are errors.
- Use XPaths to find the correct elements. I have noticed that it helps if you wait first for some
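
Waiting for the UI to become ready, rather than sleeping for a fixed time, can be sketched as a small polling helper. This is a generic illustration and not tied to any particular test framework; `wait_until` and its parameters are hypothetical names. The readiness check itself is passed in as a function, because the exact JavaScript expression for querying the "vaadin" object is not shown here.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns True if the condition became true, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With Selenium's Python bindings, the condition could be a lambda that calls `driver.execute_script(...)` with a script that inspects the "vaadin" object and returns a boolean. Polling like this makes the test less dependent on how fast the test machine is.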

Using actors as a throttle

I've been working on a web shop integration project from time to time for the last six months. The use case is that there are several different instances of a web shop (slaves) and one master shop. All product information should be integrated from the master shop to the slaves at regular intervals. There are also some business rules applied to the products upon integration. The integration is set to happen at a certain time of day. It is a relatively long process because it checks through all the products in all shops. That's fine though; there are no requirements on how quick the integration should be. It should rather be a resource-constrained process so that it does not affect the users of the web shops. Originally I thought this would be a perfect use case for a serverless application running in AWS Lambda, for example. I also tried out OpenWhisk from IBM, which can be run as a self-hosted serverless platform. While it would have been interesting to try out those technologies I