Posts

Showing posts from March, 2021

Remote file size with an HTTP HEAD request

I learned from my brilliant colleagues a nice trick that left me once again wondering why on earth I discover these things only now. It's funny to run into something literally everyone else knows and I have somehow missed. Say you want to download a file from some trusted server. You may already have a local copy, but it could be corrupted or only partially downloaded. You could verify file integrity by comparing the local and remote file hashes, and that is indeed the proper way to do it. If you trust the remote, though, you can use plain old HTTPS to fetch only the headers with a HEAD request and read the content length. Compare the result to the local file size, and there you have it: a naive yet adequate way to decide whether you need to redownload the file. Note that this does not validate that the file contents are equal.
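A minimal sketch of that check in Go, assuming a hypothetical URL and local filename; net/http exposes the reported Content-Length of a HEAD response as resp.ContentLength, or -1 when the server does not report it:

package main

import (
	"fmt"
	"net/http"
	"os"
)

// headSize returns the Content-Length reported by an HTTP HEAD request.
func headSize(url string) (int64, error) {
	resp, err := http.Head(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.ContentLength < 0 {
		return 0, fmt.Errorf("server did not report a Content-Length")
	}
	return resp.ContentLength, nil
}

func main() {
	// Hypothetical URL and local path, for illustration only.
	url := "https://example.com/archive.tar.gz"
	local := "archive.tar.gz"

	remote, err := headSize(url)
	if err != nil {
		fmt.Println("HEAD failed:", err)
		return
	}

	info, err := os.Stat(local)
	if err != nil || info.Size() != remote {
		fmt.Println("missing file or size mismatch: redownload")
		return
	}
	fmt.Println("sizes match: skipping download")
}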

Rule of three

Of all the horrific code I have written, by far the most harmful and destructive has involved premature abstraction. Everyone knows the issues stemming from the wrong abstraction. Even copy-paste code would have been a better choice than some seemingly fancy, say, adapter pattern implementation. To be honest, I think copy-paste has a worse reputation than it deserves. A given project might start with the best practices, module structure, and libraries of a prior project or a 16k-star primer. That can be good! Some best practices naturally transcend projects. We should bring in the lessons learned, but the timing matters. Upon a closer look, the project might indeed have agreeable abstractions. Still, those could be hollow, bringing only extra LOC and solving issues faced in the previous projects that have never occurred in the current one. Upon recently starting two soon-to-be-large projects, I learned to leave some wisdom from previous lives behind. While deep in the coding zen...

Emit structured Postgres data change events with wal2json

A common thing I see in enterprise systems is that when an end user does some action, say adding a user, the underlying web of subsystems adds that user to multiple databases in separate transactions. Each of these transactions may happen in varying order and, even worse, can fail, leaving the system in an inconsistent state. A better way could be to write the user data to one main database and then have other subsystems, like search indexes, pull or push the data to other interested parties, thus eliminating the need for multiple end-user-originating boundary transactions. That's the theory; how about a technical solution? The idea for this post came from the koodia pinnan alla podcast episode about event-driven systems and CDC. One of the discussion topics in the show is emitting events from Postgres transaction logs. I built an utterly simple change emitter and reader using Postgres with the wal2json transaction decoding plugin and a custom Go event parser. I'll stick to the boring ...
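A rough sketch of the reading side, assuming a logical replication slot named events created with the wal2json plugin and the lib/pq driver; the slot name, connection string, and struct fields are illustrative and follow wal2json's default JSON output:

package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed driver; pgx would work as well
)

// Change mirrors one entry of wal2json's "change" array.
type Change struct {
	Kind         string        `json:"kind"` // insert, update or delete
	Schema       string        `json:"schema"`
	Table        string        `json:"table"`
	ColumnNames  []string      `json:"columnnames"`
	ColumnValues []interface{} `json:"columnvalues"`
}

type walMessage struct {
	Change []Change `json:"change"`
}

func main() {
	// Hypothetical connection string; the slot is created once with:
	//   SELECT * FROM pg_create_logical_replication_slot('events', 'wal2json');
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Consume pending changes from the slot; each row holds one JSON document.
	rows, err := db.Query(`SELECT data FROM pg_logical_slot_get_changes('events', NULL, NULL)`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var data string
		if err := rows.Scan(&data); err != nil {
			log.Fatal(err)
		}
		var msg walMessage
		if err := json.Unmarshal([]byte(data), &msg); err != nil {
			log.Fatal(err)
		}
		for _, c := range msg.Change {
			fmt.Printf("%s on %s.%s: %v\n", c.Kind, c.Schema, c.Table, c.ColumnValues)
		}
	}
}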