Long-running transactions considered harmful

When working with ACID storages developers have a powerful tool at their disposal – transactions. Most RDBMS systems out there – think MySQL or PostgreSQL – are fundamentally ACID storages. Developers learn transactions as part of any basics course on RDBMSes. They seem pretty organic to underlying databases. One may get really surprised by the lack of such a feature in NoSQL storages. Or more precisely much more limited functionality of the feature.

Using transactions with ACID storages may seem like a good programming technique, a well-established pattern that should be always followed. Doing somehow related changes? Wrap the whole command set in a transaction. What if your command set will fail in the middle? You won’t have to collect any garbage left by already executed commands. You don’t have to deal with any possible inconsistencies. The world will be a better place 🙂

But as with most of the things in computer science actually usage of transactions is a tradeoff. And one usually has to learn that the hard way. Or at least I did.

The story

I once worked on a SaaS-system and we created a feature of demo accounts for potential customers. Basically we create an account with some sample data for a potential customer. With that set up they can evaluate the system as it stands before buying the thing.

To create a demo account we use a template company with template data. After cloning that template we also generate reports to fill in corresponding sections of the app. With around 1700 items of template data we generated 30 days of report data hence overall creating more than 50 thousand different records in the DB. It was PostgreSQL via AWS RDS.

That process takes around 30 minutes to complete. And one long-running transaction wrapped the whole thing That ensured it’s all or nothing in case something will go wrong. Everything was fine for a while – demos were created and used and all was good. Until once we faced a serious DB engine outage. Queries that seem simple got stuck for a long time. We had to investigate & found a solution quickly.

Epimyth

Without going into the nitty-gritty of the investigation & actually dealing with the situation the bottom line is as follows. Demo account creation was occasionally happening when there was another pretty intense DB workload. The demo account transaction was long, involved a lot of DB tables it produced a lot of different locks. Other concurrent DB processes had to wait for releasing of these locks before proceeding. That lead to the disaster we faced.

Should you avoid long-running transactions like a plaque? Of course not. Just remember that transaction is basically a tradeoff between consistency and system throughput/scalability. And yeah, transactions have different isolation levels. That may also be a way of fine-tuning DB usage in the context of a particular project.

Mental Models for Performance Engineering

I’ve been tinkering with performance engineering (PE) as one of the areas of interest for a while lately. From experience I built up a few mental models / metaphors which help with reasoning about systems performance. Let’s get through these mental models for performance engineering one by one. The most significant achievement thus far was…

DOOM уже не тот (и не торт) – часть вторая

Совсем недавно вышла очередная игра в серии ребута классического Doom – Doom: The Dark Ages aka D:DA. Я уже отметился как не шибко фанат творчества господина Мартина и честно хотел остаться сторонним наблюдателем, ведь DOOM уже не тот. Но какой-то очередной ролик на ЮТубоне таки меня склонил поставить НЕДОСТУПНЫЙ в РФ через Стим D:DA =)…

Unsettling story of proxy_next_upstream in Nginx

It turns out that there’s quite a bunch of people who bumped into a particular Nginx upstream handling behavior. Not that long ago I also joined the club. It usually manifests as unavailability of some service behind Nginx as a reverse proxy. Clients start to get “502 Bad Gateway” after some change in the service…

Tribute to the Blog of Uwe Friedrichsen

At the end of November last year I bumped into a blog which immediately captured my attention. I eagerly read during a few weeks after as the blog turned out to be a treasury of value =) Articles are nicely interconnected with each other, so it’s easy to fall into a rabbithole of great content…

Технический аудит при закупках ИТ-систем

Корпоративные информационные системы (ИС) играют ключевую роль в управлении и оптимизации бизнес-процессов современных организаций. Как и любые другие технические активы – а как я уже утверждал ИС это вполне себе технический актив – эти системы проходят определённые этапы жизненного цикла, начиная от планирования и заканчивая утилизацией. Пока не очень понятно при чем тут технический аудит…