Scalable Backend Secret Sauce

There’s one implementation principle of scalable backend systems that I find among the most important: minimize the compute resources footprint of a request handler for every possible request. The principle isn’t limited to handlers in request/response scenarios, e.g. REST or GraphQL APIs. Pretty much the same logic applies to a session-based handler such as a WebSocket handler. In that scenario it’s still key if you aim to squeeze the most scalability out of the backend system you deal with.

A simple example to illustrate

Let’s dive right in with a simple Pythonic example. Consider the following two simple Django view functions:

from django.http import HttpRequest, Http404
from myapp.models import MyModel


def view_with_len(request: HttpRequest, model_pk: int):
    if len(MyModel.objects.filter(pk=model_pk)) == 0:
        raise Http404("No model with such PK")
    # Rest of the code isn't relevant for this example


def view_with_orm_count(request: HttpRequest, model_pk: int):
    if MyModel.objects.filter(pk=model_pk).count() == 0:
        raise Http404("No model with such PK")
    # Rest of the code isn't relevant for this example

Let’s put aside the somewhat controversial way of checking that the specified row exists in MyModel’s database table. In practice, in this particular case we’re more likely to use an EAFP-style try/except around the Django object manager’s .get method. Nevertheless, this implementation makes a pretty good demo of the footprint minimization approach.

The problem for a scalable backend

At first sight both pieces of code do pretty much the same thing. Moreover, while the system load is low, both behave almost identically. The situation changes drastically as the system’s load increases. There are 2 main factors: the number of rows in MyModel’s database table that the query matches, and parallel execution of request handlers.

Caveat: by parallel execution in this case I mean any way of having multiple in-flight requests. The difference in compute footprint between these 2 snippets is related to memory rather than CPU. In that case all we need is multiple request handling structures held in memory at the same time. That can be achieved with either synthetic or physical threads of execution in the same process space. Coroutines/greenlets or multiple OS processes have a similar effect.

With that clarified, let’s continue. A rising number of both parallel executions and matched rows results in higher memory consumption of the view_with_len view in comparison to view_with_orm_count. That makes the former scale much worse than the latter. The culprit is simple: the Django object manager’s .count method results in a single SQL query that retrieves the number of rows. Using the built-in len() function instead requires the ORM to fetch all the matching rows and populate Python MyModel objects before calculating the length of the resulting collection. With a primary key filter that is at most one row, but the same pattern applied to a broader filter materializes every matching row. The amount of unnecessary work done by view_with_len in such a case can be enormous with large datasets. It can even lead to stability issues, e.g. timeouts, rather than just hampering the scalability of the system.
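The difference can be reproduced without Django. Below is a minimal sketch that uses the standard library sqlite3 module as a stand-in for the ORM; the my_model table, its payload column, and all function names are assumptions made up for illustration. One function materializes every row before counting, roughly as len() on a queryset does, and the other lets the database return a single number, roughly as .count() does.

```python
import sqlite3
import tracemalloc


def make_db(row_count: int) -> sqlite3.Connection:
    """Create an in-memory table with row_count rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO my_model (payload) VALUES (?)",
        (("x" * 100,) for _ in range(row_count)),
    )
    conn.commit()
    return conn


def count_with_len(conn: sqlite3.Connection) -> int:
    # Rough analogue of len(queryset): every row is fetched into Python first.
    rows = conn.execute("SELECT id, payload FROM my_model").fetchall()
    return len(rows)


def count_in_db(conn: sqlite3.Connection) -> int:
    # Rough analogue of queryset.count(): the database returns one integer.
    (n,) = conn.execute("SELECT COUNT(*) FROM my_model").fetchone()
    return n


def peak_bytes(func, conn: sqlite3.Connection) -> int:
    """Peak Python-level memory allocated while running func(conn)."""
    tracemalloc.start()
    try:
        func(conn)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    conn = make_db(50_000)
    print("fetch all rows:", peak_bytes(count_with_len, conn), "bytes at peak")
    print("COUNT(*) query:", peak_bytes(count_in_db, conn), "bytes at peak")
```

Both functions return the same number, but the peak allocation of the first one grows with the number of rows, and that peak is paid once per in-flight request. Note that tracemalloc only sees allocations made through Python’s allocator, so the figures are indicative rather than a full accounting.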

Application of the principle

The principle of minimal compute resources footprint sounds easy in theory. It can be really hard to apply consistently in practice when it comes to differences as subtle as the one in the example we just covered. Basically, for every request handler there could be an optimal logic flow which minimizes the footprint. Getting to a scalable backend is usually all about just 2 things. The first is short-circuiting execution as soon as possible, for lower CPU consumption. The second is loading into memory only the data which is absolutely required for the current state of the request handler execution, for lower memory usage. Here are some examples of applying these practices:

Validate the input present in the request right away & short-circuit execution on any failed validation.
Validate system state while materializing the minimal required dataset & short-circuit execution if the operation shouldn’t be permitted.
Ensure the dataset used by the request handler is always as small as possible for the task at hand.
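The three practices above can be sketched in a framework-agnostic handler. Everything here is hypothetical and exists only for illustration: the Response type, the DocumentStore class with its lookups counter, and the rename_document handler are not part of Django or any other library.

```python
from dataclasses import dataclass

MAX_TITLE_LEN = 200  # hypothetical limit, for illustration only


@dataclass
class Response:
    status: int
    body: str


class DocumentStore:
    """Hypothetical stand-in for a database table of documents."""

    def __init__(self, rows: dict):
        self._rows = rows
        self.lookups = 0  # counts how many times storage was touched for reads

    def owner_of(self, doc_id: int):
        """Return only the owner column, or None if the row doesn't exist."""
        self.lookups += 1
        row = self._rows.get(doc_id)
        return None if row is None else row["owner"]

    def set_title(self, doc_id: int, title: str) -> None:
        self._rows[doc_id]["title"] = title


def rename_document(store: DocumentStore, user: str, raw_id: str, new_title: str) -> Response:
    # 1. Validate input first: cheap checks, no storage access at all.
    try:
        doc_id = int(raw_id)
    except ValueError:
        return Response(400, "invalid id")
    if doc_id <= 0:
        return Response(400, "invalid id")
    if not new_title or len(new_title) > MAX_TITLE_LEN:
        return Response(400, "invalid title")

    # 2. Validate system state using the minimal dataset: only the owner.
    owner = store.owner_of(doc_id)
    if owner is None:
        return Response(404, "no such document")
    if owner != user:
        return Response(403, "not permitted")

    # 3. Do the actual work, touching only the field that changes.
    store.set_title(doc_id, new_title)
    return Response(200, "renamed")
```

A request with a malformed id never reaches the store, and a request that fails the permission check never loads anything beyond the owner field. The document body is not read at any point because the rename operation doesn’t need it.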

Awareness across the development team/organization is key to the long-term maintainability of well-optimized request handlers. An unaware developer can unintentionally break the optimal flow by introducing new behavior or even during refactoring. The best practice is to make this an inherent part of the overall development culture if scalability is really important for the systems people work on.
