Code Modularity: Practical Tips

For a while, I’ve been a core contributor and maintainer of an internal project at the company I now work for. Code modularity is a desired trait for any codebase which is not dead. For context, the project is written in Python. As you may guess from my other articles here, Python is a language I’ve been in love with for over a decade now.

Yet Another Merge Request

Not long ago, we got another merge request (MR). The code was definitely created with LLM assistance. But overall, it was pretty fine, except perhaps for some really long files, closer to 1k LOC. Perhaps you know how it goes: a long list of functions calling other functions. A typical workhorse of enterprise development.

It wasn’t the first time we were about to accept a long file into the repo. There were more of them, and the smell of those files was bothering me. With a bit of simple shell magic, I gathered some stats.

find src -type f -name "*.py" -exec wc -l {} + | sed '$d' | awk '{print $1}' | grep -v "^0$" > loc_stats.txt

sort -n loc_stats.txt | awk '{a[i++]=$1} END {print "Median:", a[int(i/2)], " P90:", a[int(i*0.9)], " Max:", a[i-1]}'

The median source file length was about 100 LOC, and p90 was maybe 300. Two files were more than 1k LOC, and a few were more than 500.

We accepted the MR, but I decided to finally fight these monster files.

Code Modularity: What Does It Mean?

It sounds easy on the surface: these files lack “modularity.” We all know the story: one should build modular software! Use cohesive modules, mate! But what does that mean in practice? My fight against monster files seemed like a perfect opportunity to figure it out.

One of the things I love about Python is how straightforward things generally are. A “module” is just a source code file. A “package” is a directory of source code files with a few twists (__init__, __main__ etc). The modularity riddle should definitely conform to the rules of the language. I’d follow Pythonic definitions of modules and packages down the road.

Making Sense of Module Interface

The first step in untangling the mess is separating a module’s public interface from its implementation details. Pythonic convention is, again, straightforward: just prefix internals with an underscore. So the first pass over a monster module is as follows: for every module attribute, in Pythonic terminology again, find out whether there are imports and usages elsewhere. If there are none, then prepend its name with an underscore.

# Before

def do_this(...):
  ...

def do_that(...):
  ...

def yet_another_fn(...):
  ...

# After

def _do_this(...):
  ...

def _do_that(...):
  ...

def yet_another_fn(...):
  ...

From that point, we can start checking the top-down readability of the module interface. When we import an attribute, does its name make sense? If we look at the attribute definition, is it readable, or does it span multiple screens full of code? Is any duplication popping out or can be found by IDE tooling?

After a while, structure starts to appear. Some internal functions are reused among public entry points. Others are used only in one “branch” started by an entry point. Some functions can be grouped by one criterion or another.

Code Modularity: Package FTW!

With that in mind, we can break down the module and transform it into a package. Add __init__ for public entry points. The new package name should be equal to the old module name. All that helps keep imports scattered throughout the rest of the code as they are.

Make the existing module a submodule, a module inside the newly created package. New package modules will be created by breaking it down.

# Before

src
└ my_monster_module.py

# After

src
└ my_monster_module
  ├ __init__.py
  └ old_impl.py

A public entry point usually goes, along with its own internal dependencies, into a submodule.

Attributes reused among public entry points go into their own submodules. These are imported into the entry-point modules. Reused attributes can be part of the package’s public interface as well.

# Before

src
└ my_monster_module
  ├ __init__.py
  └ old_impl.py

# After

src
└ my_monster_module
  ├ __init__.py
  ├ common.py
  ├ entrypoint1.py
  └ entrypoint2.py

Conclusion

Don’t rely too much on LLMs – vibe coded stuff with very little structure is fine for prototypes. For any serious codebase you should define your own rules. And LLMs will be much more productive in such codebase. And humans will be too.

Апокалипсис сегодня на базе LLM

Сегодня снова рандомный мыслепоток на великом и могучем. Есть некоторое количество антиутопической фантастики, выстроенной на появлении машины, способной к осознанию себя. И конечно же это приводит к стойкому желанию разделаться с треклятыми создателями. Из наиболее известных произведений это в частности кинофильмы из серий “Терминатор” и “Матрица”. Современное развитие больших языковых моделей ака БЯМ ака LLM…

Code Modularity: Practical Tips

Why Everyone Is Falling for the AI Pitch

So it’s time for probably the most useless article on this blog ))) It’s been more than two months since the last article, and I can’t get my … stuff together to write another one. Why everyone is falling for the AI pitch? Let’s vent a little for a change. Everyone is a Celebrity Today’s…

Halting problem в 3-х ипостасях

Термин halting problem обычно вспоминают как «ту самую» теоретическую границу вычислений: нельзя написать программу, которая для любой другой программы и любого входа заранее решит, завершится ли вычисление или уйдёт в бесконечный цикл. Но если посмотреть на индустриальные системы — от микросервисов до агентных пайплайнов с LLM — то «проблема остановки» неожиданно перестаёт быть абстракцией. Она…

NFR Conflicts: 3 practical cases

Over time I noticed some typical contradictions between non-functional requirements (NFR). Let’s consider three typical cases of NFR conflicts I learned from practice. Security vs Ease of Use Security is a huge and ever growing concern these days. No surprise here – we live when IT-systems are unprecedentedly dependable. More and more activities occur in…