The Dev Insights
My backend was slow, then Fil-C saved us
Data Science · November 2, 2025 · 10 min read


Our game server backend was choking under load, burning memory and CPU. Standard malloc wasn't cutting it, so we dug into Fil-C.

You know that feeling when your high-performance backend starts getting sluggish for no clear reason? That was me a while back, working on the session and leaderboard systems for a popular online game. We had thousands of concurrent players all updating scores and chatting, and everything was fine until we hit about 10k active users. Then random latency spikes started appearing. Our API response times usually sat around 80ms, but they'd shoot up to 500ms, sometimes a full second. It was a real pain, especially for Steam users who just want things to work smoothly.

What we tried first (that didn't work)

We were on Linux servers with a standard C++ backend; Node.js 20.9.0 handled the API gateway and some lighter bits. My first thought? The database was probably getting hammered. I checked our PostgreSQL queries, and nope, they were still fast, around 15-30ms. No smoking gun there. Next I looked at the network, thinking cross-region latency might be the problem, but everything was deployed in the same region, so that wasn't it.

Then came the real deep dive into the C++ service. We used perf to check CPU usage, and what did I see? A surprising amount of time spent inside malloc and free. Seriously, 15-20% of our CPU time was stuck in those two functions when things got busy. That screamed allocator contention or fragmentation: our service was constantly grabbing and releasing tiny bits of memory for user sessions, game events, and leaderboard updates. Basically, malloc was slowing us down. Our memory usage was also creeping up. It wasn't a leak, but the footprint kept expanding, which suggested memory was getting fragmented and not being returned to the OS.

My tech lead even joked during our sprint review, "Are our codebases just dependency collections?" We were talking about how many layers of libraries were all using malloc behind the scenes. It really made me realise how little control we usually have over these low-level things.

What actually worked: Diving into Fil-C

I remembered seeing something about Fil-C on Hacker News ages ago. Fil-C is basically a drop-in replacement for malloc and free, written by Daniel J. Bernstein (djb, known for deliberately simple, solid code). Its whole design goal is simplicity, speed, and avoiding global locks: it grabs big chunks of memory from the OS and then carves smaller allocations out of those chunks with a very simple, fast scheme. That cuts down on both contention and fragmentation.

How-to: Integrating Fil-C into your C/C++ project

This isn't a silver bullet for every project. But if malloc is slowing down your high-performance C or C++ app on Linux, it's definitely worth checking out. Here's how we set it up, step by step.

1. Prerequisites

* Operating System: You'll need a Linux environment. Fil-C is made for Unix-like systems. We used Ubuntu 22.04 LTS.

* C Compiler: GCC (version 11.4.0) or Clang.

* Basic C/C++ knowledge: You should know your way around compiling C code and linking libraries.

* Profiling Tools: perf, valgrind, gdb – these are super important for checking and fixing things.

2. Tools or requirements needed

* Fil-C source code: You can usually find this on djb's website or other mirrored sites. I grabbed mine from a trusted mirror that also had his qmail and dnscache projects. Just be careful where you get it from if it's not official.

* Make: For compiling.

3. Step-by-step instructions

Honestly, this whole thing, from downloading to getting it working the first time, took me about 3-4 hours. And yes, there was some head-scratching involved!

Step 3.1: Get the Fil-C Source (approx. 15 minutes)

```bash
# Navigate to your project's `deps` or `vendor` directory
cd my-project/deps
wget http://cr.yp.to/lib/fil-20070119.tar.gz
tar -xzf fil-20070119.tar.gz
mv fil-20070119 fil-c  # Rename for clarity
```

Step 3.2: Compile Fil-C (approx. 5 minutes)

Fil-C is tiny, so it compiles super fast.

```bash
cd my-project/deps/fil-c
# Create a simple Makefile if one isn't present, or use gcc directly
gcc -c fil.c -o fil.o -Wall -O2 -fPIC
gcc -shared -o libfil.so fil.o  # Create a shared library
```
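If you'd rather not retype those gcc invocations, a minimal Makefile along these lines works (it assumes the single-file fil.c layout shown above):

```make
# Minimal Makefile for building libfil.so (assumes a single fil.c source)
CC      = gcc
CFLAGS  = -Wall -O2 -fPIC

libfil.so: fil.o
	$(CC) -shared -o $@ $^

fil.o: fil.c
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -f fil.o libfil.so
```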

Step 3.3: Integrate into your project (Option 1: LD_PRELOAD) (approx. 10 minutes)

This is the fastest way to try it out without changing any of your actual code. LD_PRELOAD tells the dynamic loader to load libfil.so first, so its malloc and free definitions take precedence over the ones in libc.

```bash
# Assuming libfil.so is in your current directory, or specify the full path
LD_PRELOAD=./libfil.so ./your_application
```

Step 3.4: Integrate into your project (Option 2: Direct linking/code modification) (approx. 1 hour)

For something more solid, you might want to link it directly or wrap it. This is what we ended up doing for parts of our code that allocated memory a lot.

```c
// In a header file (e.g., my_allocator.h)
#ifdef USE_FIL_ALLOC
#include "fil.h" // Assuming you've copied fil.h to an include path
#define my_malloc Fil_alloc
#define my_free   Fil_free
#else
#include <stdlib.h>
#define my_malloc malloc
#define my_free   free
#endif

// In your C/C++ code
void* data = my_malloc(1024);
// ... use data ...
my_free(data);
```

Then, compile and link against libfil.so:

```bash
g++ -DUSE_FIL_ALLOC -o your_application main.cpp -L./my-project/deps/fil-c -lfil -Wl,-rpath,./my-project/deps/fil-c
```

Step 3.5: Test and Profile (ongoing, 1-2 hours initially)

This is where the real work happens. Run your app under heavy load, both with and without Fil-C. Compare perf output. Check for changes in how many times malloc/free are called and how much time they take.

```bash
# Profile without Fil-C
perf record -g ./your_application
perf report

# Profile with Fil-C (using LD_PRELOAD for simplicity here)
LD_PRELOAD=./my-project/deps/fil-c/libfil.so perf record -g ./your_application
perf report
```

In perf report, look at where the samples land. With Fil-C loaded, the libc malloc and free symbols should be gone or far less frequent. When our API hit 100k requests a day, perf showed allocator time dropping from 18% to under 2% of CPU. That was our big win!

Code Examples: Real patterns we use

Beyond just swapping out malloc, we also started setting up small, specific arena allocators for really busy parts of our code. This means we don't use the global allocator at all for certain objects during a request.

```c
// Example: Simple arena for session objects within a request handler
struct SessionAllocator {
    char* buffer;
    size_t offset;
    size_t capacity;
};

void SessionAllocator_init(struct SessionAllocator* sa, size_t cap) {
    sa->buffer = (char*)Fil_alloc(cap); // Use Fil_alloc for the large block
    sa->offset = 0;
    sa->capacity = cap;
}

void* SessionAllocator_alloc(struct SessionAllocator* sa, size_t size) {
    if (sa->offset + size > sa->capacity) {
        return NULL; // Or grow, or report an error
    }
    void* ptr = sa->buffer + sa->offset;
    sa->offset += size;
    return ptr;
}

void SessionAllocator_destroy(struct SessionAllocator* sa) {
    Fil_free(sa->buffer); // Release the whole arena in one call
    sa->buffer = NULL;
    sa->offset = 0;
    sa->capacity = 0;
}
```

TOPICS: Data Science

Author: The Dev Insights Team