Scaling Systems: Lessons Learned
18 September 2024 — Dzah Solomon
From 100 to 10 Million Users
When one of our fintech clients crossed 10 million monthly active users, almost everything we'd built for the MVP started to buckle. This post documents what broke, how we fixed it, and the architectural principles we extracted from the experience.
"Premature optimisation is the root of all evil — but so is ignoring scale entirely." — Knuth (adapted)
What Actually Breaks at Scale
Most system failures aren't caused by one big thing. They accumulate. Here's what we consistently see:
- Unindexed queries — a `SELECT` that ran in 4 ms at 10k rows takes 4 s at 10M
- N+1 problems — rendering a list that fires one DB query per item
- Synchronous external API calls in the critical path
- Missing rate limiting — a single misbehaving client can take down the whole service
- Monolith coupling — a deployment to update a minor feature causes a full restart
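The rate-limiting gap above doesn't require new infrastructure to close. As a minimal sketch (the class and parameters below are ours, not from any library), a per-client token bucket caps both burst and sustained request rates:

```typescript
// Minimal in-memory token bucket, one bucket per client key.
// capacity = burst size, refillPerSec = sustained rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryRemove(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

function allowRequest(clientId: string): boolean {
  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = new TokenBucket(10, 5); // burst of 10, then 5 req/s sustained
    buckets.set(clientId, bucket);
  }
  return bucket.tryRemove();
}
```

In-memory buckets only work per instance; behind a load balancer you'd back the counters with Redis or enforce limits at the gateway, but the shape of the algorithm is the same.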
The N+1 Problem in Practice
```typescript
// ❌ Fires 1 + N queries
const users = await db.user.findMany();
for (const user of users) {
  user.orders = await db.order.findMany({ where: { userId: user.id } });
}

// ✅ Single JOIN via Prisma's `include`
const users = await db.user.findMany({
  include: { orders: true },
});
```
Database Strategies
Indexing
Every column that appears in a WHERE, JOIN ON, or ORDER BY clause should be evaluated for indexing.
```sql
-- Slow: full table scan on created_at
EXPLAIN SELECT * FROM transactions WHERE created_at > NOW() - INTERVAL '7 days';

-- After adding the index, the same query uses an index scan:
CREATE INDEX idx_transactions_created_at ON transactions (created_at DESC);
```
Connection Pooling
At scale, opening a new Postgres connection per request is expensive. We use PgBouncer in transaction mode:
| Mode | Description | Use case |
|---|---|---|
| Session | One client → one server connection | Long-lived clients |
| Transaction | Pool across transactions | Most web apps |
| Statement | Pool across statements | Read-heavy, no multi-statement tx |
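For transaction mode, the relevant PgBouncer settings are few. A sketch of the config (host, database name, and sizes below are illustrative, not our production values):

```ini
; Illustrative pgbouncer.ini — tune sizes for your workload
[databases]
app = host=10.0.0.5 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20
```

Note that transaction mode breaks session-level features such as prepared statements and advisory locks held across transactions, which is why the table above matters: pick the mode your access patterns can tolerate.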
Read Replicas
Write to the primary, read from replicas. A single replica can roughly double read capacity, and adding more replicas scales reads further. The trade-off is replication lag: a replica may briefly serve stale data.
```typescript
const primaryDb = new PrismaClient({ datasourceUrl: process.env.DATABASE_URL });
const readDb = new PrismaClient({
  datasourceUrl: process.env.DATABASE_REPLICA_URL,
});

// Write — always to the primary
await primaryDb.order.create({ data: orderData });

// Read — served by the replica, off the primary's load
const orders = await readDb.order.findMany({ where: { userId } });
```
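Because replicas lag, a user who just wrote may not see their own write if the follow-up read hits a replica. One simple mitigation we can sketch (the pin window and helper names are ours, not Prisma API) is pinning a user to the primary for a short period after any write:

```typescript
// Route a user's reads to the primary for a short window after a write,
// so they always see their own changes ("read-your-writes").
const PIN_MS = 2000; // illustrative; should exceed typical replication lag
const lastWriteAt = new Map<string, number>();

function recordWrite(userId: string): void {
  lastWriteAt.set(userId, Date.now());
}

function pickDb(userId: string): "primary" | "replica" {
  const t = lastWriteAt.get(userId);
  return t !== undefined && Date.now() - t < PIN_MS ? "primary" : "replica";
}
```

Call `recordWrite` after each mutation and use `pickDb` to choose between `primaryDb` and `readDb`; everyone else still gets the cheap replica reads.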
Queue-Based Architecture
Moving work off the request path is the single highest-leverage scaling technique we've applied. Any task that takes longer than ~200 ms and doesn't need to return a result synchronously belongs in a queue.
What We Queue
- Email delivery
- PDF generation
- Webhook dispatches
- Fraud scoring
- Data export jobs
Our Stack
We use BullMQ (Redis-backed) for all queued work:
```typescript
import { Queue, Worker } from "bullmq";

const emailQueue = new Queue("email", { connection: redis });

// Producer (in API handler)
await emailQueue.add("send-welcome", { userId, template: "welcome" });

// Consumer (separate worker process)
const worker = new Worker(
  "email",
  async (job) => {
    await sendEmail(job.data);
  },
  { connection: redis, concurrency: 10 },
);
```
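Queued jobs fail, and a worker that retries immediately just hammers the struggling dependency. BullMQ has built-in retry options for this; the helper below is our own sketch of the underlying delay schedule, capped exponential backoff with full jitter:

```typescript
// Delay before retry attempt n (1-based): base * 2^(n-1), capped at capMs,
// with full jitter so a burst of failures doesn't retry in lockstep.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.random() * ceiling; // "full jitter"
}
```

The jitter matters as much as the exponent: without it, every job that failed in the same outage window retries at the same instant and recreates the spike that caused the failures.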
Horizontal Scaling Checklist
Before you spin up more instances, make sure you've done the following:
- Externalize all session state (no in-memory sessions)
- Use sticky-session-free load balancing
- Move file uploads to object storage (S3 / R2) — never to disk
- Centralise logging (no local log files)
- Ensure DB migrations are backward-compatible (blue/green deployments)
- Set resource limits on every container (`memory`, `cpu`)
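The first item, externalized session state, usually means either a shared store like Redis or self-contained signed tokens that any instance can verify. A minimal sketch of the signed-token approach using Node's built-in crypto module (the secret and payload shape here are illustrative):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = "replace-with-a-real-secret"; // illustrative only — load from env

// Sign a session payload so any instance can verify it without shared state.
function signSession(payload: object): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const sig = createHmac("sha256", SECRET).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the payload if the signature checks out, otherwise null.
function verifySession(token: string): object | null {
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const expected = createHmac("sha256", SECRET).update(body).digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  return JSON.parse(Buffer.from(body, "base64url").toString());
}
```

This is the same idea behind JWTs; in production you'd likely reach for a maintained library, add an expiry claim, and rotate secrets, but the point is that no instance holds session state in memory.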
Observability
You cannot fix what you cannot see. Before scaling, instrument:
- Structured JSON logs — parseable by Datadog / Loki
- Distributed traces — OpenTelemetry spans across service boundaries
- Business metrics — not just p99 latency, but order success rate, fraud rate, etc.
```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout-service");

export async function processPayment(orderId: string) {
  return tracer.startActiveSpan("processPayment", async (span) => {
    try {
      span.setAttribute("orderId", orderId);
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```
Summary
Scaling isn't a single event — it's an ongoing practice. The teams that succeed at it do three things well:
- Understand their bottlenecks before optimising
- Decouple work from the request path wherever possible
- Measure continuously so regressions surface before users notice
Want to stress-test your architecture? Talk to our team