Scaling Systems: Lessons Learned
18 September 2024 — Dzah Solomon
From 100 to 10 Million Users
When one of our fintech clients crossed 10 million monthly active users, almost everything we'd built for the MVP started to buckle. This post documents what broke, how we fixed it, and the architectural principles we extracted from the experience.
"Premature optimisation is the root of all evil — but so is ignoring scale entirely." — Knuth (adapted)
What Actually Breaks at Scale
Most system failures aren't caused by one big thing. They accumulate. Here's what we consistently see:
- Unindexed queries — a `SELECT` that ran in 4 ms at 10k rows takes 4 s at 10M
- N+1 problems — rendering a list that fires one DB query per item
- Synchronous external API calls in the critical path
- Missing rate limiting — a single misbehaving client can take down the whole service
- Monolith coupling — a deployment to update a minor feature causes a full restart
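The rate-limiting gap above doesn't require new infrastructure to close. As a minimal sketch (the class and parameters below are ours, not from any library), a per-client token bucket caps both burst and sustained request rates:

```typescript
// Minimal in-memory token bucket, one bucket per client key.
// capacity = burst size, refillPerSec = sustained rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryRemove(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

function allowRequest(clientId: string): boolean {
  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = new TokenBucket(10, 5); // burst of 10, then 5 req/s sustained
    buckets.set(clientId, bucket);
  }
  return bucket.tryRemove();
}
```

In-memory buckets only work per instance; behind a load balancer you'd back the counters with Redis or enforce limits at the gateway, but the shape of the algorithm is the same.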
The N+1 Problem in Practice
```typescript
// ❌ Fires 1 + N queries
const users = await db.user.findMany();
for (const user of users) {
  user.orders = await db.order.findMany({ where: { userId: user.id } });
}

// ✅ Single JOIN via Prisma's `include`
const users = await db.user.findMany({
  include: { orders: true },
});
```
Database Strategies
Indexing
Every column that appears in a WHERE, JOIN ON, or ORDER BY clause should be evaluated for indexing.
```sql
-- Slow: full table scan on created_at
EXPLAIN SELECT * FROM transactions WHERE created_at > NOW() - INTERVAL '7 days';

-- After adding the index, the same query uses an index scan:
CREATE INDEX idx_transactions_created_at ON transactions (created_at DESC);
```
Connection Pooling
At scale, opening a new Postgres connection per request is expensive. We use PgBouncer in transaction mode:
| Mode | Description | Use case |
|---|---|---|
| Session | One client → one server connection | Long-lived clients |
| Transaction | Pool across transactions | Most web apps |
| Statement | Pool across statements | Read-heavy, no multi-statement tx |
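For transaction mode, the relevant PgBouncer settings are few. A sketch of the config (host, database name, and sizes below are illustrative, not our production values):

```ini
; Illustrative pgbouncer.ini — tune sizes for your workload
[databases]
app = host=10.0.0.5 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20
```

Note that transaction mode breaks session-level features such as prepared statements and advisory locks held across transactions, which is why the table above matters: pick the mode your access patterns can tolerate.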
Read Replicas
Write to the primary, read from replicas. A single replica can roughly double read capacity, and adding more replicas scales reads further. The trade-off is replication lag: a replica may briefly serve stale data.
```typescript
const primaryDb = new PrismaClient({ datasourceUrl: process.env.DATABASE_URL });
const readDb = new PrismaClient({
  datasourceUrl: process.env.DATABASE_REPLICA_URL,
});

// Write — always to the primary
await primaryDb.order.create({ data: orderData });

// Read — served by the replica, off the primary's load
const orders = await readDb.order.findMany({ where: { userId } });
```
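Because replicas lag, a user who just wrote may not see their own write if the follow-up read hits a replica. One simple mitigation we can sketch (the pin window and helper names are ours, not Prisma API) is pinning a user to the primary for a short period after any write:

```typescript
// Route a user's reads to the primary for a short window after a write,
// so they always see their own changes ("read-your-writes").
const PIN_MS = 2000; // illustrative; should exceed typical replication lag
const lastWriteAt = new Map<string, number>();

function recordWrite(userId: string): void {
  lastWriteAt.set(userId, Date.now());
}

function pickDb(userId: string): "primary" | "replica" {
  const t = lastWriteAt.get(userId);
  return t !== undefined && Date.now() - t < PIN_MS ? "primary" : "replica";
}
```

Call `recordWrite` after each mutation and use `pickDb` to choose between `primaryDb` and `readDb`; everyone else still gets the cheap replica reads.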
Queue-Based Architecture
Moving work off the request path is the single highest-leverage scaling technique we've applied. Any task that takes longer than ~200 ms and doesn't need to return a result synchronously belongs in a queue.
What We Queue
- Email delivery
- PDF generation
- Webhook dispatches
- Fraud scoring
- Data export jobs
Our Stack
We use BullMQ (Redis-backed) for all queued work:
```typescript
import { Queue, Worker } from "bullmq";

const emailQueue = new Queue("email", { connection: redis });

// Producer (in API handler)
await emailQueue.add("send-welcome", { userId, template: "welcome" });

// Consumer (separate worker process)
const worker = new Worker(
  "email",
  async (job) => {
    await sendEmail(job.data);
  },
  { connection: redis, concurrency: 10 },
);
```
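Queued jobs fail, and a worker that retries immediately just hammers the struggling dependency. BullMQ has built-in retry options for this; the helper below is our own sketch of the underlying delay schedule, capped exponential backoff with full jitter:

```typescript
// Delay before retry attempt n (1-based): base * 2^(n-1), capped at capMs,
// with full jitter so a burst of failures doesn't retry in lockstep.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.random() * ceiling; // "full jitter"
}
```

The jitter matters as much as the exponent: without it, every job that failed in the same outage window retries at the same instant and recreates the spike that caused the failures.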
Horizontal Scaling Checklist
Before you spin up more instances, make sure you've done the following:
- Externalize all session state (no in-memory sessions)
- Use sticky-session-free load balancing
- Move file uploads to object storage (S3 / R2) — never to disk
- Centralise logging (no local log files)
- Ensure DB migrations are backward-compatible (blue/green deployments)
- Set resource limits on every container (`memory`, `cpu`)
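The first item, externalized session state, usually means either a shared store like Redis or self-contained signed tokens that any instance can verify. A minimal sketch of the signed-token approach using Node's built-in crypto module (the secret and payload shape here are illustrative):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = "replace-with-a-real-secret"; // illustrative only — load from env

// Sign a session payload so any instance can verify it without shared state.
function signSession(payload: object): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const sig = createHmac("sha256", SECRET).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the payload if the signature checks out, otherwise null.
function verifySession(token: string): object | null {
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const expected = createHmac("sha256", SECRET).update(body).digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  return JSON.parse(Buffer.from(body, "base64url").toString());
}
```

This is the same idea behind JWTs; in production you'd likely reach for a maintained library, add an expiry claim, and rotate secrets, but the point is that no instance holds session state in memory.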
Observability
You cannot fix what you cannot see. Before scaling, instrument:
- Structured JSON logs — parseable by Datadog / Loki
- Distributed traces — OpenTelemetry spans across service boundaries
- Business metrics — not just p99 latency, but order success rate, fraud rate, etc.
```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout-service");

export async function processPayment(orderId: string) {
  return tracer.startActiveSpan("processPayment", async (span) => {
    try {
      span.setAttribute("orderId", orderId);
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```
Summary
Scaling isn't a single event — it's an ongoing practice. The teams that succeed at it do three things well:
- Understand their bottlenecks before optimising
- Decouple work from the request path wherever possible
- Measure continuously so regressions surface before users notice
Want to stress-test your architecture? Talk to our team