Distributed Systems · Incident Response · 2026

Vellum

Distributed signal and incident-response system. It ingests 10k signals/sec at p99 < 2 ms over HTTP and gRPC, debounces alerts with an atomic Redis Lua script for a 100× reduction in alert noise, and runs the incident lifecycle as a state machine on PostgreSQL SERIALIZABLE transactions.

Redis
Golang
gRPC
PostgreSQL
MongoDB
TimescaleDB
Next.js
Docker Compose

▸ Demo

The problem

Incident triage tools either eat events and emit dashboards, or emit alerts and forget context. Vellum is an attempt to do both: high ingest, low-latency dispatch, and a durable history you can replay months later.

Architecture

A full write-up is in progress. Until then, the headline:

Hot path (Redis): the last 24h of signals, indexed by service and correlation ID. Dispatch waits on this lookup, so lookup latency is the dispatch latency.
Cold path (PostgreSQL): every signal ever, partitioned by week, with retention controlled by tag.
Producer ingress (Kafka): so a slow consumer can never back-pressure a fast producer.
Dispatch (Go + gRPC): the routing rules run as a deterministic state machine, so every replica reaches the same decision for the same input.
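
To make the "deterministic state machine" idea concrete, here is a minimal sketch in Go. It is not Vellum's actual code: the state names (open, acknowledged, resolved), the event names, and the helpers Next and Replay are all hypothetical, chosen only for illustration. The point it shows is that a transition is a pure lookup in a fixed table, with no clock, randomness, or I/O, so any replica that sees the same events in the same order lands in the same state.

```go
package main

import (
	"errors"
	"fmt"
)

// State and Event are hypothetical names; the real lifecycle may differ.
type State string
type Event string

const (
	Open         State = "open"
	Acknowledged State = "acknowledged"
	Resolved     State = "resolved"
)

const (
	Ack     Event = "ack"
	Resolve Event = "resolve"
	Reopen  Event = "reopen"
)

// ErrInvalidTransition is returned when an event is not allowed in a state.
var ErrInvalidTransition = errors.New("invalid transition")

type key struct {
	from State
	ev   Event
}

// transitions is the whole rule set: a fixed table, identical on every replica.
var transitions = map[key]State{
	{Open, Ack}:             Acknowledged,
	{Open, Resolve}:         Resolved,
	{Acknowledged, Resolve}: Resolved,
	{Resolved, Reopen}:      Open,
}

// Next is a pure function of (state, event): same input, same output.
// On an invalid transition it returns the unchanged state and an error.
func Next(s State, e Event) (State, error) {
	to, ok := transitions[key{s, e}]
	if !ok {
		return s, fmt.Errorf("%w: %s in state %s", ErrInvalidTransition, e, s)
	}
	return to, nil
}

// Replay folds an ordered event list over a starting state and stops at the
// first invalid transition, returning the last valid state.
func Replay(start State, events []Event) (State, error) {
	s := start
	for _, e := range events {
		next, err := Next(s, e)
		if err != nil {
			return s, err
		}
		s = next
	}
	return s, nil
}

func main() {
	final, err := Replay(Open, []Event{Ack, Resolve, Reopen})
	fmt.Println(final, err)
}
```

Keeping the rules in a table rather than in branching code also makes the durable history replayable: feeding stored events back through Replay reconstructs the state without consulting anything outside the event log.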

What's coming

The full case study will cover the p99 budget breakdown, how the saga compensation works under partial failure, and the OpenTelemetry trace that helped me find the actual bottleneck (it was none of the suspected things).

For now, treat this page as a marker.