Per-user rate limits and a real readiness probe on the AI backen

Two operational improvements to the AI backend that you probably will not notice individually but should appreciate cumulatively:

Per-user rate limits. Policy generation is the most expensive endpoint we run — each request can chew through a few seconds of LLM time and a non-trivial amount of compute. A single enthusiastic tenant could previously trigger enough generations to slow things down for everyone else. The backend now enforces a per-user limit so one tenant's workload cannot starve the rest.

A real readiness probe. The backend now exposes a probe that confirms the database connection is alive before reporting "healthy" to the load balancer. Previously the service would happily accept traffic during a database hiccup and return 500s; now traffic stays paused until the dependencies are actually ready.

Together, these mean fewer noisy errors when something upstream is having a bad minute, and more predictable latency under load.

April 29, 2026

Per-user rate limits and a real readiness probe on the AI backend