Per-user rate limits and a real readiness probe on the AI backend
Released · Marina Köhler
Two operational improvements to the AI backend that you probably will not notice individually but should appreciate cumulatively:
Per-user rate limits. Policy generation is the most expensive endpoint we run: each request can chew through a few seconds of LLM time and a non-trivial amount of compute. Previously, a single enthusiastic tenant could trigger enough generations to slow things down for everyone else. The backend now enforces a per-user limit so one tenant's workload cannot starve the rest.
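For readers curious what a per-user limit can look like, here is a minimal sketch using a token bucket keyed by user ID. This is an illustration only: the post does not say which algorithm the backend uses, and the names PerUserRateLimiter, capacity, and refill_per_second are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    tokens: float       # requests this user may still make right now
    last_refill: float  # clock reading when tokens was last updated


class PerUserRateLimiter:
    """Hypothetical token-bucket limiter with one bucket per user ID."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity                    # maximum burst size per user
        self.refill_per_second = refill_per_second  # sustained request rate per user
        self.clock = clock                          # injectable so tests can control time
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        """Return True if this user's request may proceed, False if it is throttled."""
        now = self.clock()
        bucket = self.buckets.get(user_id)
        if bucket is None:
            # A user seen for the first time starts with a full bucket.
            bucket = TokenBucket(tokens=float(self.capacity), last_refill=now)
            self.buckets[user_id] = bucket

        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - bucket.last_refill
        bucket.tokens = min(
            float(self.capacity),
            bucket.tokens + elapsed * self.refill_per_second,
        )
        bucket.last_refill = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

Because each user has a separate bucket, one user exhausting their allowance has no effect on anyone else's. A caller would typically answer a False result with HTTP 429 (Too Many Requests); that mapping is also an assumption here, not something the post confirms.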
A real readiness probe. The backend now exposes a probe that confirms the database connection is alive before reporting "healthy" to the load balancer. Previously the service would happily accept traffic during a database hiccup and return 500s; now the load balancer holds traffic back until the database is reachable again.
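The idea behind a dependency-aware readiness probe can be sketched as follows. This is not the backend's actual handler: the function names readiness and ping_database are hypothetical, SQLite stands in for the real database, and returning 503 when a check fails is a common convention assumed here rather than something stated in the post.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def ping_database(conn: sqlite3.Connection) -> None:
    """Run a trivial query; raises if the connection is unusable."""
    conn.execute("SELECT 1").fetchone()


def readiness(checks: Dict[str, Callable[[], None]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check and return (HTTP status, per-check results).

    Returns 200 only if all checks pass; otherwise 503, so that a load
    balancer polling this probe stops routing traffic to the instance.
    """
    results: Dict[str, str] = {}
    ready = True
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # any failure means "not ready"
            results[name] = f"failed: {exc}"
            ready = False
    return (200 if ready else 503), results
```

A web framework route would call readiness({"database": lambda: ping_database(conn)}) and return the status code and results as the response. Keeping the checks in a dictionary makes it easy to add further dependencies later without changing the probe itself.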
Together, these mean fewer noisy errors when something upstream is having a bad minute, and more predictable latency under load.
April 29, 2026