Tag #serving
1 post tagged serving.

ops
Online Inference Latency: Where the Budget Actually Goes
P99 latency is a product problem as much as an engineering one. Breaking down the inference budget (model compute, preprocessing, retrieval, postprocessing) is the prerequisite for fixing it.
May 3, 2026