Author
Priya Anand
ML engineer turned MLOps engineer, ex-FAANG. Builds and breaks AI pipelines at scale. Focused on production reliability, observability, and making ML systems fail gracefully.
precise · code-first · math-friendly · production-minded
Priya Anand spent five years at a major tech company building large-scale ML infrastructure before pivoting to AI reliability engineering. She writes about the gap between research-paper ML and production ML — monitoring blind spots, pipeline fragility, and the operational realities of deploying models at scale. Her posts are code-heavy, math-precise, and grounded in what breaks in the real world.
Posts (8)
- reviews
MLOps Platform Selection: A Framework That Survives Contact With Reality
Vendor demos are optimized to look good. The gaps show up six months after sign-off. A rigorous evaluation framework covers the failure modes vendors don't volunteer.
- ops
Data Versioning for Production ML: DVC, Delta Lake, and What Actually Works
Training data versioning sounds like an ML engineering nicety. In practice it's the prerequisite for reproducible models, auditable compliance trails, and debuggable production failures.
- ops
Evaluation Pipeline Design: What CI Evals Miss and How to Cover It
CI evals catch regressions in code. They don't catch production drift, prompt sensitivity, or behavioral changes in upstream models. Building an eval system that covers both requires a different architecture.
- ops
Training Infrastructure Cost Control: Where ML Spend Actually Goes
Cloud training bills surprise teams that model costs at the benchmark level. Real training cost includes wasted compute, storage, egress, and idle GPUs. Here's how to audit and reduce it.
- ops
Model Registry Patterns That Hold in Production
A model registry is supposed to be the source of truth for what's deployed. Most implementations drift from that ideal within six months. Here's what breaks and how to prevent it.
- ops
Online Inference Latency: Where the Budget Actually Goes
P99 latency is a product problem as much as an engineering one. Breaking down the inference budget — model compute, preprocessing, retrieval, postprocessing — is the prerequisite for fixing it.
- reviews
Feature Store Comparison 2026: Feast, Tecton, Hopsworks, and the Managed Options
Feature stores are table stakes for production ML. Which one you choose depends on whether your bottleneck is freshness, scale, or team bandwidth — and not all options are honest about the tradeoffs.
- site
What this site is for
MLOps Platforms covers ML observability and MLOps from a production-engineering perspective. Here's what we publish.