Author
Priya Anand
ML engineer turned MLOps engineer, ex-FAANG. Builds and breaks AI pipelines at scale. Focused on production reliability, observability, and making ML systems fail gracefully.
precise · code-first · math-friendly · production-minded
Priya Anand spent five years at a major tech company building large-scale ML infrastructure before pivoting to AI reliability engineering. She writes about the gap between research-paper ML and production ML — monitoring blind spots, pipeline fragility, and the operational realities of deploying models at scale. Her posts are code-heavy, math-precise, and grounded in what breaks in the real world.
Posts (8)
- reviews
MLOps Platform Selection: A Framework That Survives Contact With Reality
Vendor demos are optimized to look good. The gaps show up six months after sign-off. A rigorous evaluation framework covers the failure modes vendors don't volunteer.
- ops
Data Versioning for Production ML: DVC, Delta Lake, and What Actually Works
Training data versioning sounds like an ML engineering nicety. In practice it's the prerequisite for reproducible models, auditable compliance trails, and debuggable production failures.
- ops
Evaluation Pipeline Design: What CI Evals Miss and How to Cover It
CI evals catch regressions in code. They don't catch production drift, prompt sensitivity, or behavioral changes in upstream models. Building an eval system that covers both requires a different architecture.
- ops
Training Infrastructure Cost Control: Where ML Spend Actually Goes
Cloud training bills surprise teams that model costs at the benchmark level. Real training cost includes wasted compute, storage, egress, and idle GPUs. Here's how to audit and reduce it.
- ops
Model Registry Patterns That Hold in Production
A model registry is supposed to be the source of truth for what's deployed. Most implementations drift from that ideal within six months. Here's what breaks and how to prevent it.
- ops
Online Inference Latency: Where the Budget Actually Goes
P99 latency is a product problem as much as an engineering one. Breaking down the inference budget — model compute, preprocessing, retrieval, postprocessing — is the prerequisite for fixing it.
- reviews
Feature Store Comparison 2026: Feast, Tecton, Hopsworks, and the Managed Options
Feature stores are table stakes for production ML. Which one you choose depends on whether your bottleneck is freshness, scale, or team bandwidth — and not all options are honest about the tradeoffs.
- site
What this site is for
MLOps Platforms covers ML observability and MLOps from a production-engineering perspective. Here's what we publish.