Blog

Writing

Life, tech, and everything in between.

SynthID: How Google Watermarks AI Content

May 20, 2026 · 13 min · AI

Google has watermarked over 10 billion pieces of AI content with SynthID. The text version is open source, the algorithm is genuinely clever, and the detector portal is free for journalists. It is also not a silver bullet, and the academic literature has been clear about why.

Choosing the Right Agentic Pattern: A Decision Framework

May 15, 2026 · 7 min · AI

Thirty-three articles into this series, the question becomes how to pick among the patterns rather than how to build any one of them. A small set of decision trees and a simplicity test cover most cases.

Semantic vs Agentic RAG

May 15, 2026 · 8 min · AI

Semantic RAG costs roughly $0.001 to $0.01 per query. Agentic RAG costs $0.02 to $0.10 and can be 5 to 50 times slower. The accuracy delta on multi-hop questions is also real. A short guide to picking the right default.

Agentic RAG

May 13, 2026 · 8 min · AI

Semantic RAG is a fixed pipeline. Agentic RAG hands retrieval control to the model: when to retrieve, what to query, how to combine, when to stop. A short guide to the patterns that matter and how to ship them.

MCP and A2A: Protocol Standards for LLM Agents

May 13, 2026 · 7 min · AI

For a decade, every tool integration and every inter-agent communication was bespoke. MCP standardizes the data side: any client connects to any tool server. A2A standardizes the agent side: agents discover and call each other across organizations.

Scaling and Cost Optimization for LLM Agentic Systems

May 11, 2026 · 7 min · AI

Multi-agent architectures amplify both capability and error. Google DeepMind measured the amplification across 180 configurations. Anthropic reports 90 percent cost reductions from a planner-plus-workers split. Both results inform how to scale production agents.

Semantic RAG

May 11, 2026 · 6 min · AI

Semantic RAG is a fixed pipeline: chunk, embed, search, generate. Tuned well, it handles most production lookup workloads at a cost the alternatives cannot match. A short walkthrough of the pieces and the upgrades that matter.

Harness Design: Planner, Generator, Evaluator for Production LLM Agents

May 8, 2026 · 8 min · AI

A harness is the set of components around a language model that turn a research prototype into a production system. Anthropic's three-agent architecture (planner, generator, evaluator) mirrors the GAN discriminator-generator dynamic and survives long-running applications.

Thin SDK, Fat Runtime: Owning the Agent Loop in Pure Python

May 8, 2026 · 15 min · AI

A provider's SDK earns its keep on six concerns: auth, request shape, transport, retries, streaming reassembly, and response parsing. Everything above that, the agent loop, planning, routing, memory, parallelism, guardrails, and workflow state, is application architecture and is better written in pure Python.

Agent Evaluation: LLM-as-Judge, Pass-at-K, and Benchmarks

May 6, 2026 · 8 min · AI

An agent that cannot be measured cannot be improved. Evaluation starts with twenty queries and an LLM-as-judge, scales up through pass-at-k metrics and standard benchmarks, and never trusts any single layer alone.

SubQ and the Indexer Problem

May 6, 2026 · 10 min · AI

Subquadratic launched yesterday with a 12-million-token model and a $29M seed. The architecture pitch rests on a selector that has to be subquadratic in sequence length. The launch materials describe the property without describing the mechanism.

One Word, Three Jobs: What "Agent" Means at Lunch and in Production

May 5, 2026 · 10 min · AI

The lunch-table conversation about AI agents and the production-engineering conversation about AI agents are both right. They are talking about three different things and calling all of them "agent."