Writing

Life, tech, and everything in-between.

SynthID: How Google Watermarks AI Content

13 min · AI

Google has watermarked over 10 billion pieces of AI content with SynthID. The text version is open source, the algorithm is genuinely clever, and the detector portal is free for journalists. It is also not a silver bullet, and the academic literature has been clear about why.

Choosing the Right Agentic Pattern: A Decision Framework

7 min · AI

Thirty-three articles into this series, the question becomes how to pick among the patterns rather than how to build any one of them. A small set of decision trees and a simplicity test cover most cases.

Semantic vs Agentic RAG

8 min · AI

Semantic RAG costs roughly $0.001 to $0.01 per query. Agentic RAG costs $0.02 to $0.10 and can be 5 to 50 times slower. The accuracy delta on multi-hop questions is also real. A short guide to picking the right default.

Agentic RAG

8 min · AI

Semantic RAG is a fixed pipeline. Agentic RAG hands retrieval control to the model: when to retrieve, what to query, how to combine, when to stop. A short guide to the patterns that matter and how to ship them.

MCP and A2A: Protocol Standards for LLM Agents

7 min · AI

For a decade, every tool integration and every inter-agent communication was bespoke. MCP standardizes the data side: any client connects to any tool server. A2A standardizes the agent side: agents discover and call each other across organizations.

Scaling and Cost Optimization for LLM Agentic Systems

7 min · AI

Multi-agent architectures amplify both capability and error. Google DeepMind measured the amplification across 180 configurations. Anthropic reports 90 percent cost reductions from a planner-plus-workers split. Both results inform how to scale production agents.

Semantic RAG

6 min · AI

Semantic RAG is a fixed pipeline: chunk, embed, search, generate. Tuned well, it handles most production lookup workloads at a cost the alternatives cannot match. A short walkthrough of the pieces and the upgrades that matter.

Harness Design: Planner, Generator, Evaluator for Production LLM Agents

8 min · AI

A harness is the set of components around a language model that turn a research prototype into a production system. Anthropic's three-agent architecture (planner, generator, evaluator) mirrors the GAN discriminator-generator dynamic and holds up in long-running applications.

Thin SDK, Fat Runtime: Owning the Agent Loop in Pure Python

15 min · AI

A provider's SDK earns its keep on six concerns: auth, request shape, transport, retries, streaming reassembly, and response parsing. Everything above that, the agent loop, planning, routing, memory, parallelism, guardrails, and workflow state, is application architecture and is better written in pure Python.

Agent Evaluation: LLM-as-Judge, Pass-at-K, and Benchmarks

8 min · AI

An agent that cannot be measured cannot be improved. Evaluation starts with twenty queries and an LLM-as-judge, scales up through pass-at-k metrics and standard benchmarks, and never trusts any single layer alone.

SubQ and the Indexer Problem

10 min · AI

Subquadratic launched yesterday with a 12-million-token model and a $29M seed. The architecture pitch rests on a selector that has to be subquadratic in sequence length. The launch materials describe the property without describing the mechanism.

One Word, Three Jobs: What "Agent" Means at Lunch and in Production

10 min · AI

The lunch-table conversation about AI agents and the production-engineering conversation about AI agents are both right. They are talking about three different things and calling all of them "agent."