Temur Khan
  • Home
  • Tools
  • Writing
  • Hire Me
  • About
Categories
All (7)
agents (4)
architecture (2)
claude-code (1)
code-review (1)
error-handling (3)
mcp (2)
monitoring (1)
observability (1)
production-ai (7)
reliability (4)
security (1)
supply-chain (1)
workflows (1)

Writing

Long-form essays on production AI failure patterns, agent reliability, and operational discipline.

Essays on production-AI failure patterns. New posts roughly monthly.

Subscribe via RSS — no newsletter signup yet.

When You Bound an Operation, the Bound Is a Typed Error

production-ai
reliability
error-handling
code-review

Adding a timeout or a cap to a previously-unbounded operation is the easy part. The part teams get wrong is how the bound-exceeded condition surfaces. It is not a return value and it is not a generic Error. It is a typed error, and every bound in a module should use the same one.

2026-05-21
5 min

Your Agent’s Config File Is Executable. Treat It Like One.

production-ai
security
agents
supply-chain

Every coding-agent runtime loads files like AGENTS.md, .cursor/rules, CLAUDE.md, and .git/hooks with full operator-level trust. Those files live in repos anyone can PR to. They get copy-pasted from blog posts. They are not comments. They are code with a wider blast radius than your dependencies.

2026-05-20
4 min

Your Event Loop Already Knows It’s Starving. You’re Just Not Listening.

production-ai
reliability
observability
architecture

Event-loop lag is a measurable, first-class signal. Most systems capture it only incidentally on one code path, so every other timeout during a starvation window gets misattributed to the wrong subsystem. The fix isn’t less load. It’s listening to the signal that already exists.

2026-05-18
5 min

Three Bugs, One Shape: The Failure Signal That Existed and Got Discarded

production-ai
reliability
error-handling
architecture

Across a week of production-AI bug reports, three landed in completely different subsystems and turned out to be the same bug wearing three costumes. The shape: a structured failure signal that existed at one layer, got discarded before anything used it, and left a success record that disagreed with reality.

2026-05-17
8 min

Rate-Limit Detection Belongs at the Header Layer, Not the Error String

production-ai
reliability
error-handling
agents

Parsing a 429 out of an error message is reading at the wrong layer. The structured signal was sitting in the response headers, before the regex ran and before the idle timer won the race.

2026-05-16
4 min

5 Claude Code Patterns That Separate Power Users From Everyone Else

claude-code
agents
production-ai
workflows
mcp

Five patterns I see in Claude Code workflows that actually ship and scale past the first month: skill-driven workflow forks, PreToolUse hooks, subagent task delegation, headless mode + scheduled cron firings, and per-repo settings.json with command-level allowlists.

2026-05-13
4 min

5 Silent-Failure Patterns I Keep Finding in Production AI Systems

production-ai
monitoring
agents
mcp

Production AI systems fail in ways traditional monitoring can’t catch. Here’s the catalog of 5 patterns that keep appearing across every stack — and what to check for.

2026-05-07
12 min
No matching items

Temur Khan · temhan.dev

GitHub · X · Bundle

hello@temhan.dev