Insights

Writing on AI, data & engineering

Browser AI

Gemini Nano in Chrome: The Prompt API and On-Device General-Purpose LLM

Chrome's Prompt API exposes Gemini Nano — stable in Chrome 148 — for free-form generation, structured JSON output, multimodal inference, and multi-turn sessions. A technical deep-dive with production patterns and live demos.

May 2026 20 min read
AI Brand Intelligence

LLM Brand Monitoring Dashboards: What Profound and Competitors Miss, and a Better Framework

How Profound, Otterly, Peec, and Goodie track AI Share of Voice, the metric gaps that matter (citation prominence, topic-cluster SOV, platform-level deltas), and an interactive demo dashboard with mock data built in pure HTML/CSS/JS.

June 2026 8 min read
AI Engineering

Claude Fable 5: Benchmarks, Pricing, API Guide, and How It Differs from Claude Mythos 5

Mythos-class performance (SWE-Bench Pro 80.3%) now publicly available. Adaptive thinking always on, $10/$50 pricing, safety classifiers routed to Opus 4.8, and the June 22 deadline for free access on Pro/Max/Team plans.

June 2026 14 min read
AI Engineering

How to Use Claude Code Workflows in Python: AGENTS.md, Ultracode CLI, and AsyncAnthropic Orchestration

Two complete implementation paths: Claude Code CLI with ultracode for open-ended codebase tasks, and a Python AsyncWorkflowOrchestrator with a token kill switch, Semaphore rate limiting, and compaction handling — applied to a data pipeline code review use case.

May 2026 16 min read
AI Engineering

Claude Opus 4.8 Dynamic Workflows: How Ultracode Works, What It Costs, and How to Control the Spend

Benchmarks verified, pricing table corrected, and six production patterns explained: effort levels, adaptive thinking, 1,024-token caching, mid-conversation steering, compaction with pause_after_compaction, and the ultracode architecture with its token explosion risk.

May 2026 18 min read
AI Engineering

Cursor Agent Pipeline: How to Set Up Worktrees, Write Effective Instructions, and Measure Real ROI

A practical guide for engineering leads: parallel worktrees for up to 8 agents, AGENTS.md templates that actually change behavior, a conductor prompt for selecting the best diff, and the six failure modes that kill agentic pipelines.

May 2026 16 min read
AI Engineering

Cursor Model Selection in 2026: When to Use Composer 2.5 Standard, Fast, or Frontier

Composer 2.5 is built on Kimi K2.5, benchmarks within a point of Claude Opus 4.7, and costs one-tenth the Standard-tier price. Corrected pricing facts, routing framework, CI gate code, and the three lessons from Cursor's self-driving experiment.

May 2026 18 min read
Browser AI

Gemma 197M: Chrome's On-Device Task Model — Summarizer, Language Detector, Translator

Chrome ships two separate AI models. Gemma 197M powers the stable task APIs — Summarizer, Language Detector, Translator — with zero token cost and zero server latency. A technical deep-dive with production-ready code and live demos.

May 2026 14 min read
Data Engineering

Database Schema Migration Patterns for LLM-Scale Data Pipelines

The infrastructure debt nobody budgets for: vector dimension changes require full index rebuilds, ClickHouse mutations can't be rolled back, and DuckDB allows only a single writer. A production guide covering Expand-Contract, Blue-Green, and Shadow Writes, plus the database-specific behaviour that will surprise you.

May 2026 18 min read
Infrastructure & AI

Centralized LLM API Gateway vs. Self-Hosted Models: The 2026 Enterprise Decision

The real question is not API vs. self-hosted — it is about routing. Cost breakdown with verified 2026 pricing, EU data residency gaps across providers, LiteLLM gateway configuration, and the security matrix that determines when self-hosting is actually required.

May 2026 15 min read
AI & Data Engineering

Karpathy's LLM Wiki Pattern for Brand Intelligence: A Production Implementation

The first published implementation of Karpathy's April 2026 LLM Wiki pattern applied to GEO and brand monitoring — with ingest pipeline, SelfCheckGPT-NLI hallucination gating, lint operations, and complete Python code.

May 2026 20 min read
AI & Data Engineering

LLM Agent Memory Architectures in 2026: The Decision Most Enterprise Teams Make Too Late

Claude Code, Mem0, Zep/Graphiti, Letta, MemOS — a verified technical comparison of every major LLM memory architecture, with benchmark data and a decision framework built around governance, not just retrieval.

May 2026 20 min read
AI & Brand Intelligence

Best Practices for Monitoring Brand Sentiment Using LLMs

Most LLM sentiment pipelines are structurally inadequate: single-shot measurements, circular validation, no confidence intervals. A production guide covering prompt engineering, aspect-based analysis, hallucination handling, and multi-platform strategy.

May 2026 16 min read
Infrastructure & AI

Self-Hosted LLMs for Enterprise: Real Costs and Trade-offs

The case for self-hosting keeps getting easier to make on slides and harder to execute in production. A cost breakdown for senior engineers: hardware tiers, hidden expenses, Ollama vs vLLM, model licensing, and when the math simply doesn't work.

April 2026 9 min read
AI & Data Engineering

Claude vs the Field: LLMs for Data Engineering in 2026

Which LLM, for what task, at what price? SQL benchmarks, Claude Code + dbt field evidence, MCP integrations, cost routing strategy, GDPR compliance paths, and the open-source challengers closing the gap — grounded in Q1–Q2 2026 data.

April 2026 20 min read
AI Tools

Cursor 3.0 Agentic Architecture: What Actually Changed for Engineering Teams

Cursor 3.0 is not an incremental update — it's a shift from autocomplete to parallel agent execution. Here's a technical breakdown of git worktrees, the Agents Window, /best-of-n, and what it means for how senior engineers actually work.

April 2026 8 min read
AEO & AI Visibility

How to Build an AEO Monitoring Pipeline: a Technical Guide

AEO is a data engineering problem, not a marketing one. How to structure the query set, build the pipeline, fix entity clarity, and close the loop from measurement to action.

April 2026 9 min read
AI Tools

Claude Code in Data Engineering: How I Use AI Agents on Real Enterprise Projects

Not a benchmark post. This is how Claude Code fits into a real data engineering workflow: where it saves hours, where it breaks down, and what advanced usage actually looks like.

April 2026 7 min read
AI & Data

LLM Brand Monitoring: The Metrics That Actually Matter for Enterprise

Brand managers are asking "does our brand appear in ChatGPT answers?" — and most tools still can't answer reliably. Here's the architecture we built to do it properly, and the four metrics that tell you whether you're winning in AI search.

April 2026 8 min read