Writing

How I Use Claude Code

How I shipped 1,000+ PRs in 2025 with parallel agents and cross-model audits.

How AI Regulation Changed in 2025

Why "AI compliance questions" appeared in security questionnaires and RFPs, and how policy becomes contract requirements.

Why Attack Success Rate (ASR) Isn't Comparable Across Jailbreak Papers

ASR isn't portable across papers because measurement choices dominate the headline number. Includes math and a checklist for reading papers.

GPT-5.2 Initial Trust and Safety Assessment

Day-zero red teaming of GPT-5.2, focused on jailbreak resilience and harmful content generation.

Real-Time Fact Checking for LLM Outputs

Introduces search-rubric, an assertion type in which a search-enabled judge verifies time-sensitive claims during evals and in CI.

When AI Becomes the Attacker: The Rise of AI-Orchestrated Cyberattacks

Connects malware that queries LLMs at runtime with "vibe hacking" case studies, and argues that defending against it requires continuous testing.

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

RLVR gains are often "search compression" rather than new reasoning ability.

Prompt Injection vs Jailbreaking: What's the Difference?

Jailbreaking targets model safety training; prompt injection targets application trust boundaries.

AI Safety vs AI Security in LLM Applications: What Teams Must Know

Safety protects people from harmful outputs; security protects systems from adversarial manipulation.

Evaluating Political Bias in LLMs

An open methodology and dataset (2,500 political statements) for measuring political leaning in models.

Testing Humanity's Last Exam with Promptfoo

A guide to running the HLE benchmark with Promptfoo.