Writing
How I Use Claude Code
How I shipped 1,000+ PRs in 2025 with parallel agents and cross-model audits.
How AI Regulation Changed in 2025
Why "AI compliance questions" appeared in security questionnaires and RFPs, and how policy becomes contract requirements.
Why Attack Success Rate (ASR) Isn't Comparable Across Jailbreak Papers
ASR isn't portable across papers because measurement choices dominate the headline number. Includes math and a checklist for reading papers.
GPT-5.2 Initial Trust and Safety Assessment
Day-zero red-teaming of GPT-5.2, focused on jailbreak resilience and harmful content.
Real-Time Fact Checking for LLM Outputs
Introduces search-rubric, an assertion type in which a search-enabled judge verifies time-sensitive claims during evals and in CI.
When AI becomes the attacker: The rise of AI-orchestrated cyberattacks
Connects malware that queries LLMs at runtime with "vibe hacking" case studies. Defending against both requires continuous testing.
Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter
RLVR gains are often "search compression" rather than new reasoning ability.
Prompt Injection vs Jailbreaking: What's the Difference?
Jailbreaking targets model safety training; prompt injection targets application trust boundaries.
AI Safety vs AI Security in LLM Applications: What Teams Must Know
Safety protects people from harmful outputs; security protects systems from adversarial manipulation.
Evaluating political bias in LLMs
Open methodology and dataset (2,500 political statements) to measure political leaning in models.
Testing Humanity's Last Exam with Promptfoo
A guide to running the HLE benchmark with Promptfoo.