Research

Project Fetch: Can Claude train a robot dog?	Policy	Nov 12, 2025
Commitments on model deprecation and preservation	Alignment	Nov 4, 2025
Signs of introspection in large language models	Interpretability	Oct 28, 2025
Preparing for AI's economic impact: exploring policy responses	Policy	Oct 14, 2025
A small number of samples can poison LLMs of any size	Alignment	Oct 9, 2025
Petri: An open-source auditing tool to accelerate AI safety research	Alignment	Oct 6, 2025
Building AI for cyber defenders	Policy	Oct 3, 2025
Anthropic Economic Index report: Uneven geographic and enterprise AI adoption	Economic Research	Sep 15, 2025
Anthropic Economic Index: Tracking AI's role in the US and global economy	Economic Research	Sep 15, 2025
Anthropic Education Report: How educators use Claude	Societal Impacts	Aug 26, 2025
Claude Opus 4 and 4.1 can now end a rare subset of conversations	Alignment	Aug 15, 2025
Persona vectors: Monitoring and controlling character traits in language models	Interpretability	Aug 1, 2025
Project Vend: Can Claude run a small shop? (And why does that matter?)	Policy	Jun 27, 2025
Agentic Misalignment: How LLMs could be insider threats	Alignment	Jun 20, 2025
Confidential Inference via Trusted Virtual Machines	Announcements	Jun 18, 2025
SHADE-Arena: Evaluating sabotage and monitoring in LLM agents	Alignment	Jun 16, 2025
Open-sourcing circuit tracing tools	Interpretability	May 29, 2025
Anthropic Economic Index: AI's impact on software development	Societal Impacts	Apr 28, 2025
Exploring model welfare	Alignment	Apr 24, 2025
Values in the wild: Discovering and analyzing values in real-world language model interactions	Societal Impacts	Apr 21, 2025
Anthropic Education Report: How university students use Claude	Announcements · Societal Impacts	Apr 8, 2025
Reasoning models don't always say what they think	Alignment	Apr 3, 2025
Anthropic Economic Index: Insights from Claude 3.7 Sonnet	Announcements · Societal Impacts	Mar 27, 2025
Tracing the thoughts of a large language model	Interpretability	Mar 27, 2025
Auditing language models for hidden objectives	Alignment · Interpretability	Mar 13, 2025
Forecasting rare language model behaviors	Alignment	Feb 25, 2025
Claude's extended thinking	Announcements	Feb 24, 2025
Insights on Crosscoder Model Diffing	Interpretability	Feb 20, 2025
The Anthropic Economic Index	Announcements · Societal Impacts	Feb 10, 2025
Constitutional Classifiers: Defending against universal jailbreaks	Alignment	Feb 3, 2025
Building effective agents	Product	Dec 19, 2024
Alignment faking in large language models	Alignment	Dec 18, 2024
Clio: A system for privacy-preserving insights into real-world AI use	Societal Impacts	Dec 12, 2024
A statistical approach to model evaluations	Evaluations	Nov 19, 2024
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet	Product	Oct 30, 2024
Evaluating feature steering: A case study in mitigating social biases	Societal Impacts · Interpretability	Oct 25, 2024
Developing a computer use model	Announcements · Product	Oct 22, 2024
Sabotage evaluations for frontier models	Alignment	Oct 18, 2024
Using dictionary learning features as classifiers	Interpretability	Oct 16, 2024
Circuits Updates - September 2024	Interpretability	Oct 1, 2024