| Project Fetch: Can Claude train a robot dog? | Policy | Nov 12, 2025 |
| Commitments on model deprecation and preservation | Alignment | Nov 4, 2025 |
| Signs of introspection in large language models | Interpretability | Oct 28, 2025 |
| Preparing for AI's economic impact: exploring policy responses | Policy | Oct 14, 2025 |
| A small number of samples can poison LLMs of any size | Alignment | Oct 9, 2025 |
| Petri: An open-source auditing tool to accelerate AI safety research | Alignment | Oct 6, 2025 |
| Building AI for cyber defenders | Policy | Oct 3, 2025 |
| Anthropic Economic Index report: Uneven geographic and enterprise AI adoption | Economic Research | Sep 15, 2025 |
| Anthropic Economic Index: Tracking AI's role in the US and global economy | Economic Research | Sep 15, 2025 |
| Anthropic Education Report: How educators use Claude | Societal Impacts | Aug 26, 2025 |
| Claude Opus 4 and 4.1 can now end a rare subset of conversations | Alignment | Aug 15, 2025 |
| Persona vectors: Monitoring and controlling character traits in language models | Interpretability | Aug 1, 2025 |
| Project Vend: Can Claude run a small shop? (And why does that matter?) | Policy | Jun 27, 2025 |
| Agentic Misalignment: How LLMs could be insider threats | Alignment | Jun 20, 2025 |
| Confidential Inference via Trusted Virtual Machines | Announcements | Jun 18, 2025 |
| SHADE-Arena: Evaluating sabotage and monitoring in LLM agents | Alignment | Jun 16, 2025 |
| Open-sourcing circuit tracing tools | Interpretability | May 29, 2025 |
| Anthropic Economic Index: AI's impact on software development | Societal Impacts | Apr 28, 2025 |
| Exploring model welfare | Alignment | Apr 24, 2025 |
| Values in the wild: Discovering and analyzing values in real-world language model interactions | Societal Impacts | Apr 21, 2025 |
| Anthropic Education Report: How university students use Claude | Announcements · Societal Impacts | Apr 8, 2025 |
| Reasoning models don't always say what they think | Alignment | Apr 3, 2025 |
| Anthropic Economic Index: Insights from Claude 3.7 Sonnet | Announcements · Societal Impacts | Mar 27, 2025 |
| Tracing the thoughts of a large language model | Interpretability | Mar 27, 2025 |
| Auditing language models for hidden objectives | Alignment · Interpretability | Mar 13, 2025 |
| Forecasting rare language model behaviors | Alignment | Feb 25, 2025 |
| Claude's extended thinking | Announcements | Feb 24, 2025 |
| Insights on Crosscoder Model Diffing | Interpretability | Feb 20, 2025 |
| The Anthropic Economic Index | Announcements · Societal Impacts | Feb 10, 2025 |
| Constitutional Classifiers: Defending against universal jailbreaks | Alignment | Feb 3, 2025 |
| Building effective agents | Product | Dec 19, 2024 |
| Alignment faking in large language models | Alignment | Dec 18, 2024 |
| Clio: A system for privacy-preserving insights into real-world AI use | Societal Impacts | Dec 12, 2024 |
| A statistical approach to model evaluations | Evaluations | Nov 19, 2024 |
| Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet | Product | Oct 30, 2024 |
| Evaluating feature steering: A case study in mitigating social biases | Societal Impacts · Interpretability | Oct 25, 2024 |
| Developing a computer use model | Announcements · Product | Oct 22, 2024 |
| Sabotage evaluations for frontier models | Alignment | Oct 18, 2024 |
| Using dictionary learning features as classifiers | Interpretability | Oct 16, 2024 |
| Circuits Updates - September 2024 | Interpretability | Oct 1, 2024 |