Research

Project Fetch: Can Claude train a robot dog? | Policy | Nov 12, 2025
Commitments on model deprecation and preservation | Alignment | Nov 4, 2025
Signs of introspection in large language models | Interpretability | Oct 28, 2025
Preparing for AI's economic impact: exploring policy responses | Policy | Oct 14, 2025
A small number of samples can poison LLMs of any size | Alignment | Oct 9, 2025
Petri: An open-source auditing tool to accelerate AI safety research | Alignment | Oct 6, 2025
Building AI for cyber defenders | Policy | Oct 3, 2025
Anthropic Economic Index report: Uneven geographic and enterprise AI adoption | Economic Research | Sep 15, 2025
Anthropic Economic Index: Tracking AI's role in the US and global economy | Economic Research | Sep 15, 2025
Anthropic Education Report: How educators use Claude | Societal Impacts | Aug 26, 2025
Claude Opus 4 and 4.1 can now end a rare subset of conversations | Alignment | Aug 15, 2025
Persona vectors: Monitoring and controlling character traits in language models | Interpretability | Aug 1, 2025
Project Vend: Can Claude run a small shop? (And why does that matter?) | Policy | Jun 27, 2025
Agentic Misalignment: How LLMs could be insider threats | Alignment | Jun 20, 2025
Confidential Inference via Trusted Virtual Machines | Announcements | Jun 18, 2025
SHADE-Arena: Evaluating sabotage and monitoring in LLM agents | Alignment | Jun 16, 2025
Open-sourcing circuit tracing tools | Interpretability | May 29, 2025
Anthropic Economic Index: AI's impact on software development | Societal Impacts | Apr 28, 2025
Exploring model welfare | Alignment | Apr 24, 2025
Values in the wild: Discovering and analyzing values in real-world language model interactions | Societal Impacts | Apr 21, 2025
Anthropic Education Report: How university students use Claude | Announcements · Societal Impacts | Apr 8, 2025
Reasoning models don't always say what they think | Alignment | Apr 3, 2025
Anthropic Economic Index: Insights from Claude 3.7 Sonnet | Announcements · Societal Impacts | Mar 27, 2025
Tracing the thoughts of a large language model | Interpretability | Mar 27, 2025
Auditing language models for hidden objectives | Alignment · Interpretability | Mar 13, 2025
Forecasting rare language model behaviors | Alignment | Feb 25, 2025
Claude's extended thinking | Announcements | Feb 24, 2025
Insights on Crosscoder Model Diffing | Interpretability | Feb 20, 2025
The Anthropic Economic Index | Announcements · Societal Impacts | Feb 10, 2025
Constitutional Classifiers: Defending against universal jailbreaks | Alignment | Feb 3, 2025
Building effective agents | Product | Dec 19, 2024
Alignment faking in large language models | Alignment | Dec 18, 2024
Clio: A system for privacy-preserving insights into real-world AI use | Societal Impacts | Dec 12, 2024
A statistical approach to model evaluations | Evaluations | Nov 19, 2024
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet | Product | Oct 30, 2024
Evaluating feature steering: A case study in mitigating social biases | Societal Impacts · Interpretability | Oct 25, 2024
Developing a computer use model | Announcements · Product | Oct 22, 2024
Sabotage evaluations for frontier models | Alignment | Oct 18, 2024
Using dictionary learning features as classifiers | Interpretability | Oct 16, 2024
Circuits Updates - September 2024 | Interpretability | Oct 1, 2024