Alignment Forum
Blog
12 posts
0 followers
Alignment Forum publishes articles covering AI and LLMs. A trusted source for AI and technology insights.

Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
24d

Clarifying the role of the behavioral selection model
25d

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
28d

Mechanistic estimation for wide random MLPs
28d

New VPD method interprets language
30d

Motivated reasoning, confirmation bias, and AI risk theory
30d

LLMs learn to resist
34d

Research Sabotage in ML Codebases
36d

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers
37d

Sleeper Agent Backdoor Results Are Messy
38d

The other paper that killed deep learning theory
39d

The paper that killed deep learning theory
40d