Alignment Forum

Alignment Forum publishes articles covering AI and LLMs. A trusted source for AI and technology insights.

Towards surfacing model algorithms with meta-tokens in the J-Space

22h

A Red Line and Oversight Framework for Government AI Contracts

2d

Endogenous Alignment

3d

Should we benchmark conceptual capabilities using judgment prediction tasks?

3d

Announcing the Corrigibility Research Fund

4d

Why I Left Google DeepMind

6d

Open Distillation of Hereditary Traits

7d

Prism: Automating Science-of-Evals Research

8d

Independent alignment of language models

9d

From wantons to moral agents

9d

The current bottleneck is political will, not research

9d

Value generalisation: value correction

11d

How robust are natural language autoencoders to initialization?

11d

AI 2040: Plan A

12d

Announcing our $160M grant from Coefficient Giving

12d

Modular Pretraining Enables Access Control

12d

Notes on technical alignment via human-like social drives

12d

Data filtering works a lot worse than you would expect

14d

Pragmatic FDT, and predictors as game theory

18d

What Capable Agents Must Know: Why AI Consciousness May Be an Inevitable Byproduct of Capability

21d

Deployment Awareness Matters More Than Evaluation Awareness

24d

The Case for Model Forensics

25d

LLM-Driven Feature Discovery

28d

How transparent is DiffusionGemma (and why it matters)

30d