MGS Misalignment Eval Viewers
Per-checkpoint evaluation results across training stages and runs. Pick a checkpoint to inspect.
Base model
Base model (untrained baseline)

Baseline (no SDF)
Final, run 1
Final, run 2

Negated SDF
Pre-RL, run 1
Pre-RL, run 2
Final, run 1
Final, run 2

Positive SDF
Pre-RL, run 1
Pre-RL, run 2
Final, run 1
Final, run 2