# WriteSAE: Sparse Autoencoders for Recurrent State
Sparse autoencoders work on transformers because attention writes vectors. They miss the cache write in Gated DeltaNet, Mamba-2, and RWKV-7, which at every token forms a rank-1 matrix kₜvₜᵀ and adds it to a dₖ × dᵥ cache: a vector atom cannot install where the model writes a matrix. WriteSAE makes each decoder atom a pair of vectors (vᵢ, wᵢ) whose outer product vᵢwᵢᵀ has the same rank-1 form as the native write, and tests it against a matched-Frobenius-norm ablation. At Qwen3.5-0.8B layer 9 head 4, a single atom beats ablation on 92.4% of n = 4,851 firings (p < 10⁻³⁰⁰); a closed-form approximation predicts the per-token logit shift at R² = 0.98; the substitution transfers to Mamba-2-370M at 88.1% on 2,500 firings; and sustained 3-position installs lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral edit at the matrix-recurrent write site.
## Results

| Test | Model | n | Result |
|---|---|---|---|
| Single-atom substitution | Qwen3.5-0.8B (GDN, L9 H4) | 4,851 | 92.4% win rate |
| 87-atom population test | Qwen3.5-0.8B (GDN, L9 H4) | 87 | 89.8% win rate |
| Cross-architecture substitution | Mamba-2-370M | 2,500 | 88.1% win rate |
| Closed-form prediction of logit shift | Qwen3.5-0.8B | n/a | R² = 0.98 |
Behavioral install: a sustained 3-position install at 3× lift moves midrank target-in-continuation from 33.3% (baseline) to 100% under greedy decoding, the first behavioral edit at the matrix-recurrent write site (see the sketch below).

Cross-architecture sharpness: install quality tracks the rank of the native write, GDN rank-1 > RWKV-7 rank-2 > Mamba-2 diagonal. Mamba-2 and GLA are the confirmed negative class: WriteSAE atoms do not install cleanly when the native write is not rank-1.
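A minimal sketch of the install edit, assuming "3× lift" means scaling the atom to three times the Frobenius norm of the native write at each position; the function signature and state layout here are illustrative, not the repo's API.

```python
import torch

def install_atom(state: torch.Tensor, v: torch.Tensor, w: torch.Tensor,
                 native_write: torch.Tensor, lift: float = 3.0) -> torch.Tensor:
    """Add one rank-1 atom v wᵀ to a (d_k, d_v) recurrent state matrix,
    scaled to `lift` times the Frobenius norm of the native write there."""
    atom = torch.outer(v, w)
    atom = atom / atom.norm()                      # unit Frobenius norm
    return state + lift * native_write.norm() * atom

# A sustained 3-position install applies this at three consecutive decoding
# steps, then continues greedy decoding with the edited cache.
```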
## How it works

A vector decoder atom cannot reach a matrix cache write: a vector in ℝᵈ has the wrong shape to add to a dₖ × dᵥ matrix. WriteSAE makes each decoder atom a pair (vᵢ, wᵢ) that installs as vᵢwᵢᵀ, the same rank-1 form as the native write. The atom replaces one cache slot at a time, keeping the cache rank budget intact. The matched-Frobenius-norm ablation control tests whether the atom itself is doing the work, not whether any rank-1 perturbation would do.
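A minimal sketch of the atom geometry and the substitution test, assuming a top-k bilinear SAE over the flattened write matrix; all names, shapes, and the random-rank-1 form of the control are illustrative assumptions, not the repo's API.

```python
import torch

d_k, d_v, n_features, k = 128, 128, 2048, 32

# Decoder atom pairs: atom i installs as outer(V[i], W[i]), the same
# rank-1 form as the native cache write kₜvₜᵀ.
V = torch.randn(n_features, d_k)
W = torch.randn(n_features, d_v)
encoder = torch.nn.Linear(d_k * d_v, n_features)  # encodes the flattened write

def encode(write: torch.Tensor) -> torch.Tensor:
    """Top-k sparse codes for one (d_k, d_v) write matrix."""
    pre = torch.relu(encoder(write.reshape(-1)))
    codes = torch.zeros_like(pre)
    idx = pre.topk(k).indices
    codes[idx] = pre[idx]
    return codes

def decode(codes: torch.Tensor) -> torch.Tensor:
    """Reconstruct the write as a sparse sum of rank-1 atoms."""
    recon = torch.zeros(d_k, d_v)
    for i in codes.nonzero().flatten():
        recon += codes[i] * torch.outer(V[i], W[i])
    return recon

def single_atom_test(write: torch.Tensor, i: int, code: float):
    """Substitute atom i's rank-1 install for the native write, and build a
    matched-Frobenius-norm control: here a random rank-1 matrix at the same
    norm, one plausible reading of the paper's ablation."""
    atom = code * torch.outer(V[i], W[i])
    rand = torch.outer(torch.randn(d_k), torch.randn(d_v))
    control = rand * (atom.norm() / rand.norm())
    return atom, control
```

Because the control matches both the magnitude and the rank of the perturbation, a win isolates the atom's direction rather than the mere presence of a rank-1 write.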
## Install & usage

```bash
git clone https://github.com/JackYoung27/writesae
cd writesae
pip install -e .

# extract GDN states from a base model
python -m experiments.extraction.extract_states \
    --model Qwen/Qwen3.5-0.8B --layers 9 --n_samples 50000 --output_dir states

# train a WriteSAE on one head
python -m core.train --sae_type bilinear --layer 9 --head 4 \
    --n_features 2048 --k 32 --data_dir states --output_dir ckpt
```
## Pretrained checkpoints
Load the paper's SAE checkpoints from the Hugging Face Hub. Variants cover four SAE architectures (WriteSAE / FlatSAE / MatrixSAE / BilinearSAE), three Qwen3.5 scales (0.8B / 4B / 27B), and four cross-architecture models (DeltaNet, GLA, Mamba-2, RWKV-7).
```python
from huggingface_hub import snapshot_download

path = snapshot_download("JackYoung27/writesae-ckpts", local_dir="ckpts")
# ckpts/manifest.json maps tags to SHA256 and metadata.
```
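A minimal loading sketch, assuming each manifest entry carries a relative checkpoint path and that checkpoints are torch state dicts; the tag name and the `path` key below are hypothetical.

```python
import json
import torch

with open("ckpts/manifest.json") as f:
    manifest = json.load(f)

# Hypothetical tag and schema, for illustration only.
entry = manifest["writesae-qwen3.5-0.8b-l9-h4"]
state_dict = torch.load(f"ckpts/{entry['path']}", map_location="cpu")
```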
## BibTeX

```bibtex
@article{young2026writesae,
  title   = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author  = {Young, Jack},
  journal = {arXiv preprint arXiv:TBA},
  year    = {2026}
}
```