WriteSAE: Sparse Autoencoders for Recurrent State

Jack Young

Sparse autoencoders work on transformers because attention writes vectors. They miss the cache write in Gated DeltaNet, Mamba-2, and RWKV-7, which at every token form a rank-1 matrix k_t v_t^T and add it to a d_k × d_v cache: a vector atom cannot install where the model writes a matrix. WriteSAE makes each decoder atom a pair of vectors (v_i, w_i) whose outer product v_i w_i^T has the same rank-1 form as the native write, and tests it against a matched-Frobenius-norm ablation. At Qwen3.5-0.8B layer 9 head 4, a single atom beats ablation on 92.4% of n = 4,851 firings (p < 10^-300); a closed-form approximation predicts the per-token logit shift at R^2 = 0.98; the substitution transfers to Mamba-2-370M at 88.1% on 2,500 firings; and sustained 3-position installs lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral edit at the matrix-recurrent write site.

Figure 1. WriteSAE atoms substitute for native Gated DeltaNet writes. At Qwen3.5-0.8B L9 H4, atoms beat ablation on 92.4% of n = 4,851 firings. Panels show the native write k_t v_t^T, the atom v_i w_i^T, the cache-slot patch, and the forward-KL controls.

Results

Atom substitution vs matched-Frobenius-norm ablation. Win rate is the fraction of firings where the WriteSAE atom lowers forward KL to the base model below the ablation control (a minimal sketch of this computation follows the table).

| Test | Model | n | Win rate |
| --- | --- | --- | --- |
| Single-atom substitution | Qwen3.5-0.8B (GDN, L9 H4) | 4,851 | 92.4% |
| 87-atom population test | Qwen3.5-0.8B (GDN, L9 H4) | 87 | 89.8% |
| Cross-architecture substitution | Mamba-2-370M | 2,500 | 88.1% |
| Closed-form prediction of logit shift | Qwen3.5-0.8B | n/a | R^2 = 0.98 |
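
A minimal sketch of the win-rate computation under the definition above, assuming per-firing forward-KL values have already been measured for the atom substitution and for the matched-norm ablation; the function name, the synthetic inputs, and the binomial sign test for the p-value are illustrative assumptions, not the repository's API.

import numpy as np
from scipy.stats import binomtest

def win_rate_sign_test(kl_atom: np.ndarray, kl_ablation: np.ndarray):
    """Fraction of firings where the atom substitution yields lower forward KL
    to the base model than the matched-Frobenius-norm ablation, plus a
    one-sided binomial sign test against the 50% chance level."""
    wins = kl_atom < kl_ablation
    rate = wins.mean()
    test = binomtest(int(wins.sum()), n=len(wins), p=0.5, alternative="greater")
    return rate, test.pvalue

# Toy example with synthetic KL values standing in for measured ones.
rng = np.random.default_rng(0)
kl_atom = rng.exponential(0.05, size=4851)
kl_ablation = rng.exponential(0.20, size=4851)
rate, p = win_rate_sign_test(kl_atom, kl_ablation)
print(f"win rate {rate:.1%} over {len(kl_atom)} firings, p = {p:.3g}")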

Behavioral install: a sustained 3-position install at 3× lift moves midrank target-in-continuation from 33.3% (baseline) to 100% under greedy decoding. First behavioral edit at the matrix-recurrent write site.

Cross-architecture sharpness follows the write structure: GDN (rank-1) > RWKV-7 (rank-2) > Mamba-2 (diagonal). Mamba-2 and GLA form the confirmed negative class: WriteSAE atoms do not install cleanly when the native write is not rank-1.

How it works

A vector decoder atom cannot reach a matrix cache write: a vector in R^d has the wrong shape to add to a d_k × d_v matrix. WriteSAE makes each decoder atom a pair (v_i, w_i) that installs as the outer product v_i w_i^T, the same rank-1 form as the native write. The atom replaces one cache slot at a time, keeping the cache rank budget intact. The matched-Frobenius-norm ablation control tests whether the atom itself is doing the work, not whether any rank-1 perturbation of the same magnitude would do.
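
A minimal NumPy sketch of the install and its control, assuming a d_k × d_v cache slot, a single native write, and one WriteSAE atom (v_i, w_i); the variable names and the random-direction construction of the ablation are illustrative assumptions, not the repository's implementation.

import numpy as np

d_k, d_v = 64, 128
rng = np.random.default_rng(0)

# Native rank-1 write the recurrent layer would add to its cache slot.
k_t = rng.standard_normal(d_k)
v_t = rng.standard_normal(d_v)
native_write = np.outer(k_t, v_t)                 # shape (d_k, d_v)

# WriteSAE atom: a pair of vectors whose outer product has the same form.
v_i = rng.standard_normal(d_k)
w_i = rng.standard_normal(d_v)
atom_write = np.outer(v_i, w_i)

# Substitution: install the atom in place of the native write, rescaled so
# the patch carries the same Frobenius norm as what it replaces.
atom_patch = atom_write * (np.linalg.norm(native_write) / np.linalg.norm(atom_write))

# Matched-Frobenius-norm ablation control: a rank-1 perturbation with the
# same norm but a random direction, testing whether the atom itself matters.
rand_patch = np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v))
rand_patch *= np.linalg.norm(native_write) / np.linalg.norm(rand_patch)

cache_slot = np.zeros((d_k, d_v))
cache_atom = cache_slot + atom_patch      # atom-substituted slot
cache_ablate = cache_slot + rand_patch    # ablation-control slot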

Install & usage

git clone https://github.com/JackYoung27/writesae
cd writesae
pip install -e .

# extract GDN states from a base model
python -m experiments.extraction.extract_states \
    --model Qwen/Qwen3.5-0.8B --layers 9 --n_samples 50000 --output_dir states

# train a WriteSAE on one head
python -m core.train --sae_type bilinear --layer 9 --head 4 \
    --n_features 2048 --k 32 --data_dir states --output_dir ckpt
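
To make the --sae_type bilinear, --n_features, and --k flags concrete, here is a toy top-k bilinear decoder sketch, assuming the SAE encodes a flattened write into a sparse code and decodes it as a sum of rank-1 atoms v_i w_i^T; the class name, encoder form, and tensor shapes are assumptions for illustration, not core.train's actual module.

import torch
import torch.nn as nn

class BilinearTopKSAE(nn.Module):
    """Toy WriteSAE-style autoencoder: each atom i is a pair (v_i, w_i) and
    decodes as the rank-1 matrix v_i w_i^T, matching the cache-write form."""

    def __init__(self, d_k: int, d_v: int, n_features: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_k * d_v, n_features)  # codes from the flattened write
        self.dec_v = nn.Parameter(torch.randn(n_features, d_k) * 0.02)
        self.dec_w = nn.Parameter(torch.randn(n_features, d_v) * 0.02)

    def forward(self, write: torch.Tensor) -> torch.Tensor:
        # write: (batch, d_k, d_v) cache writes, flattened for encoding.
        codes = self.encoder(write.flatten(1)).relu()
        # Keep only the top-k activations per example (TopK sparsity).
        topk = torch.topk(codes, self.k, dim=-1)
        sparse = torch.zeros_like(codes).scatter(-1, topk.indices, topk.values)
        # Reconstruction: sum of rank-1 atoms weighted by the sparse code.
        return torch.einsum("bf,fk,fv->bkv", sparse, self.dec_v, self.dec_w)

# Shapes mirroring the training flags above, on random data.
sae = BilinearTopKSAE(d_k=64, d_v=128, n_features=2048, k=32)
recon = sae(torch.randn(8, 64, 128))   # (8, 64, 128) reconstruction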

Pretrained checkpoints

Load the paper's SAE checkpoints from HuggingFace Hub. Variants cover four SAE architectures (WriteSAE / FlatSAE / MatrixSAE / BilinearSAE), three Qwen3.5 scales (0.8B / 4B / 27B), and four cross-architecture models (DeltaNet, GLA, Mamba-2, RWKV-7).

from huggingface_hub import snapshot_download

path = snapshot_download("JackYoung27/writesae-ckpts", local_dir="ckpts")
# ckpts/manifest.json maps tags to SHA256 and metadata.
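
Continuing after the download, a short sketch of checking files against the manifest, assuming manifest.json is a tag-keyed dict whose entries carry a relative file path and a sha256 digest, as the comment above suggests; the exact schema is an assumption.

import hashlib
import json
from pathlib import Path

ckpt_root = Path("ckpts")
manifest = json.loads((ckpt_root / "manifest.json").read_text())

# Hypothetical schema: {"<tag>": {"file": "<relative path>", "sha256": "<hex>", ...}, ...}
for tag, entry in manifest.items():
    blob = (ckpt_root / entry["file"]).read_bytes()
    ok = hashlib.sha256(blob).hexdigest() == entry["sha256"]
    print(f"{tag}: {'ok' if ok else 'hash mismatch'}")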

BibTeX

@article{young2026writesae,
  title   = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author  = {Young, Jack},
  journal = {arXiv preprint arXiv:TBA},
  year    = {2026}
}