Interactive 3D visualization of the Latent Order Language Model. Hover over components for details.
16 Transformer layers with RoPE, 16 attention heads (d_k=64), d_ff=4096 with GELU. Handles fast token-level prediction. Output h ∈ (T, 1024).
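The RoPE scheme used in this stack can be sketched as below: a minimal NumPy version for a single d_k=64 head vector, assuming the standard interleaved-pair formulation with base 10000 (the base and pairing convention are assumptions, not specified here).

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding: rotate consecutive pairs of the
    head dimensions by position-dependent angles.
    x: (d,) with d even; pos: integer token position."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # (d/2,) per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                   # interleaved (even, odd) pairs
    out = np.empty_like(x, dtype=float)
    out[0::2] = x1 * cos - x2 * sin             # 2D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is a pure rotation, norms are preserved and query–key dot products depend only on the relative offset between positions, which is what makes RoPE attractive for attention.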
4 selective Mamba layers, d_inner=2048, d_state=32. Captures slow discourse-level structure. Trained via a CPC loss (InfoNCE). Output z ∈ (T, 1024).
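The CPC (InfoNCE) objective can be sketched as below: a minimal NumPy version where each latent z_t is paired with a matched future summary and the other rows of the batch act as negatives. The cosine normalization, the temperature value, and the way future summaries are built are assumptions, not specified here.

```python
import numpy as np

def info_nce(z, future, temperature=0.1):
    """InfoNCE (CPC) loss: each latent should score its own matched
    future summary higher than the other rows in the batch.
    z, future: (N, D) arrays of matched positive pairs."""
    # Cosine-similarity logits between every latent and every future summary.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    f = future / np.linalg.norm(future, axis=1, keepdims=True)
    logits = (z @ f.T) / temperature              # (N, N)
    # Cross-entropy with the diagonal (matched pair) as the target class.
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The loss is near zero when each latent is far more similar to its own future than to any other row, and grows toward log N when the pairing carries no information.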
32 Gumbel-Softmax codes, d_r=128, τ=0.5. Detects discourse transitions. Gradient detached before fusion. Changepoint + diversity losses.
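Sampling one of the 32 codes with Gumbel-Softmax at τ=0.5 can be sketched as below; this is a forward-pass-only NumPy version (the straight-through estimator, the gradient detach, and the changepoint/diversity losses are omitted).

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Relaxed sample over discrete codes: add Gumbel noise to the
    logits, then take a temperature-controlled softmax. Returns a
    probability vector that approaches one-hot as tau -> 0."""
    rng = rng or np.random.default_rng()
    # Standard Gumbel(0, 1) noise via the inverse-CDF trick.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = y - y.max()                   # numerical stability
    e = np.exp(y)
    return e / e.sum()
```

At τ=0.5 the sample is a soft mixture over codes; annealing τ downward sharpens it toward a hard code assignment while keeping the sampling step differentiable with respect to the logits.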
3 banks (episodic, semantic, self), 128 slots each, d_s=1024. Chunked read/write (C=4). Gated update per slot. Output m ∈ (T, 1024).
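The gated per-slot update can be sketched as below: a minimal NumPy version for one bank, where a sigmoid gate computed from the current slot content and the candidate write decides, per slot and per dimension, how much to overwrite. The gate parameterization (a single linear layer over the concatenation) and the names `gate_w`, `gate_b` are assumptions.

```python
import numpy as np

def gated_slot_update(slots, write, gate_w, gate_b):
    """Gated memory write: each of the S slots blends its old content
    with a candidate write vector via a sigmoid gate.
    slots, write: (S, D); gate_w: (2*D, D); gate_b: (D,)."""
    pre = np.concatenate([slots, write], axis=1) @ gate_w + gate_b
    u = 1.0 / (1.0 + np.exp(-pre))        # per-slot, per-dimension gate in (0, 1)
    return u * write + (1.0 - u) * slots   # u=0 keeps the slot, u=1 overwrites it
```

Driving the gate toward 0 preserves a slot untouched; driving it toward 1 replaces the slot with the new write, so the bank can protect stable semantic entries while refreshing episodic ones.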
Per-dimension g ∈ [0,1]^1024. 2-layer MLP conditioned on h, z, m, r. Learns a contextual surface/latent blend per token position.
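The 2-layer gate MLP can be sketched as below: a minimal NumPy version that concatenates the per-token inputs, applies a GELU hidden layer (tanh approximation), and squashes the output through a sigmoid so every dimension lands in [0, 1]. The hidden width, the GELU nonlinearity, and the weight names are assumptions.

```python
import numpy as np

def blend_gate(h, z, m, r, W1, b1, W2, b2):
    """2-layer MLP gate conditioned on h, z, m, r.
    h, z, m: (T, D); r: (T, D_r); returns g: (T, D) in [0, 1]."""
    x = np.concatenate([h, z, m, r], axis=-1)    # per-token conditioning vector
    hid = x @ W1 + b1
    # GELU, tanh approximation.
    hid = 0.5 * hid * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (hid + 0.044715 * hid ** 3)))
    return 1.0 / (1.0 + np.exp(-(hid @ W2 + b2)))   # sigmoid -> [0, 1] per dimension
```

Because the sigmoid is applied element-wise, each of the 1024 output dimensions can independently lean toward the surface (transformer) or latent (Mamba) stream at every token position.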
g⊙LN(W_h · h) + (1-g)⊙LN(W_z · z) + W_m · m + W_r · r̄. Per-dimension gated blend plus additive memory and regime contributions.