Visualizing how LOLM's manifestation gate allocates between surface and latent representations for each token. Select an example sentence to see per-token gate values.
The manifestation gate is a per-dimension vector g ∈ [0,1]^d that blends surface (Transformer) and latent (SSM) representations via the fusion equation: g ⊙ LN(W_h h) + (1 − g) ⊙ LN(W_z z) + W_m m + W_r r̄. Each dimension independently learns how much to draw from each source.
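The fusion can be sketched numerically. This is a minimal NumPy illustration, not LOLM's actual implementation: the function name `fuse` and the treatment of m and r̄ as additive, ungated inputs are assumptions based only on the equation above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Standard layer normalization over the feature dimension.
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def fuse(g, h, z, m, r_bar, W_h, W_z, W_m, W_r):
    # g in [0,1]^d blends the surface stream h (Transformer) with the
    # latent stream z (SSM); m and r_bar enter additively, ungated.
    # Mirrors: g ⊙ LN(W_h h) + (1 − g) ⊙ LN(W_z z) + W_m m + W_r r̄
    return (g * layer_norm(W_h @ h)
            + (1 - g) * layer_norm(W_z @ z)
            + W_m @ m
            + W_r @ r_bar)
```

Setting g = 1 everywhere recovers a purely surface-driven output (plus the ungated terms), while g = 0 defers entirely to the latent stream, which is exactly the per-dimension allocation the visualization colors.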
Blue tokens are surface-dominant: the model relies on local syntax and token-level patterns. Orange tokens are latent-dominant, meaning the model draws on discourse structure, long-range context, or topic information. Hover any token for details.
Function words and punctuation are consistently surface-dominant. Discourse connectives (however, meanwhile), distant pronoun references, and topic-shift markers go latent. The gate learns this allocation without explicit supervision.