This demo accompanies the paper "LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs". We train connector-only VLMs (frozen LLM + frozen vision encoder) and analyze the interpretability of visual tokens at each layer using three methods (abbreviated LN, LL, and NN in the tables below).
Click on any model cell below to explore per-patch interpretability across layers.
| LLM \ Vision Encoder | ViT-L/14-336 | DINOv2-L-336 | SigLIP |
|---|---|---|---|
| LLaMA3-8B | View Results<br>LN: 9, LL: 15, NN: 15 | View Results<br>LN: 9, LL: 15, NN: 15 | View Results<br>LN: 9, LL: 15, NN: 15 |
| OLMo-7B | View Results<br>LN: 9, LL: 15, NN: 15 | View Results<br>LN: 9, LL: 15, NN: 15 | View Results<br>LN: 9, LL: 15, NN: 15 |
| Qwen2-7B | View Results<br>LN: 9, LL: 14, NN: 11 | View Results<br>LN: 9, LL: 14, NN: 11 | View Results<br>LN: 9, LL: 14, NN: 11 |
Pre-trained VLMs analyzed without connector-only training; these models were trained end-to-end by their respective teams.
| Model | Details | Results |
|---|---|---|
| Qwen2-VL-7B | Qwen2 backbone, dynamic resolution ViT. 10 images. | View Results |
| Qwen2.5-VL-32B | Qwen2.5-32B backbone, 64 layers. 10 images. | View Results |
| Molmo-7B-D | Qwen2 backbone, multi-crop ViT. 10 images. | View Results |
| LLaVA-1.5-7B | Vicuna backbone, CLIP ViT-L/14-336. 10 images. | View Results |
Ablations exploring the impact of training variations and random seeds.
| Model / Variation | Details | Results |
|---|---|---|
| Seed 10 | 10 images | View Results |
| Seed 11 | 10 images | View Results |
| Linear Connector | 10 images | View Results |
| Unfreeze LLM | 10 images | View Results |
| First-Sentence Captions | 10 images | View Results |
| Earlier ViT Layer (6) | 10 images | View Results |
| Earlier ViT Layer (10) | 10 images | View Results |
| TopBottom Task | 10 images | View Results |
| TopBottom + Unfreeze | 10 images | View Results |