a real forward pass · in your browser

This page runs the real Phi-3-mini (~2 GB) on your GPU.

The download happens once. After that, the model is cached locally — no server, no API key, no telemetry.

WebGPU required (Chrome / Edge / Safari TP) · ~2 GB GPU memory · 1–3 min first download
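Since the page is gated on WebGPU, the availability check can be sketched as a tiny feature-detect. This is a minimal sketch, not the page's actual code; it takes a navigator-like object as a parameter so the logic can be exercised outside a browser.

```javascript
// Minimal WebGPU feature detection (a sketch; the page's real check may
// also request an adapter/device and verify memory limits).
// Accepts a navigator-like object so it can be tested outside a browser.
function supportsWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In the browser you would gate the demo on it:
//   if (!supportsWebGPU(navigator)) { /* show the fallback message */ }
```

A production check would additionally `await navigator.gpu.requestAdapter()`, since `navigator.gpu` can exist while no usable adapter does.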

Model weights: huggingface.co/mlc-ai/Phi-3-mini-4k-instruct-q4f16_1-MLC
Model: Phi-3 3.8B · Layers: 32 · Dispatch: 0/292 · Speed: — · Tokens: 0
Each mode picks one view to be the hero. Same forward pass, different lens.

Attention — every head, every layer

Rows = layers (0→31), columns = heads (0→31); brightness shows how strongly each head is attending. Click any cell to inspect its full attention pattern.
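Reducing a head's full attention pattern to one brightness value for the 32×32 grid requires some summary statistic. One plausible choice (hypothetical — the page may use a different reduction) is the maximum attention weight in the head's row, so sharply focused heads render bright and diffuse heads render dim:

```javascript
// Collapse one head's attention row to a single brightness value.
// attnRow: post-softmax attention weights from the current token to all
// visible tokens (assumed to sum to 1). Max weight = how "focused" the head is.
function headBrightness(attnRow) {
  return Math.max(...attnRow);
}

// A uniform head over 4 tokens is dim; a head locked onto one token is bright.
headBrightness([0.25, 0.25, 0.25, 0.25]); // → 0.25
headBrightness([0.97, 0.01, 0.01, 0.01]); // → 0.97
```

Mean weight or (neg)entropy would work equally well; max is just the cheapest to compute per frame.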

Logit Lens — what would each layer say?

If the model stopped at layer N and ran lm_head, what token would it pick? Watch the answer crystallize from L0 to L31.
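The logit-lens step above can be sketched with toy shapes: project an intermediate layer's hidden state through the unembedding matrix (`lm_head`) and take the argmax token id. This is illustrative only — the real model uses 3072-dim states and applies the final layer norm before `lm_head`, which this sketch omits.

```javascript
// Logit lens, toy version: hidden state (length hiddenDim) times
// lm_head (vocabSize rows of length hiddenDim), then argmax over logits.
function logitLens(hidden, lmHead) {
  const logits = lmHead.map(row =>
    row.reduce((acc, w, i) => acc + w * hidden[i], 0)
  );
  let best = 0;
  for (let t = 1; t < logits.length; t++) {
    if (logits[t] > logits[best]) best = t;
  }
  return best;
}

// Toy vocab of 3 tokens, hidden dim 2: the state [1, 0] aligns with
// token 2's unembedding row, so this "layer" would say token 2.
const lmHead = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]];
logitLens([1, 0], lmHead); // → 2
```

Running this at every layer's residual state, with the real weights, is exactly what produces the L0→L31 "crystallizing" answer.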
Side panels, all driven by the same forward pass:
· AI Output · Token Probabilities · Confidence
· KV Cache (pages in use)
· Head Activity (32 layers × 32 heads)
· Residual Stream Norm (per layer)
· Layer Contribution Δ (change per layer)
· Residual Stream (3072 dims × 32 layers; cyan = +, magenta = −)
· Raw GPU State (f32, strict mode), e.g. residual[0..15]
· Attention Heads (32) · FFN Groups (16)
· Residual Stream · Attention Beams (real GPU activations)
· Tokens
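The two residual-stream statistics named above can be sketched with assumed (but standard) definitions: the L2 norm of the residual vector at each layer, and the norm of the difference between consecutive layers as that layer's contribution Δ.

```javascript
// L2 norm of a vector.
const l2 = v => Math.sqrt(v.reduce((s, x) => s + x * x, 0));

// residuals: one vector per layer (32 × 3072 in Phi-3-mini; tiny here).
// Returns per-layer norms and per-layer change magnitudes ("contribution Δ").
function residualStats(residuals) {
  const norms = residuals.map(l2);
  const deltas = residuals.slice(1).map((v, i) =>
    l2(v.map((x, j) => x - residuals[i][j]))
  );
  return { norms, deltas };
}

// Two tiny 2-dim "layers": norms are 5 and 10; the layer adds (3, 4), so Δ = 5.
residualStats([[3, 4], [6, 8]]); // → norms [5, 10], deltas [5]
```

Plotting `norms` per layer shows the residual stream's typical growth through depth; `deltas` shows which layers actually move the state.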