How is THAT History Quiz rated?

Every answer moves your Glicko-2 rating — the same algorithm chess.com uses. Player-vs-question matches in solo play, player-vs-player in 1v1 Challenge rooms.

How many clues do you get per question?

Three clues per question, hardest first. Each clue reveals one word at a time at audio narration pace. After each clue you have 5 seconds to buzz in or the next clue starts.

Who reads the clues out loud?

Brent Payne, CEO of Loud Interactive and the host. Audio is generated by an ElevenLabs Pro voice clone of his actual voice and cached per clue.

How are typed answers graded?

Four layers, fastest first: exact match against the canonical answer + aliases, fuzzy Levenshtein for typos, contained-answer detection for wrapped responses like "I think it was Paul Revere", then a Claude AI judge for the genuinely ambiguous cases.

Does Study Mode affect my rating?

No. Study Mode is for review — every question you answer there is marked "no rating impact" and skips both the public attempt history and the Glicko-2 update.

What is the Daily Challenge?

A fresh set of 10 questions everyone gets on the same day. Compete on a daily leaderboard against every other player. The bank refreshes at midnight Central time.

How THAT History Quiz Works

Three clues per question, hardest first

Every question has three clues. Clue 1 is the hardest — terse, oblique, the kind of thing only someone who actually knows the era will catch. Clue 2 fills in context. Clue 3 spells it out so most players can buzz with confidence.

You can buzz on whichever clue you want. Buzzing earlier wins more rating — see the math below.

The buzz clock

Game-show rules. Once you tap Buzz In:

5 seconds to start typing your answer.
10 seconds total once your first keystroke lands.
2 seconds max idle between keystrokes.

Any of those expiring auto-submits whatever’s in the box — empty submits are scored as wrong. A thin orange bar under the input ticks the active deadline down so you can see how much time you have left.
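As a rough sketch of how those three timers could combine, the function below returns the deadline the orange bar would be counting down to. The function name and the "whichever expires first" rule are assumptions for illustration, not the site's actual code; times are in seconds.

```python
from typing import Optional

START_TYPING_S = 5.0   # time allowed between buzzing and the first keystroke
TOTAL_TYPING_S = 10.0  # total time allowed once the first keystroke lands
MAX_IDLE_S = 2.0       # longest allowed pause between keystrokes


def active_deadline(buzz_at: float,
                    first_key_at: Optional[float] = None,
                    last_key_at: Optional[float] = None) -> float:
    """Return the timestamp at which the answer would auto-submit.

    Assumption: once typing has started, the total-time and idle timers
    both run, and whichever expires first is the active deadline.
    """
    if first_key_at is None:
        # Nothing typed yet: only the start-typing timer applies.
        return buzz_at + START_TYPING_S
    total_deadline = first_key_at + TOTAL_TYPING_S
    latest_key = first_key_at if last_key_at is None else last_key_at
    idle_deadline = latest_key + MAX_IDLE_S
    return min(total_deadline, idle_deadline)
```

For example, a player who buzzes at t=100 and types a first key at t=101 has until t=103 unless another keystroke arrives, and never later than t=111.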

In voice mode, the clue is read out loud (in Brent’s voice, via ElevenLabs) and the read time is computed from word count at a comfortable 180 wpm. You get a 5-second think window after the read ends; tap the mic to buzz and speak your answer.
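A minimal sketch of the read-time arithmetic, assuming words are counted by splitting on whitespace. The function names are hypothetical.

```python
WORDS_PER_MINUTE = 180.0  # narration pace stated above
THINK_WINDOW_S = 5.0      # think time after the read ends


def read_seconds(clue: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Estimated narration time for a clue, from its word count."""
    word_count = len(clue.split())
    return word_count * 60.0 / wpm


def buzz_window_closes_at(clue: str, read_started_at: float = 0.0) -> float:
    """Timestamp when the think window ends and the next clue would start."""
    return read_started_at + read_seconds(clue) + THINK_WINDOW_S
```

At 180 wpm each word takes a third of a second, so a 30-word clue reads in 10 seconds and the buzz window closes 15 seconds after the read starts.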

The rating system is real math

Your rating is computed with Glicko-2, a peer-reviewed competitive-skill rating system invented by Mark Glickman, Senior Lecturer in Statistics at Harvard and longtime chair of the US Chess Ratings Committee. Glicko-2 is the system Chess.com, Lichess, Counter-Strike, Splatoon, and dozens of other competitive platforms use to rate millions of players. It’s an evolution of the classic Elo rating system that fixes a key weakness: Elo treats your rating as a single point; Glicko adds an uncertainty estimate (rating deviation) and a volatility measure, so results against opponents the system is sure about move you more than results against opponents it isn’t, and your own rating moves faster while the system is still unsure about you.

What that means in practice: your number on this site means the same thing as your number on chess.com. A 1500 here is a 1500 in the well-understood Glicko-2 sense. The math isn’t something we made up.
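For the curious, here is a compact sketch of the published Glicko-2 update for one rating period. It follows Glickman's step-by-step description; the system constant tau=0.5 and the function names are assumptions, and the site's own implementation (including its per-attempt cap) is not shown here.

```python
import math

SCALE = 173.7178  # converts between rating points and Glicko-2 internal units


def g(phi):
    """Weight of a result; shrinks as the opponent's deviation phi grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected(mu, mu_j, phi_j):
    """Expected score against an opponent with rating mu_j, deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """results: non-empty list of (opp_rating, opp_rd, score), score in [0, 1]."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    opps = [((r - 1500.0) / SCALE, d / SCALE, s) for r, d, s in results]
    v = 1.0 / sum(g(p) ** 2 * expected(mu, m, p) * (1.0 - expected(mu, m, p))
                  for m, p, _ in opps)
    total = sum(g(p) * (s - expected(mu, m, p)) for m, p, s in opps)
    delta = v * total

    # Solve for the new volatility with the iteration from the Glicko-2 description.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Inflate the deviation by the new volatility, then shrink it with the evidence.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * total
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol
```

Calling `glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])` reproduces the worked example from Glickman's description of the system (new rating near 1464, deviation near 151.5). The `g` weight is where opponent uncertainty enters: the larger the opponent's deviation, the less that result counts.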

How buzzing earlier wins more rating

Same Glicko-2 update, but the clue you buzz on multiplies the question’s effective difficulty for that attempt — buzzing on clue 1 treats the question as 15% harder (and rewards you more), buzzing on clue 3 treats it as 15% easier:

Buzz on	Difficulty multiplier	Correct ≈	Wrong ≈
Clue 1 (hardest)	1.15×	+15.8	−4.5
Clue 2	1.00×	+10.1	−10.1
Clue 3 (easiest)	0.85×	+4.5	−15.8

Approximate deltas at a stable rating against an equal-rated question. The actual math is a Glicko-2 update with a 2.5% cap per attempt (the cap itself has a floor of 8 pts and a ceiling of 50 pts) so a single guess can’t blow up your rating either way. Correct on clue 1 ≈ 3.5× the reward of correct on clue 3, and the punishment for missing late is mirrored: wrong on clue 3 hurts ~3.5× more than wrong on clue 1, because you saw all the evidence.
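A sketch of the two adjustments wrapped around the Glicko-2 update, under two loudly labeled assumptions: the multiplier scales the question's rating directly, and the floor and ceiling bound the size of the cap rather than the delta itself (the table shows deltas below 8, so the floor cannot apply to the delta). All names are hypothetical.

```python
# Difficulty multipliers from the table above.
CLUE_MULTIPLIER = {1: 1.15, 2: 1.00, 3: 0.85}

CAP_FRACTION = 0.025  # 2.5% of the player's rating
CAP_FLOOR = 8.0       # assumed: smallest value the cap can take
CAP_CEILING = 50.0    # assumed: largest value the cap can take


def effective_question_rating(question_rating: float, clue: int) -> float:
    """Question rating as seen by the update (assumed: rating x multiplier)."""
    return question_rating * CLUE_MULTIPLIER[clue]


def per_attempt_cap(player_rating: float) -> float:
    """Largest rating swing allowed for a single attempt."""
    return min(CAP_CEILING, max(CAP_FLOOR, CAP_FRACTION * player_rating))


def clamp_delta(raw_delta: float, player_rating: float) -> float:
    """Limit a raw rating change to the per-attempt cap, in either direction."""
    cap = per_attempt_cap(player_rating)
    return max(-cap, min(cap, raw_delta))
```

Under these assumptions a 1500-rated player buzzing on clue 1 of a 1500-rated question is scored as if facing a 1725-rated question, and no single attempt can move them by more than 37.5 points.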

How we grade your answer

Your typed or spoken answer goes through a four-layer pipeline. The first match wins:

Exact match. Both your answer and the canonical (plus every alias on file) are normalized: lowercased, punctuation stripped, articles “the / a / an” dropped anywhere in the string, common abbreviations expanded (WWII → world war 2, JFK → john f kennedy, etc.). If your normalized string matches any of theirs, you’re correct.
Fuzzy match. Whole-string Levenshtein similarity ≥ 85% counts as a match. Catches typos like “Lincon” → Lincoln.
Per-token similarity. The best individual-word match across all candidates. Catches nickname-style answers (“Abe” → Abraham Lincoln).
LLM judge. If the answer is plausible but doesn’t clearly match, we send it to Claude Haiku 4.5 with a strict rubric: accept misspellings, phonetic transcription errors, common nicknames, partial names that uniquely identify the entity, alternate-language forms; reject paraphrases, related-but-different entities, or anything too vague. When in doubt, the LLM rejects.
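The first two layers can be sketched in a few lines. This is an illustration under assumptions, not the site's code: the abbreviation table is a two-entry stand-in, similarity is taken as 1 minus edit distance over the longer string's length, and the per-token and LLM layers are left out (anything unresolved would fall through to them).

```python
import re

ARTICLES = {"the", "a", "an"}
# Hypothetical two-entry stand-in for the real abbreviation table.
ABBREVIATIONS = {"wwii": "world war 2", "jfk": "john f kennedy"}
FUZZY_THRESHOLD = 0.85


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, expand abbreviations."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    words = [w for w in text.split() if w not in ARTICLES]
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed definition: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def grade(answer: str, canonical: str, aliases=()) -> str:
    """Return 'exact', 'fuzzy', or 'unresolved' (would go to later layers)."""
    given = normalize(answer)
    candidates = [normalize(c) for c in (canonical, *aliases)]
    if given in candidates:
        return "exact"
    if any(similarity(given, c) >= FUZZY_THRESHOLD for c in candidates):
        return "fuzzy"
    return "unresolved"
```

With this definition “Lincon” against “Lincoln” is one edit over seven characters, about 85.7% similar, which just clears the 85% bar.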

If the LLM rejects an answer you think should have counted, you can hit Challenge on the result screen. The challenge is logged and emailed to the operator for review. No points are restored automatically, but a human looks at every one.

The question bank

1,960 approved questions at last count, ranging from 5th-grade easy (rating 400: “The president of the United States”) to graduate-seminar hard (rating 2,850: “The Gallican Articles”). Every question has been individually difficulty-calibrated by Haiku based on (a) how obscure the answer is and (b) how oblique the hardest clue is.

The occasional gut-punch: a fresh question generated on the spot in whichever topic you’ve been weakest on, or several rating bands above you if no clear weakness has surfaced yet. Either way: plenty of hard ones in the queue.

Matchmaking

The picker stays within a ±800-rating band around your current rating, dedupes against the last 30 questions you’ve answered, and varies topic between consecutive picks so you don’t get three Caesars in a row.
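Those three rules could be sketched like this. The `Question` type, the function name, and the random choice among eligible questions are assumptions; the real picker may weight candidates differently.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

BAND = 800         # allowed rating distance from the player
RECENT_WINDOW = 30  # how many recently answered questions to exclude


@dataclass(frozen=True)
class Question:
    id: int
    rating: float
    topic: str


def pick_question(bank: Sequence[Question],
                  player_rating: float,
                  recent_ids: List[int],
                  last_topic: Optional[str],
                  rng=random) -> Optional[Question]:
    """Pick a question inside the band, not recently seen, on a new topic if possible."""
    recent = set(recent_ids[-RECENT_WINDOW:])
    eligible = [q for q in bank
                if abs(q.rating - player_rating) <= BAND and q.id not in recent]
    # Prefer a topic change; fall back to any eligible question if none exists.
    fresh_topic = [q for q in eligible if q.topic != last_topic]
    pool = fresh_topic or eligible
    return rng.choice(pool) if pool else None
```

The fallback matters for narrow modes: when only one topic is in play, insisting on a topic change would leave nothing to pick.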

Prepare for Competition mode narrows the bank to the 14 topic categories used in the Great History Challenge nationals; that’s the mode David uses to train. Broad History Challenge opens it up to everything. Topic Specific lets you pick a single category and grind it.

Voice mode

Toggle voice mode on and the quiz becomes hands-mostly-free: clues are read aloud, you tap a big mic button to buzz, and you speak your answer. The voice you hear is an ElevenLabs Pro Voice Clone of Brent Payne (Loud Interactive’s CEO and the quiz’s creator). Hot clues are pre-generated and served from a CDN cache; cold clues take a few seconds the first time, then cache forever.

Voice mode works in every modern browser. If your browser ever fails to play the hosted audio, the quiz falls back to your device’s system TTS so you’re never stuck in silence.

Study mode

After you’ve played, /study lets you walk back through everything you’ve seen — all three clues, the canonical answer, the aliases that would have counted, and your full attempt history for each question. Studying doesn’t change your rating — it’s pure review.

Challenge rooms (1v1 and N-player)

Head to /challenge to create a room, get a 5-character code + QR, and share with whoever you want to play against. Once they’ve joined, the host starts the match. Everyone sees the same clue at the same instant via Supabase Realtime, and the first to tap BUZZ IN wins the buzz — the lock is server-side, so even on wobbly WiFi the database decides who was first.
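One common way to let the database decide who buzzed first is an atomic conditional update: the write only succeeds if nobody holds the buzz yet. The sketch below uses SQLite from Python's standard library as a stand-in for the real database; the table and column names are hypothetical and this is not the site's actual schema.

```python
import sqlite3


def open_room(conn: sqlite3.Connection, room_code: str) -> None:
    """Create a room row with no buzzer holding the lock."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rooms (code TEXT PRIMARY KEY, buzzer TEXT)")
    conn.execute("INSERT INTO rooms (code, buzzer) VALUES (?, NULL)", (room_code,))
    conn.commit()


def try_buzz(conn: sqlite3.Connection, room_code: str, player: str) -> bool:
    """Claim the buzz if it is still free; True means this player won it."""
    cur = conn.execute(
        "UPDATE rooms SET buzzer = ? WHERE code = ? AND buzzer IS NULL",
        (player, room_code))
    conn.commit()
    # Exactly one row changes for the first buzzer, zero for everyone after.
    return cur.rowcount == 1


def reopen_buzz(conn: sqlite3.Connection, room_code: str) -> None:
    """Release the lock, e.g. after a wrong answer advances the clue."""
    conn.execute("UPDATE rooms SET buzzer = NULL WHERE code = ?", (room_code,))
    conn.commit()
```

Because the check and the write happen in one statement, two near-simultaneous taps cannot both win, whatever order their packets left the players' phones.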

Room rules:

The buzzer has 15 seconds to type their answer. No pass — once you buzz, you commit.
On wrong, the clue advances and the buzz reopens — but the wrong buzzer is locked out of that question.
If the clue got cut off mid-read, it’s re-read for the rest of the players before the next clue lands. (If it had time to read out fully, the room jumps straight to the next clue.)
Scoring per question: clue 1 = 3 pts, clue 2 = 2 pts, clue 3 = 1 pt.

Rated vs unrated: 2-player rooms apply a real Glicko-2 player-vs-player rating update at the end of the match. Both players’ ratings move based on the final score share, capped at 2.5% of rating per match. 3+ player rooms are unrated (in-person trivia night style).
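A small sketch of the room scoring and the score share that would feed a 2-player rating update. Treating the share as a fractional match outcome between 0 and 1, and scoring a 0-0 match as an even split, are assumptions; the function names are hypothetical.

```python
from typing import Iterable

# Points for a correct answer, keyed by the clue it was given on.
CLUE_POINTS = {1: 3, 2: 2, 3: 1}


def match_points(correct_clues: Iterable[int]) -> int:
    """Total points from the clue numbers a player answered correctly on."""
    return sum(CLUE_POINTS[clue] for clue in correct_clues)


def score_share(points_a: int, points_b: int) -> float:
    """Player A's share of all points scored (assumed 0.5 if nobody scored)."""
    total = points_a + points_b
    return 0.5 if total == 0 else points_a / total


def is_rated(player_count: int) -> bool:
    """Only 2-player rooms apply a rating update."""
    return player_count == 2
```

For example, a player with correct answers on clues 1, 1, and 3 scores 7 points; against an opponent with 3 points that is a 0.7 share.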
