How it works

A plain-English tour of what this site does, where the data comes from, and what it can and can't tell you about a player.

What this site is

Decklist Data is a search engine over Magic Online's public results. You type in a handle, and we surface every public finish that handle has accumulated: League 5-0 trophies, Challenge placings, the deck they most recently registered, and a rough read on what they like to play. It exists because MTGO's own site doesn't let you search by player — you have to browse event by event.

Where the data comes from

Two sources, both public:

  • mtgo.com/decklists — the primary source. Daybreak publishes a public decklist page for every Challenge, Showcase, Premier Play Qualifier, and League 5-0 finish. We crawl the monthly index and pull each event page.
  • mtggoldfish.com — a second source for backfilling. MTGGoldfish retains historical event pages and decklists for much longer than mtgo.com does (which purges most events after roughly 30 days). When the primary source has dropped an event, we fall back to Goldfish's archive.

We don't use any private API, scraped account data, or paid feed. Everything on this site can also be assembled by hand from those two public sites; we just do the assembly for you.
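
The primary-then-archive fallback described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the `EventPage` shape, the `Fetcher` type, and `fetchEventWithFallback` are all hypothetical names, and each fetcher is assumed to return `null` when its source no longer has the event.

```typescript
// Hypothetical shape of one fetched event; field names are illustrative.
interface EventPage {
  url: string;
  source: "mtgo" | "goldfish";
  decklists: string[];
}

// A fetcher resolves to the event page, or to null when that source
// has dropped (or never had) the event.
type Fetcher = (eventId: string) => Promise<EventPage | null>;

// Try each source in order and return the first one that still has the event.
async function fetchEventWithFallback(
  eventId: string,
  fetchers: Fetcher[],
): Promise<EventPage | null> {
  for (const fetcher of fetchers) {
    const page = await fetcher(eventId);
    if (page !== null) return page;
  }
  return null; // no source has this event
}
```

Passing the mtgo.com fetcher first and the MTGGoldfish fetcher second gives the ordering described above: the archive is consulted only when the primary source comes back empty.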

How the scraping actually runs

Three GitHub Actions workflows do all the data ingestion:

  • A frequent cron fires every ~30 minutes, probes the current month's MTGO index, and ingests anything new it sees. Most of these runs are no-ops because Daybreak publishes in bursts; the few that find new events land them within ~30-45 minutes of publication.
  • A daily cron does a fuller sweep in case the frequent cron missed something during an outage.
  • Manual backfill workflows ingest historical windows on demand, either from Goldfish (cheap, HTTP-only) or by re-fetching mtgo.com pages (slower, needs a real browser to render the page).

The mtgo.com pages render their content with JavaScript — there are no decklists in the static HTML. So our scraper runs a headless Chromium browser (Playwright), waits for the page to hydrate, and reads the decklists out of the rendered DOM. Each event takes ~25 seconds end-to-end.
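
As a rough sketch of that flow: navigate, wait for the rendered rows to appear, then read them out of the DOM. The selector, the row format ("4 Lightning Bolt"), and the names `RenderedPage`, `parseCardRows`, and `scrapeDecklist` are assumptions made for illustration; the real MTGO markup may look quite different. The narrow `RenderedPage` interface lists only the calls the sketch needs (Playwright's `Page` has methods with these names), so the parsing logic can be exercised without launching a browser.

```typescript
// Minimal slice of a browser page used by this sketch.
interface RenderedPage {
  goto(url: string): Promise<unknown>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  locator(selector: string): { allInnerTexts(): Promise<string[]> };
}

interface CardEntry {
  quantity: number;
  name: string;
}

// Hypothetical selector: an assumption, not the real MTGO class name.
const CARD_ROW_SELECTOR = ".decklist-card-row";

// Parse rows such as "4 Lightning Bolt"; rows that do not start with a
// quantity (section headers, blank lines) are skipped.
function parseCardRows(rows: string[]): CardEntry[] {
  const entries: CardEntry[] = [];
  for (const row of rows) {
    const match = /^\s*(\d+)\s+(.+?)\s*$/.exec(row);
    if (match) entries.push({ quantity: Number(match[1]), name: match[2] });
  }
  return entries;
}

async function scrapeDecklist(page: RenderedPage, url: string): Promise<CardEntry[]> {
  await page.goto(url);
  // The static HTML has no decklists, so wait until hydration renders the rows.
  await page.waitForSelector(CARD_ROW_SELECTOR, { timeout: 30000 });
  const rows = await page.locator(CARD_ROW_SELECTOR).allInnerTexts();
  return parseCardRows(rows);
}
```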

A unique-URL constraint in the database means the same event is never ingested twice. Re-running a workflow costs nothing for events we already have: they are simply skipped.
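
To illustrate why re-runs are harmless, here is a small in-memory stand-in for a table with a unique constraint on the event URL. `EventStore`, `insertIfNew`, and `ingestIndex` are hypothetical names; in the real system the database enforces the constraint, not application code.

```typescript
interface EventRow {
  url: string;
  name: string;
}

// In-memory stand-in for an events table with a unique constraint on `url`.
class EventStore {
  private readonly byUrl = new Map<string, EventRow>();

  // Returns true when the row was inserted, false when the URL already existed.
  insertIfNew(event: EventRow): boolean {
    if (this.byUrl.has(event.url)) return false;
    this.byUrl.set(event.url, event);
    return true;
  }
}

// One ingestion pass over an index: returns how many events were new.
function ingestIndex(store: EventStore, index: EventRow[]): number {
  let inserted = 0;
  for (const event of index) {
    if (store.insertIfNew(event)) inserted += 1;
  }
  return inserted;
}
```

Running `ingestIndex` twice over the same index inserts nothing the second time, which is the property that makes a frequent cron and a daily sweep safe to overlap.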

How decks are classified into archetypes

When we ingest a decklist we run it through a two-pass classifier:

  • Rules first. Each format (Premodern, Modern, Legacy, etc.) has a hand-curated set of archetype rules. Each rule is a small predicate like “has at least 12 of [Goblin Guide, Monastery Swiftspear, Eidolon of the Great Revel] in the mainboard.” The first rule that matches names the deck. Burn, Tron, Affinity, Storm, Sneak & Show, etc. all fall out of this layer.
  • Color-identity fallback. If no rule matches, which is common for rogue brews, we read the deck's mana base (basic lands, duals, shocks, triomes, and pain, check, and fast lands) to infer color identity and label the deck by standard Magic color names: “Mono-Red,” “Azorius,” “Esper,” “5-color.” A splash of just 1-2 sources is filtered out, so a Dimir deck with a single Steam Vents for a sideboard Pyroblast doesn't become Grixis.

The classifier is intentionally conservative: we'd rather label a deck “Azorius” than wrongly guess “Spirits.” Improvements land via pull request and are re-run against the whole DB in a few seconds.
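
The two passes can be sketched as below. Everything here is illustrative: the single Burn rule mirrors the example predicate above, the land table and color-name table are tiny samples, and the threshold of 3 sources is an assumption derived from "1-2 sources is filtered out." The names `ArchetypeRule`, `matchRule`, `colorIdentity`, and `classify` are hypothetical.

```typescript
type Color = "W" | "U" | "B" | "R" | "G";

interface Deck {
  mainboard: Record<string, number>; // card name -> copies
}

// A rule matches when the mainboard holds at least `minCopies` of the listed cards combined.
interface ArchetypeRule {
  archetype: string;
  cards: string[];
  minCopies: number;
}

// Sample rule only; the real rule sets are hand-curated per format.
const RULES: ArchetypeRule[] = [
  { archetype: "Burn", cards: ["Goblin Guide", "Monastery Swiftspear", "Eidolon of the Great Revel"], minCopies: 12 },
];

// Sample land table; a real one would cover many more lands.
const LAND_COLORS: Record<string, Color[]> = {
  Plains: ["W"], Island: ["U"], Swamp: ["B"], Mountain: ["R"], Forest: ["G"],
  "Hallowed Fountain": ["W", "U"], "Watery Grave": ["U", "B"], "Steam Vents": ["U", "R"],
};

// Keys are colors in WUBRG order; only a few names are listed here.
const COLOR_NAMES: Record<string, string> = {
  R: "Mono-Red", WU: "Azorius", UB: "Dimir", WUB: "Esper", UBR: "Grixis", WUBRG: "5-color",
};

const MIN_SOURCES = 3; // assumption: colors with only 1-2 sources are treated as a splash

// Pass 1: the first matching rule names the deck.
function matchRule(deck: Deck, rules: ArchetypeRule[]): string | null {
  for (const rule of rules) {
    const copies = rule.cards.reduce((sum, card) => sum + (deck.mainboard[card] ?? 0), 0);
    if (copies >= rule.minCopies) return rule.archetype;
  }
  return null;
}

// Pass 2: count mana sources per color and drop splashes below the threshold.
function colorIdentity(deck: Deck): string {
  const sources: Record<Color, number> = { W: 0, U: 0, B: 0, R: 0, G: 0 };
  for (const card of Object.keys(deck.mainboard)) {
    for (const color of LAND_COLORS[card] ?? []) sources[color] += deck.mainboard[card];
  }
  const order: Color[] = ["W", "U", "B", "R", "G"];
  const key = order.filter((c) => sources[c] >= MIN_SOURCES).join("");
  if (key === "") return "Colorless"; // fallback label is an assumption
  return COLOR_NAMES[key] ?? key;
}

function classify(deck: Deck): string {
  return matchRule(deck, RULES) ?? colorIdentity(deck);
}
```

Because rules are tried in order and the color fallback runs only when none match, a deck never gets a specific archetype name unless a curated rule explicitly claims it.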

What a player report shows you

For each handle we display three things:

  • Recent finishes. Every public event in our DB where they registered a deck, with the place / record we extracted from the page.
  • Decks they registered. The full mainboard + sideboard from each finish, plus the inferred archetype and color identity.
  • Aggregate signals. Things like “mostly plays Mono-Red Burn” or “5-0 trophies this month: 3” — derived from the above, not from any external rating system.
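
The aggregate signals are simple folds over a player's stored finishes. A minimal sketch, assuming a hypothetical `Finish` record with an ISO date string and the inferred archetype (the function names are also illustrative):

```typescript
interface Finish {
  eventType: "League" | "Challenge";
  date: string; // ISO date, e.g. "2024-05-14"
  archetype: string;
}

// Most frequently registered archetype; ties go to the one seen first.
function mostPlayedArchetype(finishes: Finish[]): string | null {
  const counts = new Map<string, number>();
  for (const f of finishes) {
    counts.set(f.archetype, (counts.get(f.archetype) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [archetype, count] of counts) {
    if (count > bestCount) {
      best = archetype;
      bestCount = count;
    }
  }
  return best;
}

// Every League finish in the DB is a 5-0 trophy, because those are the only
// League results published. `month` is a "YYYY-MM" prefix.
function leagueTrophiesInMonth(finishes: Finish[], month: string): number {
  return finishes.filter((f) => f.eventType === "League" && f.date.startsWith(month)).length;
}
```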

What we can't see (yet)

  • Sub-5-0 League finishes. Daybreak only publishes 5-0 League trophies. A player who went 4-1 fifteen times in a row is completely invisible on our League side — not because we're missing data, but because Daybreak doesn't publish it.
  • Win-loss records for Challenges. For now we capture place (1st, 2nd, 9th-16th) for Challenges but not the actual W-L record. The record data lives in a separate JSON blob on the MTGO page that we don't currently parse; this is a known gap on our roadmap.
  • Private events. MagicFests, Cubes, custom-format leagues, SpellTable matches: none of these generate public MTGO pages, so we can't see them.
  • Identity across accounts. If the same person plays as both “Foo” and “Foo_Backup” we can't link them — MTGO doesn't expose ownership and we're not in the business of guessing.

The tech under the hood, briefly

Next.js + Tailwind + shadcn/ui on the front end. Prisma over Postgres for the DB. Playwright (headless Chromium) for scraping JS-rendered MTGO pages, plus plain HTTP + cheerio for parsing the static MTGGoldfish pages. All scraping runs as GitHub Actions cron jobs; the web app itself is stateless and reads from the same DB the crons write into.

If something on the site looks wrong, the most useful details to report are which player, which event, what we showed, and what you expected. The DB is the source of truth; the rest is rendering.

Affiliation

Decklist Data is independent and not affiliated with Wizards of the Coast, Daybreak Games, or MTGGoldfish. All deck names, card names, and event names are trademarks of their respective owners. We use only publicly published data.