EuroJackpot PyLab

Coding the lottery. Keeping it human.

Random Mapping of a Min Covering

2025-12-26

Random Number Mapping: a quick stress-test for “too-good” backtests

Responsible play note: this is hobby stats and code. Lotteries are still luck-first.
If it stops being fun, it’s time to step away.

What problem are we poking?

You’ve got a covering set (a fixed pack of lines) and you backtest it on the historical draws.

Sometimes the backtest looks… spicy. Like: “Wait, why did this set hit 5-of-5 on that many draws?”

Before we start daydreaming, we should ask a rude but healthy question:

Is this backtest strong because the cover is genuinely good, or because it accidentally fits the quirks of the past?

This post shows a simple stress-test: randomly relabel the numbers (many times) and see how the same cover behaves.

The idea in one sentence

Applying a permutation of 1..50 to every line creates a “new” cover that is structurally identical (relabeling is a bijection, so the covering guarantee survives), just with the labels shuffled.

If we try a lot of these shuffles (sketched in code below), we get a feel for:

  • what “normal” looks like,
  • how rare a given backtest score is,
  • and how easy it is to cherry-pick a great-looking result when you run many trials.
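To make the relabeling concrete, here is a minimal NumPy sketch (illustrative only; the array names are made up, not the script's):

    import numpy as np

    rng = np.random.default_rng(123)

    # cover: (M, 5) array of 0-based numbers in 0..49 (lottery numbers 1..50, minus 1)
    cover = np.array([[0, 1, 2, 3, 4],
                      [0, 1, 2, 3, 5]])  # tiny stand-in for the real 33,572-line cover

    perm = rng.permutation(50)  # a random bijection of 0..49
    remapped = perm[cover]      # relabel every number in every line at once

    # still 5 distinct numbers per line, same overlap structure as before
    assert all(len(set(line)) == 5 for line in remapped)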

What the script does

Inputs (loading sketched below):

  • hist_df.csv with st1..st5 (past draws)
  • covering_50_5_4_33572.csv (your cover lines)
  • optional: tot_df_dynamic_basic.parquet (to export the best found cover with extra features)
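A plausible way to load the first two inputs with pandas — the st1..st5 column names come from the post, the cover file layout is my assumption:

    import pandas as pd

    hist_df = pd.read_csv("hist_df.csv")
    draws = hist_df[["st1", "st2", "st3", "st4", "st5"]].to_numpy() - 1  # 0-based

    cover_df = pd.read_csv("covering_50_5_4_33572.csv")
    # assuming the cover file stores five number columns first; adjust to the real header
    cover = cover_df.iloc[:, :5].to_numpy() - 1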

For each simulation (see the sketch after this list):

  1. Draw a random permutation perm of 0..49 (numbers 1..50).
  2. Remap the cover lines through perm.
  3. Compare the remapped cover against the real history and count:
     • did we get at least one 5-hit line in each draw?
     • did we get at least one 4-hit line in each draw?
     • how many 4-hit / 5-hit lines on average?
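One simulation step could look like this sketch (score_cover is a hypothetical helper, not the script's API; draws and cover are the 0-based arrays from the loading sketch above):

    import numpy as np

    def score_cover(cover, draws):
        """Return (P(>=1 five-hit), P(>=1 four-hit), mean #4-hit lines) over draws."""
        any5, any4, n4 = [], [], []
        for draw in draws:
            hits = np.isin(cover, draw).sum(axis=1)  # matched numbers per line
            any5.append((hits == 5).any())
            any4.append((hits == 4).any())
            n4.append((hits == 4).sum())
        return np.mean(any5), np.mean(any4), np.mean(n4)

    rng = np.random.default_rng(123)
    baseline = score_cover(cover, draws)  # sim_id = -1: the original labels
    results = [score_cover(rng.permutation(50)[cover], draws)
               for _ in range(1000)]      # sim_id = 0..999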

We also run the original cover first (sim_id = -1), as the reference.

Baselines (random tickets; computed in the snippet below):

  • 5-hit chance per draw (with M lines): p5_norm = M / C(50,5)
  • 4-hit chance per draw uses the exact hypergeometric form:
    1 - C(N - K4, M) / C(N, M)
    where N = C(50,5) and K4 = C(5,4) * C(45,1) = 225 is the number of lines sharing exactly four numbers with a given draw in 5/50.
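With math.comb both baselines take a few lines; the hypergeometric ratio is best computed as a running product rather than via the astronomically large binomials. This reproduces the Setup numbers below:

    from math import comb

    N = comb(50, 5)                 # 2,118,760 possible lines
    M = 33572                       # lines in the cover
    K4 = comb(5, 4) * comb(45, 1)   # 225 lines share exactly 4 numbers with a draw

    p5 = M / N  # P(at least one 5-hit line in a draw)

    # P(at least one 4-hit line) = 1 - C(N-K4, M) / C(N, M),
    # expanded into a numerically friendly product of K4 factors
    ratio = 1.0
    for i in range(K4):
        ratio *= (N - M - i) / (N - i)
    p4 = 1 - ratio

    print(f"{p5:.6%}")  # 1.584512%
    print(f"{p4:.6%}")  # 97.250882%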

Outputs you get

  • covering_random_mapping_100_sims.csv
    One row per sim with p(>=1 five-hit), p(>=1 four-hit), and lift values.

  • best_number_mapping_by_5hit_lift.csv
    The permutation that produced the best 5-hit lift in this run.

  • best_cover_by_5hit_lift.csv
    The remapped cover lines for that best permutation (plus tot_df features if found).

Results (auto-filled by the script)

Setup

  • Sims: 1000 (seed=123)
  • History draws: 912
  • Cover size M: 33572
  • Total combinations C(50,5): 2,118,760
  • Random baseline p(>=1 five-hit): 1.584512%
  • Random baseline p(>=1 four-hit): 97.250882%

Original cover (real labels)

  • p(>=1 five-hit): 1.316%
    lift vs random: 0.830
  • p(>=1 four-hit): 98.684%
    lift vs random: 1.015
  • mean #4-hit lines per draw: 3.624
    lift vs random mean: 1.016

Best remapped cover (best sim in this run)

  • best sim_id: 20
  • p(>=1 five-hit): 3.180%
    lift vs random: 2.007

Where does the original sit among random remaps?

  • The original 5-hit lift sits at the 30.3rd percentile of the 1000 remaps
    (roughly 70% of random relabelings backtested better).

(If that percentile is high, it means the original labels look unusually good compared to typical shuffles. If it’s mid-pack, the “magic” was probably just normal variance.)
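To recompute the percentile yourself from the output CSV (sim_id is the script's convention; the lift5 column name is my assumption):

    import pandas as pd

    df = pd.read_csv("covering_random_mapping_100_sims.csv")
    original = df.loc[df["sim_id"] == -1, "lift5"].iloc[0]
    remaps = df.loc[df["sim_id"] >= 0, "lift5"]

    percentile = (remaps <= original).mean() * 100
    print(f"original percentile among remaps: {percentile:.1f}")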

Lift distribution (remaps only)

  • lift5 50% (median): 0.969
  • lift5 90%: 1.315
  • lift5 95%: 1.453
  • lift5 99%: 1.730
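These quantiles come straight from the remap lifts in the same CSV (column names assumed, as above):

    import numpy as np
    import pandas as pd

    df = pd.read_csv("covering_random_mapping_100_sims.csv")
    lift5 = df.loc[df["sim_id"] >= 0, "lift5"]  # remaps only

    for q in (0.50, 0.90, 0.95, 0.99):
        print(f"lift5 {q:.0%}: {np.quantile(lift5, q):.3f}")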

A reality check (the part nobody wants to read)

If you run lots of permutations and keep the best one, you’re doing a search.
A search always finds something that looks “special”.

That’s not bad — it’s just how randomness behaves when you keep asking it questions.

So treat “best mapping found” as:

  • a fun diagnostic,
  • a warning about cherry-picking,
  • and a reason to do honest backtests (time-splits, forward tests, fresh draws).
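A quick self-contained illustration of the effect: give 1000 “strategies” nothing but pure luck, and the best of them still looks special.

    import numpy as np

    rng = np.random.default_rng(0)

    # 1000 pure-luck trials: 912 draws, ~1.58% five-hit chance per draw
    hits = rng.binomial(n=912, p=0.0158, size=1000)
    lift = hits / (912 * 0.0158)

    print(f"median lift: {np.median(lift):.2f}")  # ~1.0, as it should be
    print(f"best lift:   {lift.max():.2f}")       # roughly 1.8-2.0, from searching alone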

➡️ Download the script: mincovering_random_mapping_sim.py

If you want, the next steps are:

  • a time-split version (train on older draws, score on newer draws), sketched briefly below,
  • and tracking whether the “best mapping” keeps its shine out-of-sample.

Just ask for it.
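As a teaser, the time-split version needs only a few extra lines on top of the earlier sketches (same assumed names: cover, draws, score_cover):

    from math import comb
    import numpy as np

    rng = np.random.default_rng(123)
    p5_random = len(cover) / comb(50, 5)  # per-draw 5-hit baseline

    split = int(len(draws) * 0.7)
    train, test = draws[:split], draws[split:]

    best_perm, best_p5 = None, -1.0
    for _ in range(1000):
        perm = rng.permutation(50)
        p5, _, _ = score_cover(perm[cover], train)  # search on older draws only
        if p5 > best_p5:
            best_perm, best_p5 = perm, p5

    p5_test, _, _ = score_cover(best_perm[cover], test)  # honest out-of-sample score
    print(f"train lift: {best_p5 / p5_random:.2f}")
    print(f"test  lift: {p5_test / p5_random:.2f}")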