EuroJackpot PyLab

Coding the lottery. Keeping it human.

Lab Report — Percentile Ranking Across Columns (Bucket Test Results)

2025-12-30

This report documents a simple question with a very “PyLab” twist:

If we rank EuroJackpot numbers (1–50) using their spacing/delay behavior across st1..st5, then split that ranked list into buckets, do the next draw’s numbers land in some buckets more than we’d expect by pure randomness?

No promises. Just a method, a backtest, and clean interpretation.


1) What calculate_percentiles_multi_columns() does

The function takes a DataFrame and a list of columns that share the same value domain.
Classic examples:

  • Main numbers: ['st1','st2','st3','st4','st5'] (values 1..50)
  • Differences: ['D1','D2','D3','D4']
  • Due/interval features: ['k1','k2','k3','k4','k5'] or ['K1','K2','K3','K4','K5']

The key idea

Instead of treating st1, st2, … as separate columns, we treat them as one merged value series:

  • “Did value v appear in any of these columns on this draw?”
  • If yes, record that draw position as a hit for v.

Then for each value v, we compute its gap sequence:

  • gap from start → first hit
  • gaps between consecutive hits
  • gap from last hit → end of history (the current delay)

From the gap sequence we calculate:

  • frequency proxy (co): how many intervals exist
  • current delay (delay): the last gap
  • percentiles of gaps (median, P75, P90, P95, P99)
  • Pct_score: how “late” the current delay is, compared to that value’s own past gaps

Finally we blend frequency and “lateness” into one sortable score:

  • Norm: normalized frequency share
  • Prod = Norm * Pct_score

Higher Prod → higher rank.
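The logic above can be sketched in a few lines. This is a minimal reconstruction, not the actual PyLab source: the exact gap convention and the Pct_score formula are assumptions, and only a subset of the percentile columns is shown.

```python
import numpy as np
import pandas as pd

def calculate_percentiles_multi_columns(df, columns, value_range=range(1, 51)):
    """Sketch: rank values by blending frequency with current-delay lateness."""
    n = len(df)
    merged = df[columns].to_numpy()  # treat st1..st5 as one merged value series
    rows = []
    for v in value_range:
        # draw indices where v appeared in ANY of the columns
        hits = np.flatnonzero((merged == v).any(axis=1))
        # gap sequence: start -> first hit, hit -> hit, last hit -> end of history
        edges = np.concatenate(([-1], hits, [n - 1]))
        gaps = np.diff(edges)
        co = len(gaps)                      # frequency proxy: number of intervals
        delay = gaps[-1]                    # current delay: the last gap
        # lateness of the current delay vs. this value's own past gaps (assumed form)
        pct_score = (gaps <= delay).mean()
        rows.append({"value": v, "co": co, "delay": delay,
                     "P50": np.percentile(gaps, 50), "P90": np.percentile(gaps, 90),
                     "Pct_score": pct_score})
    out = pd.DataFrame(rows)
    out["Norm"] = out["co"] / out["co"].sum()      # normalized frequency share
    out["Prod"] = out["Norm"] * out["Pct_score"]   # higher Prod -> higher rank
    return out.sort_values("Prod", ascending=False, ignore_index=True)
```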


2) Turning a ranked list into bucket patterns

After ranking, we have an ordered list of 50 numbers (top-ranked first).
We tested three ways to slice it:

A) 3 buckets (thirds)

For N=50 we used:

  • G1: 16 numbers
  • G2: 17 numbers
  • G3: 17 numbers

B) 2 buckets (halves)

  • H1: 25 numbers
  • H2: 25 numbers

C) Middle 68% bucket

  • Keep the middle 34 numbers
  • Drop 8 from the top + 8 from the bottom (the extremes of the ranking); those 16 dropped numbers form their own bucket, edge16, in the results below

For each next draw, we count how many of the 5 winning numbers fall inside each bucket.
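With the bucket sizes above, the slicing can be sketched like this (`make_buckets` and `bucket_hits` are illustrative names, not PyLab functions):

```python
def make_buckets(ranked, n_top_edge=8, n_bottom_edge=8):
    """Slice a top-ranked-first list of 50 numbers into the tested buckets."""
    assert len(ranked) == 50
    return {
        "3bucket_G1": set(ranked[:16]),
        "3bucket_G2": set(ranked[16:33]),
        "3bucket_G3": set(ranked[33:]),
        "2bucket_H1": set(ranked[:25]),
        "2bucket_H2": set(ranked[25:]),
        "mid68": set(ranked[n_top_edge:50 - n_bottom_edge]),     # middle 34
        "edge16": set(ranked[:n_top_edge]) | set(ranked[-n_bottom_edge:]),
    }

def bucket_hits(buckets, drawn):
    """Count how many of the 5 winning numbers fall inside each bucket."""
    drawn = set(drawn)
    return {label: len(members & drawn) for label, members in buckets.items()}
```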


3) Backtest design (walk-forward, no peeking)

For each time step t:

  1. Build the ranking from draws 0..t
  2. Create the buckets from that ranking
  3. Look at draw t+1 and count bucket hits
  4. Move to t+1 and repeat

We started at draw index 50 so the ranking has enough history.

Total steps in this run: 864
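A minimal sketch of that walk-forward loop, with the ranking and bucketing steps passed in as callables (`rank_fn` and `buckets_fn` are hypothetical names for those stages):

```python
def walk_forward(draws, rank_fn, buckets_fn, start=50):
    """Walk-forward backtest: rank on draws 0..t, score bucket hits on draw t+1.

    draws      : list of 5-number sets/lists, oldest first
    rank_fn    : history -> ordered list of 50 numbers (top-ranked first)
    buckets_fn : ranked list -> {label: set of numbers}
    """
    per_step = []
    for t in range(start, len(draws) - 1):
        ranked = rank_fn(draws[: t + 1])     # only past data: no peeking
        buckets = buckets_fn(ranked)
        nxt = set(draws[t + 1])              # the draw we try to predict
        per_step.append({label: len(m & nxt) for label, m in buckets.items()})
    return per_step
```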

4) How to read the summary columns

The summary you see below is the “bucket performance” condensed over all 864 steps.

Here is what each column means:

  • label
    Which bucket we’re scoring (example: 2bucket_H2, mid68, 3bucket_G1).

  • n_steps
    How many prediction steps were tested (here: 864 next-draw evaluations).

  • avg_hits
    Average count of next-draw numbers (out of 5) that landed in that bucket.

  • expected_hits
    What we would expect from a random draw if the bucket has size s out of 50:
    expected_hits = 5 * (s / 50)
    Example: a 25-number half → expected = 5*(25/50)=2.5.

  • ratio_vs_expected
    avg_hits / expected_hits
    Above 1.00 means “a bit more than random baseline”.
    Below 1.00 means “a bit less than baseline”.

  • pct_ge3
    Percentage of steps where the bucket captured 3 or more of the next draw’s 5 numbers.

  • pct_ge4
    Percentage of steps where the bucket captured 4 or more numbers.

  • pct_eq5
    Percentage of steps where the bucket captured all 5 numbers.

  • pct_eq0
    Percentage of steps where the bucket captured zero numbers.


5) Results (summary table)

label         n_steps  avg_hits  expected_hits  ratio_vs_expected   pct_ge3   pct_ge4   pct_eq5   pct_eq0
2bucket_H2      864    2.576389      2.5            1.030556      52.314815 19.212963  3.356481  1.736111
3bucket_G2      864    1.731481      1.7            1.018519      21.064815  4.745370  0.115741  9.837963
3bucket_G3      864    1.710648      1.7            1.006264      20.717593  4.282407  0.231481 10.532407
mid68           864    3.414352      3.4            1.004221      83.333333 47.106481 14.583333  0.000000
edge16          864    1.585648      1.6            0.991030      16.666667  3.587963  0.000000 14.583333
3bucket_G1      864    1.557870      1.6            0.973669      15.277778  3.472222  0.231481 13.888889
2bucket_H1      864    2.423611      2.5            0.969444      47.685185 15.509259  1.736111  3.356481

6) Interpretation (what this is really telling us)

6.1 The “big picture”

Most ratios are very close to 1.00 (baseline).
That’s a polite way of saying: the bucket trick is not creating a huge edge by itself.

That’s normal. The lottery is harsh like that.

Still, there are a few signals worth noticing.


6.2 Two-bucket split: H2 slightly wins over H1

  • 2bucket_H2 ratio: 1.0306
  • 2bucket_H1 ratio: 0.9694

This means that the half labeled H2 captured slightly more winning numbers than baseline, while H1 captured slightly fewer.

In plain terms:

  • if you must choose one half to “favor”, this run suggests H2 is the better half.

Still, the gap is small: about 0.076 extra hits per draw on average (2.576 − 2.500).
Over 864 steps, that can show up as a stable nudge, or it can be a long, mild wave of randomness.

Practical takeaway:
Use the half-split as a light bias, not as a standalone rule.


6.3 Three-bucket split: the middle-ish buckets look slightly better

  • G2 ratio: 1.0185
  • G3 ratio: 1.0063
  • G1 ratio: 0.9737

Same story: the differences are small, but the direction lines up with the two-bucket finding: the middle-ish bands tend to catch a little more than the extreme-ranked band.

Practical takeaway:
If you like thirds, focus on G2 first, then G3.


6.4 Middle 68%: great hit counts, but that’s baked in

mid68 has:

  • avg_hits 3.41 out of 5
  • pct_ge3 83.33%
  • pct_ge4 47.11%
  • pct_eq5 14.58%

That looks impressive until you remember: the bucket has 34 numbers out of 50.

Baseline for mid68 is already:

  • expected_hits = 3.4

So the ratio is 1.004 → basically baseline.

So what is mid68 good for?
It behaves like a stability filter:

  • it rarely misses everything (pct_eq0 = 0% in this run)
  • it’s not selective enough to cut the search space hard (34/50 is still huge)

Practical takeaway:
Use mid68 as a soft guardrail (“avoid extreme-ranked edges”), not as a primary reducer.


6.5 Edge 16: slightly below baseline

edge16 ratio: 0.991

This says: the extreme-ranked numbers (top 8 + bottom 8) were hit a tiny bit less than baseline.

Again: tiny effect.

Practical takeaway:
If you want a simple rule that feels sane:

  • avoid leaning too hard into the extreme ends of the ranked list,
  • unless another feature strongly supports those values.

7) The “combinatorial” view (this is the part that sounds crazy)

One thing here is really important: pct_eq5 is not just a cute metric. It has a direct reduced-combination meaning.

If our bucket contains s numbers out of 50, then the number of 5-number combinations inside that bucket is:

C(s,5) combinations

and the total universe of possible 5-number combos is:

C(50,5) = 2,118,760

So the “space coverage” fraction is:

C(s,5) / C(50,5)
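With Python's `math.comb`, the coverage fractions used below are one-liners:

```python
from math import comb

def coverage(bucket_size, total=50, k=5):
    """Fraction of all C(total, k) combos that lie entirely inside the bucket."""
    return comb(bucket_size, k) / comb(total, k)

print(comb(50, 5))            # 2118760
print(f"{coverage(34):.5f}")  # mid68 (34 numbers) -> 0.13133
print(f"{coverage(25):.5f}")  # H2 (25 numbers)    -> 0.02508
```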

7.1 Mid68 (34 numbers)

Bucket size: 34

Combinations inside: 278,256

Total combinations: 2,118,760

Coverage fraction: 278256 / 2118760 = 13.133%

In our backtest:

pct_eq5(mid68) = 14.583%, i.e. 126 hits out of 864 steps.

So yes, it’s fair to say:

A set that covers 13.13% of the combination universe ended up containing the true 5-number draw 14.58% of the time.

That’s a relative lift of about:

14.583 / 13.133 ≈ 1.11× (around +11%)

Sounds juicy… and it is worth mentioning.

The catch: with 864 trials, that lift is not yet “slam dunk” statistically. A quick binomial sanity check puts it around ~1.3 standard deviations above baseline. In human terms: interesting, not proven.

7.2 Two-bucket H2 (25 numbers)

Bucket size: 25

Combinations inside: 53,130

Total combinations: 2,118,760

Coverage fraction: 53130 / 2118760 = 2.508%

In our backtest:

pct_eq5(H2) = 3.356%, i.e. 29 hits out of 864.

Relative lift:

3.356 / 2.508 ≈ 1.34× (around +34%)

This is the more exciting story, because it’s a much tighter slice of the universe (2.5% of combos) that still captured 3.36% of true draws.

Same catch as above: the sample count is small (29 events), so the uncertainty is big. A quick sanity check puts it around ~1.6 standard deviations above baseline. That’s not “case closed”, but it’s not nothing either.
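Both sanity checks from 7.1 and 7.2 are easy to reproduce with a normal approximation to the binomial (a sketch; an exact binomial tail would give slightly different numbers):

```python
from math import comb, sqrt

def z_score(hits, n_steps, bucket_size, total=50, k=5):
    """Normal-approximation z for 'all 5 inside the bucket' vs. pure chance."""
    p = comb(bucket_size, k) / comb(total, k)  # baseline pct_eq5 probability
    mean = n_steps * p
    sd = sqrt(n_steps * p * (1 - p))
    return (hits - mean) / sd

print(f"mid68: {z_score(126, 864, 34):.2f} sd")  # ~1.3 sd above baseline
print(f"H2:    {z_score(29, 864, 25):.2f} sd")   # ~1.6 sd above baseline
```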

8) What I would do with these results (in the PyLab style)

If you want one “best” option from this test:

1) Prefer the half bucket that wins (H2) as a mild bias layer
2) Treat mid68 as a gentle “don’t go crazy” filter
3) Combine the bucket rule with your real reducers:
   - gap filters / valid-gap zones
   - feature bands (sum/range/overlaps)
   - grid-pattern flags
   - covering sets + random mappings

A bucket rule alone won’t cut the space enough to matter.
A bucket rule stacked with 3–6 other independent-ish constraints can become useful.


9) Next step (to confirm it’s not just noise)

Two quick checks that usually reveal the truth fast:

  • Rolling windows: run the same summary on last 200, 400, 600 steps
    Does H2 stay on top, or does it flip back and forth?

  • Swap ordering: define buckets from the bottom-up ranking too
    If the “better half” always ends up being “the second half”, you may be seeing a stable bias in how the ranking orders values.
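The rolling-window check is a one-liner over the per-step hit counts (a sketch; `per_step` stands for the list of per-step hit dicts produced by the backtest loop):

```python
def rolling_summary(per_step, label, windows=(200, 400, 600)):
    """Average hits for one bucket over the last W backtest steps."""
    return {w: sum(step[label] for step in per_step[-w:]) / min(w, len(per_step))
            for w in windows}
```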

Paste a rolling-window summary next time and we’ll see if this nudge stays alive.


Responsible play note

This is structure, not a guarantee.
Keep it small, keep it fun, and don’t let a good run mess with your limits.