Heavy-lift x-pattern filter (ranked search)
Why this post exists
If you’ve ever stared at a lottery draw and thought “there must be some structure hiding in here…”, you’re not alone. The annoying part is that most “systems” jump straight to predicting numbers, and that’s where things get messy fast.
This post takes a different angle:
Don’t start with “the next numbers”.
Start with filters that carve the full space into a smaller region that still looks like it contains real historical draws.
That’s the whole mood.
Instead of betting on a single fragile idea, we try to find reasonable constraints — a kind of “shape” of the next draw — then we let that shape guide the rest of the workflow (ranking numbers, reducing combinations, building packs, etc.).
The cast of characters: tot_df and hist_df
We use two datasets side by side:
- `tot_df`: the big space of valid combinations (or a large engineered subset of it). This is where "how much space did we cut?" is measured.
- `hist_df`: the actual historical draws, engineered with the same feature columns. This is where "did we kill the history?" is measured.
The trick is: every time we filter one, we filter the other in lockstep with the same rule. Otherwise you end up comparing apples to spaceships.
What are “x-columns” anyway?
Your `x1` … `x20` columns are feature-like signals derived from the draw or produced during feature engineering. The details depend on how you built them, but from the search perspective we treat them as columns that typically take values like 0/1 (sometimes more, but this post focuses on binary patterns).
An example “pattern filter” looks like:
- columns: `x7, x9, x12, x17, x19`
- pattern: `[1, 1, 1, 0, 0]`
Meaning we keep only rows where:
```text
x7 == 1 AND x9 == 1 AND x12 == 1 AND x17 == 0 AND x19 == 0
```
We apply this rule to both datasets:
- filter the space (tot_df)
- filter the history (hist_df)
Now we can measure whether this pattern is doing something interesting.
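Here's a minimal sketch of that lockstep step, assuming `tot_df` and `hist_df` are pandas DataFrames sharing the engineered x-columns (the helper name `apply_xpattern` is mine, not from the script):

```python
import pandas as pd

def apply_xpattern(df: pd.DataFrame, columns, pattern) -> pd.DataFrame:
    """Keep only rows where every listed x-column equals its pattern value."""
    mask = pd.Series(True, index=df.index)
    for col, val in zip(columns, pattern):
        mask &= df[col] == val
    return df[mask]

columns = ["x7", "x9", "x12", "x17", "x19"]
pattern = [1, 1, 1, 0, 0]

# Lockstep: the exact same rule hits both datasets.
tot_f  = apply_xpattern(tot_df,  columns, pattern)
hist_f = apply_xpattern(hist_df, columns, pattern)
```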
Heavy lift (but measured the right way)
If I only tell you “this filter shrinks tot_df by 98%”, that sounds impressive… but it’s not enough.
Because a filter can shrink the space by 98% and also shrink the history by 99.9%. That’s not a “smart” filter — it’s just a meat grinder.
So we track two ratios:
```text
df_ratio   = len(tot_f)  / len(tot)
df_n_ratio = len(hist_f) / len(hist)
```
And then we compare them:
```text
ratio_of_ratios = df_n_ratio / df_ratio
```
Interpretation (roughly):
- `ratio_of_ratios > 1`: history survives better than the space does
  → good sign: the filter isn't just deleting everything randomly
- `ratio_of_ratios < 1`: history collapses faster than the space
  → warning: it might be too harsh or too "unrealistic"
This isn’t proof of predictability. It’s a sanity check that the filter is not pure fantasy.
“Due-ness”: the delay-percent idea (the polite version)
Next we ask a second question:
After applying the filter, does the filtered subset look “late” compared to its own past rhythm?
That’s where the delay-percent diagnostics come in.
The filtered subset produces a sequence of “hit” timestamps inside the historical timeline. From that we get a distribution of gaps (intervals), and then we ask:
- how many draws since the last “hit” of this filtered subset?
- where does that gap fall compared to the subset’s own gap history? (P90, P95, P99, etc.)
The output is a percentile-like score (pct_score) that says:
- low score: “this happened recently, not really due”
- high score: “this hasn’t happened in a while relative to its own history”
Again: not magic. But it’s a useful ranking signal.
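A hedged sketch of how such a `pct_score` can be computed, assuming `hist_f` keeps the original draw-position index; the script's exact percentile convention may differ:

```python
import numpy as np

def due_score(hit_positions: np.ndarray, n_draws: int) -> float:
    """Where does the current gap sit inside the subset's own gap history?

    hit_positions: sorted draw indices (0..n_draws-1) where the subset hit.
    Returns a 0..100 score; ~100 means "later than almost every past gap".
    """
    gaps = np.diff(hit_positions)          # historical gaps between hits
    if gaps.size == 0:
        return 0.0                         # one hit: no gap history to compare
    draws_since_last = (n_draws - 1) - hit_positions[-1]
    # share of historical gaps already exceeded by the current wait
    return 100.0 * float(np.mean(gaps <= draws_since_last))

hit_positions = np.sort(hist_f.index.to_numpy())   # assumes index == draw position
pct_score = due_score(hit_positions, len(hist_df))
```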
Why we need a penalty (tiny subsets are liars)
Here’s the classic trap:
You find a pattern that occurred 8 times in 10+ years.
It might look insanely “due”. It might even have a great ratio_of_ratios.
But it’s also extremely easy for tiny samples to look good by luck.
So we add a support penalty — a factor that pushes the score toward zero when the filtered history subset has too few hits (hit_count is small).
Conceptually:
- small hit_count → the pattern might be “cute” but fragile
- big hit_count → the pattern has enough evidence to be taken seriously
In other words: we don’t let a unicorn run the whole lab.
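The post doesn't pin down the exact shape of `support_factor`, so here is one plausible saturating form (the scale `k = 30` is a hypothetical choice, not from the script):

```python
import math

def support_factor(hit_count: int, k: float = 30.0) -> float:
    """~0 for a handful of hits, approaching 1 as evidence accumulates.

    k is a hypothetical scale: at hit_count == k the factor is ~0.63.
    """
    return 1.0 - math.exp(-hit_count / k)

print(support_factor(8))    # ~0.23 -- the unicorn barely registers
print(support_factor(73))   # ~0.91 -- enough evidence to take seriously
```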
The ranking score (simple on purpose)
We combine these ideas into one score:
- lift: `ratio_of_ratios`
- due: `pct_score / 100`
- support penalty: `support_factor(hit_count)`
So the ranking score is:
```text
score = ratio_of_ratios * (pct_score / 100) * support_factor(hit_count)
```
It’s not the only score you could use. It’s just a practical one that behaves in the direction we want:
- reward strong “space cut”
- reward patterns that preserve history relative to the cut
- reward “late” subsets
- punish tiny samples
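Putting the three pieces together (reusing the hypothetical `support_factor` shape from above, so the printed number is only illustrative):

```python
import math

def pattern_score(ratio_of_ratios: float, pct_score: float, hit_count: int,
                  k: float = 30.0) -> float:
    """score = lift * due * support, as in the formula above."""
    support = 1.0 - math.exp(-hit_count / k)   # hypothetical saturating penalty
    return ratio_of_ratios * (pct_score / 100.0) * support

# Top candidate from the tables below: lift 4.7274, pct_score 98.65, 73 hits
print(pattern_score(4.7274, 98.6486, 73))      # ~4.25 under these assumptions
```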
What to do with the winner
Once you have a top pattern (columns + pattern values), you can:
- Filter `tot_df` into a reduced pool region
- Filter `hist_df` into the comparable historical subset
- Use the filtered history to:
  - rank numbers (percentiles across `st1..st5`)
  - test additional features
  - build reduced packs for play
  - compare performance vs unfiltered baselines
It’s basically a “zoom lens”: you don’t claim to see the future — you claim you’re focusing on a region that seems to behave like real draws.
Download the code
- ✅ Code: search_xpattern_lift_due.py
It generates a ranked candidates table and a markdown snippet (tables + diagnostics). Fair warning: as it stands, the script takes a long time to run.
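Roughly why it's slow: the candidate space is column subsets times binary patterns, which explodes quickly. A minimal sketch of that outer loop (the real script's internals may differ):

```python
from itertools import combinations, product

x_cols = [f"x{i}" for i in range(1, 21)]        # x1..x20

n_candidates = 0
for k in (5, 6):                                # the subset sizes seen in the tables
    for cols in combinations(x_cols, k):
        for pat in product([0, 1], repeat=k):   # 2**k patterns per subset
            n_candidates += 1                   # ...filter, score, rank here
print(n_candidates)   # C(20,5)*32 + C(20,6)*64 = 2,976,768 candidates
```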
Final reality check (because we’re not selling fairy tales)
This method is about structured reduction and ranking. It tries to be honest with two hard truths:
- the lottery is designed to be random
- humans (and models) love to hallucinate patterns
So the goal isn’t certainty. The goal is a workflow that gives you:
- fewer random guesses
- more measurable decisions
- and a clean way to test whether a filter is “smart” or just “violent”
If nothing else, it’s a much better conversation with your data than picking birthdays and hoping the universe vibes with you.
Heavy-lift x-pattern filter for the next draw (ranked search)
Candidate
- Columns: `['x5', 'x6', 'x9', 'x11', 'x19']`
- Pattern: `[0, 0, 0, 1, 1]`
Lockstep reduction
- Space rows: 5850 / 1221759 (df_ratio=0.004788)
- Hist rows: 73 / 3225 (df_n_ratio=0.022636)
- ratio_of_ratios: 4.7274
Due summary on the filtered history (gap statistics measured in draws)
| label | hit_count | draws_since_last | median | P75 | P90 | P95 | P99 | max | pct_score |
|---|---|---|---|---|---|---|---|---|---|
| best_filter | 73 | 136 | 28 | 66 | 101 | 117 | 151 | 192 | 98.6486 |
Heavy-lift x-pattern filters (ranked search)
Best x-pattern candidates (ranked)
| k | cols | pattern | ratio_of_ratios | df_ratio | df_n_ratio | space_rows | hist_rows | pct_score | draws_since_last | hit_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [x5, x6, x9, x11, x19] | [0, 0, 0, 1, 1] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 98.6 | 136 | 73 |
| 5 | [x5, x13, x18, x19, x20] | [1, 0, 0, 1, 0] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 89.2 | 96 | 73 |
| 5 | [x2, x3, x13, x14, x17] | [0, 0, 1, 1, 0] | 5.11596 | 0.0048 | 0.0245 | 5850 | 79 | 78.1 | 62 | 79 |
| 6 | [x3, x9, x12, x15, x16, x18] | [0, 0, 0, 0, 0, 0] | 5.87729 | 0.0051 | 0.0298 | 6188 | 96 | 64.9 | 36 | 96 |
| 6 | [x9, x12, x15, x16, x18, x19] | [0, 0, 0, 0, 0, 0] | 5.69362 | 0.0051 | 0.0288 | 6188 | 93 | 67.0 | 36 | 93 |
| 6 | [x3, x8, x9, x15, x16, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 100.0 | 398 | 66 |
| 5 | [x3, x5, x11, x19, x20] | [1, 0, 1, 1, 0] | 4.49963 | 0.0043 | 0.0192 | 5220 | 62 | 93.7 | 154 | 62 |
| 5 | [x2, x10, x13, x15, x19] | [0, 1, 1, 0, 0] | 4.14457 | 0.0048 | 0.0198 | 5850 | 64 | 100.0 | 254 | 64 |
| 6 | [x5, x6, x9, x10, x16, x18] | [0, 0, 0, 0, 0, 0] | 4.59163 | 0.0051 | 0.0233 | 6188 | 75 | 85.5 | 76 | 75 |
| 5 | [x1, x5, x10, x11, x17] | [0, 1, 1, 0, 0] | 4.66265 | 0.0048 | 0.0223 | 5850 | 72 | 84.9 | 89 | 72 |
| 5 | [x1, x6, x10, x16, x20] | [1, 0, 0, 0, 0] | 3.74348 | 0.0087 | 0.0326 | 10626 | 105 | 95.3 | 83 | 105 |
| 5 | [x3, x5, x10, x11, x19] | [0, 1, 1, 0, 0] | 3.88793 | 0.0060 | 0.0233 | 7308 | 75 | 98.7 | 170 | 75 |
| 6 | [x1, x2, x5, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 3.97941 | 0.0070 | 0.0279 | 8568 | 90 | 91.2 | 108 | 90 |
| 5 | [x1, x3, x9, x13, x19] | [0, 1, 0, 1, 0] | 3.98962 | 0.0054 | 0.0214 | 6552 | 69 | 97.1 | 173 | 69 |
| 5 | [x1, x2, x7, x10, x13] | [1, 0, 0, 1, 0] | 3.93179 | 0.0054 | 0.0211 | 6552 | 68 | 98.6 | 154 | 68 |
| 6 | [x1, x2, x4, x8, x17, x18] | [0, 0, 0, 0, 0, 0] | 3.80255 | 0.0070 | 0.0267 | 8568 | 86 | 95.4 | 108 | 86 |
| 6 | [x2, x5, x6, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 4.22430 | 0.0051 | 0.0214 | 6188 | 69 | 90.7 | 108 | 69 |
| 5 | [x2, x9, x12, x19, x20] | [1, 0, 0, 0, 1] | 3.62650 | 0.0072 | 0.0260 | 8775 | 84 | 100.0 | 166 | 84 |
| 5 | [x4, x5, x6, x9, x19] | [1, 0, 0, 0, 1] | 4.46837 | 0.0048 | 0.0214 | 5850 | 69 | 85.7 | 72 | 69 |
| 6 | [x3, x4, x10, x12, x14, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 92.5 | 139 | 66 |
Notes
- This ranks pattern-filters, not outcomes. It is not a promise of results.
- `ratio_of_ratios` > 1 means the filter keeps history alive better than random shrinking.
- `pct_score` close to 100 means the subset looks late relative to its own gaps.
- `support_factor` penalizes tiny subsets so we don't get hypnotized by 7 or 9 lucky historical hits.