Pipeline output dashboard
Synthetic billing data · CRM market · 2,000 accounts · 15 products · last 24 months (L24M)
| Metric | Value | Detail |
|---|---|---|
| Accounts | 2,000 | synthetic billing records |
| Products | 15 | CRM data products |
| Rules | 1,108 | confidence ≥ 0.10 |
| Archetypes | 12 | MinHash + KMeans |
| GBB trees | 12 | all archetypes covered |
| Total revenue | $404M | L24M synthetic billing |
| Avg bundle size | 4.1 | products per account |
| Top lift | 8.65× | Collections cluster |
Pipeline architecture
| Step | Stage | Description | Output files |
|---|---|---|---|
| 1 | Billing data restructuring | Wide CSV → long-form product-revenue rows. | part1_billing_wide.csv, part1_billingdata_long.csv |
| 2 | Bundle listing | All products per account concatenated into a sorted unique bundle string. | part2_bundle_listing.csv |
| 3 | Apriori association mining | Frequent itemsets → rules → heroicness & add-on strength scoring. | part3_rules.csv, part3_heroicness.xlsx |
| 4 | Bundle archetype clustering | MinHash Jaccard signatures → KMeans clusters → hotness scoring. | part4_product_bundle_cluster.csv, part4_archetype_rule_output.csv |
| 5 | Good / Better / Best trees | Hero ID → add-on pool ranking → tier assignment → sales story output. | part5_gbb_recommendations.xlsx |
Output files · 14 total
| File | Size | Rows |
|---|---|---|
| part1_billing_wide.csv | 306 KB | 2,000 |
| part1_billingdata_long.csv | 907 KB | 8,322 |
| part2_bundle_listing.csv | 253 KB | 2,000 |
| part3_transactional.csv | 155 KB | 2,000 |
| part3_rules.csv | 282 KB | 1,108 |
| part3_heroicness.xlsx | 6 KB | 15 |
| part4_product_bundle_cluster.csv | 522 KB | 2,000 |
| part4_archetype_rule_output.csv | 303 KB | 630 |
| part5_gbb_recommendations.csv | 18 KB | 36 |
| part5_gbb_recommendations.xlsx | 14 KB | 4 sheets |
Part 1–2 · Billing data
Wide billing CSV restructured to long-form rows. Account-level product bundles concatenated.
billingdata_long.csv — sample rows
| Account ID | Sector | Product | Revenue |
|---|---|---|---|
| …000000 | Banking | Bureau Data Feed | $27,843 |
| …000000 | Banking | Credit Score Report | $44,217 |
| …000000 | Banking | Decision Engine | $91,200 |
| …000001 | Fintech | Fraud Detection | $59,831 |
| …000001 | Fintech | Real-Time API | $73,005 |
| …000001 | Fintech | Identity Verification | $31,400 |
| …000002 | Insurance | KYC Compliance Suite | $68,920 |
| …000002 | Insurance | AML Screening | $51,700 |
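The wide-to-long restructuring can be sketched as follows. This is a minimal illustration using only the standard library, not the pipeline's actual code: the wide column layout (one revenue column per product), the column names `account_id` and `sector`, and the placeholder account IDs `A0` and `A1` are all assumptions. In a pandas-based pipeline, `DataFrame.melt` would do the same reshaping.

```python
import csv
import io

# Hypothetical wide layout: one row per account, one revenue column per product.
# Account IDs are placeholders; revenues reuse values from the sample rows above.
WIDE_CSV = """account_id,sector,Bureau Data Feed,Credit Score Report,Decision Engine,Fraud Detection
A0,Banking,27843,44217,91200,
A1,Fintech,,,,59831
"""

ID_COLUMNS = ("account_id", "sector")


def wide_to_long(wide_text):
    """Return one (account, sector, product, revenue) row per non-empty product cell."""
    long_rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        for column, value in record.items():
            # Skip identifier columns and products the account does not hold.
            if column in ID_COLUMNS or not value:
                continue
            long_rows.append({
                "account_id": record["account_id"],
                "sector": record["sector"],
                "product": column,
                "revenue": float(value),
            })
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows:
    print(row)
```

Empty cells are dropped rather than written as zero-revenue rows, which is one plausible reason the long file (8,322 rows) is much smaller than 2,000 accounts × 15 products.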
Product penetration
bundle_listing.csv — account bundles
| Account ID | Sector | Product Bundle (Account level) | Count | Total Rev |
|---|---|---|---|---|
| …000000 | Banking | Bureau Data Feed, Credit Score Report, Decision Engine | 3 | $163,260 |
| …000001 | Fintech | Fraud Detection, Identity Verification, Real-Time API | 3 | $164,236 |
| …000002 | Insurance | AML Screening, KYC Compliance Suite | 2 | $120,620 |
| …000003 | Retail Credit | Collections Analytics, Portfolio Monitor | 2 | $75,900 |
| …000010 | Auto Finance | Affordability Check, Bureau Data Feed, Decision Engine, SME Credit Model | 4 | $212,300 |
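A rough sketch of how the bundle string, product count, and total revenue could be derived from long-form rows. The tuple layout and the placeholder account IDs are assumptions made for illustration; the revenue figures come from the sample rows above.

```python
from collections import defaultdict

# (account_id, product, revenue) tuples; account IDs are placeholders.
long_rows = [
    ("A0", "Decision Engine", 91200),
    ("A0", "Bureau Data Feed", 27843),
    ("A0", "Credit Score Report", 44217),
    ("A1", "KYC Compliance Suite", 68920),
    ("A1", "AML Screening", 51700),
]


def build_bundles(rows):
    """Group rows by account and build a sorted, de-duplicated bundle string."""
    products = defaultdict(set)
    revenue = defaultdict(int)
    for account, product, rev in rows:
        products[account].add(product)  # the set removes duplicate products
        revenue[account] += rev
    return {
        account: {
            "bundle": ", ".join(sorted(items)),  # sorted so equal bundles compare equal
            "count": len(items),
            "total_rev": revenue[account],
        }
        for account, items in products.items()
    }


bundles = build_bundles(long_rows)
print(bundles["A0"])
```

Sorting before joining matters because two accounts holding the same products then produce the same string, regardless of row order in the input.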
Part 3 · Apriori rules
1,108 rules extracted · min support 0.05 · min confidence 0.10
Top 10 rules by lift · 1,108 total
| Antecedent | → Consequent | Conf | Lift | Zhang | Count |
|---|---|---|---|---|---|
| Collections Analytics, Portfolio Monitor | Batch Processing | 0.952 | 8.65× | 0.975 | 177 |
| Bureau Data Feed, Collections Analytics, Portfolio Monitor | Batch Processing | 0.949 | 8.63× | 0.975 | 169 |
| Batch Processing, Collections Analytics | Portfolio Monitor | 0.947 | 8.34× | 0.973 | 177 |
| Bureau Data Feed, Portfolio Monitor | Batch Processing | 0.909 | 8.26× | 0.966 | 189 |
| Batch Processing, Portfolio Monitor | Collections Analytics | 0.899 | 8.24× | 0.965 | 177 |
| Batch Processing, Bureau Data Feed | Portfolio Monitor | 0.913 | 8.04× | 0.962 | 189 |
| Bureau Data Feed, Collections Analytics | Batch Processing | 0.878 | 7.98× | 0.960 | 179 |
| Batch Processing, Bureau Data Feed | Collections Analytics | 0.865 | 7.93× | 0.959 | 179 |
| Fraud Detection | Behavioural Analytics | 0.872 | 5.28× | 0.961 | 454 |
| Real-Time API | Fraud Detection | 0.934 | 3.59× | 0.972 | 487 |
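The Conf, Lift, and Zhang columns can be recomputed from raw counts. The sketch below uses the common definition of Zhang's metric; the dashboard does not say which implementation it uses, so that is an assumption. The antecedent count (186) and consequent count (220) are not reported anywhere in the dashboard: they are hypothetical values back-calculated from the rounded figures of the top rule (joint count 177 out of 2,000 accounts).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Compute support, confidence, lift and Zhang's metric for a rule A -> C."""
    p_a = n_antecedent / n_total   # P(A)
    p_c = n_consequent / n_total   # P(C)
    p_ac = n_both / n_total        # P(A and C), the rule support
    confidence = p_ac / p_a        # P(C | A)
    lift = confidence / p_c        # how much A raises the chance of C
    # Zhang's metric: association strength scaled to the range [-1, 1].
    denominator = max(p_ac * (1 - p_a), p_a * (p_c - p_ac))
    zhang = (p_ac - p_a * p_c) / denominator if denominator else 0.0
    return {"support": p_ac, "confidence": confidence, "lift": lift, "zhang": zhang}


# Top rule: Collections Analytics, Portfolio Monitor -> Batch Processing.
# 186 and 220 are assumed counts, chosen to agree with the rounded table values.
metrics = rule_metrics(n_total=2000, n_antecedent=186, n_consequent=220, n_both=177)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumed counts the result rounds to confidence 0.952, lift 8.65, and Zhang 0.975, matching the first row of the table. The support of 0.0885 also clears the 0.05 minimum.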
Heroicness leaderboard — part3_heroicness.xlsx
| Product | Heroicness | Add-on str | Net eff | Avg lift |
|---|---|---|---|---|
Average revenue per account by product
Part 4 · Bundle archetypes
MinHash Jaccard → MiniBatchKMeans → 12 clusters → rule matching → hotness scores
Cluster size distribution
Clustering parameters
| Parameter | Value |
|---|---|
| Algorithm | MinHash + MiniBatchKMeans |
| MinHash permutations | 128 |
| KMeans clusters | 12 |
| Min item support | 30 accounts |
| Total rule matches | 630 rows |
| Hotness weights | 5 × 0.20 equal |
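To illustrate the MinHash step, here is a small standard-library sketch that builds 128-value signatures for product bundles and estimates their Jaccard similarity. The hash family (seeded BLAKE2b) is an assumption chosen for reproducibility; the pipeline's actual MinHash implementation is not shown in the dashboard. The example bundles are taken from the bundle listing above.

```python
import hashlib

NUM_PERM = 128  # matches the "MinHash permutations" parameter above


def _seeded_hash(seed, item):
    """64-bit hash of an item under a given seed (stands in for one permutation)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=NUM_PERM):
    """Signature = minimum hash value of the set under each seeded hash function."""
    return [min(_seeded_hash(seed, item) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    return len(a & b) / len(a | b)


banking = {"Bureau Data Feed", "Credit Score Report", "Decision Engine"}
auto = {"Affordability Check", "Bureau Data Feed", "Decision Engine", "SME Credit Model"}
fintech = {"Fraud Detection", "Identity Verification", "Real-Time API"}

sig_banking = minhash_signature(banking)
sig_auto = minhash_signature(auto)
sig_fintech = minhash_signature(fintech)

print("exact:", exact_jaccard(banking, auto))
print("estimate:", estimated_jaccard(sig_banking, sig_auto))
```

The fixed-length signatures could then be passed as feature rows to a clusterer such as scikit-learn's `MiniBatchKMeans` with 12 clusters. How the pipeline actually encodes signatures for KMeans is not stated here.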
part4_archetype_rule_output.csv — sample rows
| Cluster | Antecedent | Consequent | Conf | Lift | Zhang | Hotness |
|---|---|---|---|---|---|---|
| 0 | Bureau Data Feed | Collections Analytics | 0.856 | 7.85× | 0.974 | 0.567 |
| 0 | Bureau Data Feed, Credit Score Report | Decision Engine | 0.673 | 1.75× | 0.841 | 0.640 |
| 1 | Bureau Data Feed, Identity Verification | KYC Compliance Suite | 0.921 | 3.18× | 0.967 | 0.723 |
| 1 | KYC Compliance Suite | AML Screening | 0.889 | 3.07× | 0.958 | 0.701 |
| 2 | Real-Time API | Fraud Detection | 0.934 | 3.59× | 0.972 | 0.748 |
| 2 | Fraud Detection | Behavioural Analytics | 0.872 | 5.28× | 0.961 | 0.712 |
| 3 | Affordability Check | Decision Engine | 0.944 | 2.19× | 0.976 | 0.731 |
| 4 | Portfolio Monitor | Batch Processing | 0.909 | 8.26× | 0.966 | 0.759 |
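The Hotness column combines five components at equal weights of 0.20. The dashboard does not name those components, so the sketch below uses placeholder names and made-up values in the range 0 to 1 purely to show the arithmetic of an equal-weight score.

```python
import math


def hotness_score(components, weights=None):
    """Weighted sum of normalised components; equal weights by default."""
    if weights is None:
        weights = {name: 1.0 / len(components) for name in components}
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * value for name, value in components.items())


# Placeholder component names and values: NOT taken from the pipeline.
example_components = {
    "component_1": 0.9,
    "component_2": 0.8,
    "component_3": 0.7,
    "component_4": 0.6,
    "component_5": 0.5,
}

score = hotness_score(example_components)
print(round(score, 3))
```

With five components, the default weights come out at 0.20 each, matching the "5 × 0.20 equal" setting in the clustering parameters.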
Part 5 · GBB bundle trees
12 recommendation trees · Hero anchor + staircase tiers · anti-overlap + quality scoring
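One way to read "hero anchor + staircase tiers" is that each tier keeps the hero product and extends the previous tier with the next-best add-on. The sketch below implements that reading only. The scores, the tier sizes, and the function name `build_gbb_tree` are hypothetical, and the anti-overlap and quality scoring steps are not modelled.

```python
def build_gbb_tree(hero, addon_scores, tier_sizes=(1, 2, 3)):
    """Return Good/Better/Best bundles: hero plus the top-N ranked add-ons per tier."""
    # Rank add-ons by descending score; break ties alphabetically for determinism.
    ranked = sorted(
        (product for product in addon_scores if product != hero),
        key=lambda product: (-addon_scores[product], product),
    )
    tiers = {}
    for tier_name, size in zip(("Good", "Better", "Best"), tier_sizes):
        # Staircase: every tier contains everything in the tier below it.
        tiers[tier_name] = [hero] + ranked[:size]
    return tiers


# Hypothetical add-on scores for one archetype (not pipeline output).
scores = {
    "Bureau Data Feed": 1.00,
    "Credit Score Report": 0.64,
    "Decision Engine": 0.73,
    "Collections Analytics": 0.57,
}

tree = build_gbb_tree("Bureau Data Feed", scores)
for tier_name, bundle in tree.items():
    print(tier_name, "->", ", ".join(bundle))
```

Because the hero is filtered out of the add-on pool before ranking, it never appears twice in a tier even if it also has a score of its own.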
| Metric | Value | Detail |
|---|---|---|
| GBB trees | 12 | one per archetype |
| Unique heroes | 4 | Bureau Data Feed leads |
| Largest archetype | 550 | Credit Core cluster |
GBB summary · all archetypes · part5_gbb_recommendations.xlsx
| Archetype | Accounts | Hero | Good | Better | Best | Quality |
|---|---|---|---|---|---|---|