Synthetic · 2,000 accounts · 15 products · Parts 1–5
Pipeline output dashboard
Synthetic billing data · CRM market · 2,000 accounts · 15 products · L24M
Accounts | 2,000 | synthetic billing records
Products | 15 | CRM data products
Rules | 1,108 | confidence ≥ 0.10
Archetypes | 12 | MinHash + KMeans
GBB trees | 12 | all archetypes covered
Total revenue | $404M | L24M synthetic billing
Avg bundle size | 4.1 | products per account
Top lift | 8.65× | Collections cluster
Pipeline architecture
1. Billing data restructuring: Wide CSV → long-form product-revenue rows.
   Files: part1_billing_wide.csv, part1_billingdata_long.csv
2. Bundle listing: All products per account concatenated into a sorted unique bundle string.
   Files: part2_bundle_listing.csv
3. Apriori association mining: Frequent itemsets → rules → heroicness & add-on strength scoring.
   Files: part3_rules.csv, part3_heroicness.xlsx
4. Bundle archetype clustering: MinHash Jaccard signatures → KMeans clusters → hotness scoring.
   Files: part4_product_bundle_cluster.csv, part4_archetype_rule_output.csv
5. Good / Better / Best trees: Hero ID → add-on pool ranking → tier assignment → sales story output.
   Files: part5_gbb_recommendations.xlsx
Output files (14 total)
File | Size | Rows
part1_billing_wide.csv | 306 KB | 2,000
part1_billingdata_long.csv | 907 KB | 8,322
part2_bundle_listing.csv | 253 KB | 2,000
part3_transactional.csv | 155 KB | 2,000
part3_rules.csv | 282 KB | 1,108
part3_heroicness.xlsx | 6 KB | 15
part4_product_bundle_cluster.csv | 522 KB | 2,000
part4_archetype_rule_output.csv | 303 KB | 630
part5_gbb_recommendations.csv | 18 KB | 36
part5_gbb_recommendations.xlsx | 14 KB | 4 sheets
Part 1–2 · Billing data
Wide billing CSV restructured to long-form rows. Account-level product bundles concatenated.
billingdata_long.csv — sample rows
Account ID | Sector | Product | Revenue
…000000 | Banking | Bureau Data Feed | $27,843
…000000 | Banking | Credit Score Report | $44,217
…000000 | Banking | Decision Engine | $91,200
…000001 | Fintech | Fraud Detection | $59,831
…000001 | Fintech | Real-Time API | $73,005
…000001 | Fintech | Identity Verification | $31,400
…000002 | Insurance | KYC Compliance Suite | $68,920
…000002 | Insurance | AML Screening | $51,700
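The wide-to-long restructuring behind these rows can be sketched with the standard library. This is a minimal illustration, not the pipeline's actual code: the column names `account_id` and `sector`, the toy CSV, and the rule that blank or zero revenue means "product not held" are all assumptions.

```python
import csv
import io

# Hypothetical wide layout: one row per account, one revenue column per product.
WIDE_CSV = """account_id,sector,Bureau Data Feed,Credit Score Report,Fraud Detection
A0,Banking,100,250,
A1,Fintech,,0,300
"""

ID_COLUMNS = ("account_id", "sector")  # assumed identifier columns


def wide_to_long(wide_text):
    """Turn one wide row per account into one row per (account, product)."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(wide_text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Assumption: blank or zero revenue means the product is not held.
            revenue = float(value) if value.strip() else 0.0
            if revenue > 0:
                long_rows.append({
                    "account_id": row["account_id"],
                    "sector": row["sector"],
                    "product": column,
                    "revenue": revenue,
                })
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for r in long_rows:
    print(r["account_id"], r["sector"], r["product"], r["revenue"])
```

Dropping empty cells is what makes the long file shorter than accounts × products: only products an account actually holds produce a row.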
Product penetration
bundle_listing.csv — account bundles
Account ID | Sector | Product Bundle (Account level) | Count | Total Rev
…000000 | Banking | Bureau Data Feed, Credit Score Report, Decision Engine | 3 | $163,260
…000001 | Fintech | Fraud Detection, Identity Verification, Real-Time API | 3 | $164,236
…000002 | Insurance | AML Screening, KYC Compliance Suite | 2 | $120,620
…000003 | Retail Credit | Collections Analytics, Portfolio Monitor | 2 | $75,900
…000010 | Auto Finance | Affordability Check, Bureau Data Feed, Decision Engine, SME Credit Model | 4 | $212,300
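A sorted, de-duplicated bundle string with a product count and revenue total, as in the table above, could be built roughly as follows. The tuple layout of the input rows and the `", "` separator are assumptions for illustration; the toy revenue values are not taken from the data.

```python
from collections import defaultdict

# Toy long-form rows (account, product, revenue); values are illustrative only.
long_rows = [
    ("A0", "Decision Engine", 300.0),
    ("A0", "Bureau Data Feed", 100.0),
    ("A0", "Bureau Data Feed", 50.0),  # second billing line for the same product
    ("A1", "Real-Time API", 200.0),
]


def build_bundles(rows, sep=", "):
    """Collapse long-form rows into one bundle record per account."""
    products = defaultdict(set)
    revenue = defaultdict(float)
    for account, product, amount in rows:
        products[account].add(product)   # the set removes duplicate products
        revenue[account] += amount       # revenue is summed across all lines
    return {
        account: {
            "bundle": sep.join(sorted(products[account])),  # alphabetical order
            "count": len(products[account]),
            "total_rev": revenue[account],
        }
        for account in sorted(products)
    }


bundles = build_bundles(long_rows)
print(bundles["A0"])
```

Sorting before joining means two accounts with the same products always produce the same bundle string, regardless of row order in the billing file.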
Part 3 · Apriori rules
1,108 rules extracted · min support 0.05 · min confidence 0.10
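With 2,000 accounts, a minimum support of 0.05 corresponds to an itemset appearing in at least 100 accounts. The level-wise search that Apriori performs can be sketched as below. This is a small stdlib illustration on toy baskets, not the pipeline's implementation; `frequent_itemsets` is a hypothetical helper name.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support}."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: single items that meet the support threshold.
    items = sorted({item for basket in baskets for item in basket})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    size = 2
    while current:
        # Join step: union pairs of frequent (size-1)-itemsets whose union has `size` items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        size += 1
    return frequent


transactions = [
    {"Portfolio Monitor", "Batch Processing", "Collections Analytics"},
    {"Portfolio Monitor", "Batch Processing"},
    {"Fraud Detection", "Real-Time API"},
    {"Fraud Detection", "Real-Time API", "Behavioural Analytics"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Rules are then formed by splitting each frequent itemset into an antecedent and a consequent and keeping the splits whose confidence clears the threshold.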
Top 10 rules by lift (1,108 total)
Antecedent | Consequent | Conf | Lift | Zhang | Count
Collections Analytics, Portfolio Monitor | Batch Processing | 0.952 | 8.65× | 0.975 | 177
Bureau Data Feed, Collections Analytics, Portfolio Monitor | Batch Processing | 0.949 | 8.63× | 0.975 | 169
Batch Processing, Collections Analytics | Portfolio Monitor | 0.947 | 8.34× | 0.973 | 177
Bureau Data Feed, Portfolio Monitor | Batch Processing | 0.909 | 8.26× | 0.966 | 189
Batch Processing, Portfolio Monitor | Collections Analytics | 0.899 | 8.24× | 0.965 | 177
Batch Processing, Bureau Data Feed | Portfolio Monitor | 0.913 | 8.04× | 0.962 | 189
Bureau Data Feed, Collections Analytics | Batch Processing | 0.878 | 7.98× | 0.960 | 179
Batch Processing, Bureau Data Feed | Collections Analytics | 0.865 | 7.93× | 0.959 | 179
Fraud Detection | Behavioural Analytics | 0.872 | 5.28× | 0.961 | 454
Real-Time API | Fraud Detection | 0.934 | 3.59× | 0.972 | 487
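The Conf, Lift and Zhang columns follow the standard definitions for association rules, which can be computed from basket counts as in this sketch. The toy baskets and the helper name `rule_metrics` are illustrative assumptions; the numbers are not taken from the billing data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and Zhang's metric for antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(a <= t for t in transactions)
    count_c = sum(c <= t for t in transactions)
    count_ac = sum((a | c) <= t for t in transactions)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    s_a, s_c, s_ac = count_a / n, count_c / n, count_ac / n
    confidence = s_ac / s_a   # P(consequent | antecedent)
    lift = confidence / s_c   # above 1 means positive association
    # Zhang's metric rescales the association to the range [-1, 1].
    denom = max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    zhang = (s_ac - s_a * s_c) / denom if denom else 0.0
    return {
        "count": count_ac,
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "zhang": zhang,
    }


# Toy baskets: 4 hold both products, 1 holds each alone, 4 hold neither.
transactions = (
    [{"Real-Time API", "Fraud Detection"}] * 4
    + [{"Real-Time API"}]
    + [{"Fraud Detection"}]
    + [{"KYC Compliance Suite"}] * 4
)
metrics = rule_metrics(transactions, {"Real-Time API"}, {"Fraud Detection"})
print(metrics)
```

In the table above, rules built from the same set of products share the same Count, which suggests that Count is the number of accounts holding antecedent and consequent together (`count_ac` in the sketch).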
Heroicness leaderboard — part3_heroicness.xlsx
Product | Heroicness | Add-on str | Net eff | Avg lift
Average revenue per account by product
Revenue by product.
Part 4 · Bundle archetypes
MinHash Jaccard → MiniBatchKMeans → 12 clusters → rule matching → hotness scores
Cluster size distribution
12 archetypes by account count.
Clustering parameters
Algorithm | MinHash + MiniBatchKMeans
MinHash permutations | 128
KMeans clusters | 12
Min item support | 30 accounts
Total rule matches | 630 rows
Hotness weights | 5 × 0.20 equal
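A MinHash signature approximates Jaccard similarity between bundles: the share of signature positions on which two bundles agree estimates the overlap of their product sets. The sketch below uses 128 seeded hashes to stand in for the 128 permutations; the hashing scheme (`blake2b` with a seed prefix) and the helper names are assumptions, and bundles are assumed to be non-empty.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=128):
    """One minimum per seeded hash function; seeds stand in for permutations."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


bundle_a = {"Bureau Data Feed", "Credit Score Report", "Decision Engine"}
bundle_b = {"Bureau Data Feed", "Decision Engine", "Affordability Check"}

sig_a = minhash_signature(bundle_a)
sig_b = minhash_signature(bundle_b)

true_jaccard = len(bundle_a & bundle_b) / len(bundle_a | bundle_b)  # 2 / 4
estimate = estimated_jaccard(sig_a, sig_b)
print(f"true={true_jaccard:.2f} estimated={estimate:.2f}")
```

The fixed-length signatures can then serve as the per-account feature vectors passed to a clustering step such as scikit-learn's `MiniBatchKMeans` with 12 clusters; that step is not shown here.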
part4_archetype_rule_output.csv — sample rows
Cluster | Antecedent | Consequent | Conf | Lift | Zhang | Hotness
0 | Bureau Data Feed | Collections Analytics | 0.856 | 7.85× | 0.974 | 0.567
0 | Bureau Data Feed, Credit Score Report | Decision Engine | 0.673 | 1.75× | 0.841 | 0.640
1 | Bureau Data Feed, Identity Verification | KYC Compliance Suite | 0.921 | 3.18× | 0.967 | 0.723
1 | KYC Compliance Suite | AML Screening | 0.889 | 3.07× | 0.958 | 0.701
2 | Real-Time API | Fraud Detection | 0.934 | 3.59× | 0.972 | 0.748
2 | Fraud Detection | Behavioural Analytics | 0.872 | 5.28× | 0.961 | 0.712
3 | Affordability Check | Decision Engine | 0.944 | 2.19× | 0.976 | 0.731
4 | Portfolio Monitor | Batch Processing | 0.909 | 8.26× | 0.966 | 0.759
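The Hotness column combines five components with equal weights of 0.20. The dashboard does not name the components or say how they are normalised, so the sketch below only shows the weighting: the component names are placeholders, and the assumption that each component is already scaled to [0, 1] is mine.

```python
HOTNESS_WEIGHT = 0.20   # five equal weights, per the clustering parameters
N_COMPONENTS = 5


def hotness(components):
    """Equal-weight blend of five component scores, each assumed to lie in [0, 1]."""
    if len(components) != N_COMPONENTS:
        raise ValueError(f"expected {N_COMPONENTS} components, got {len(components)}")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalised to [0, 1], got {value}")
    return sum(HOTNESS_WEIGHT * value for value in components.values())


# Placeholder component names; the real five inputs are not listed in the dashboard.
example = {
    "component_1": 1.0,
    "component_2": 0.5,
    "component_3": 0.0,
    "component_4": 0.5,
    "component_5": 0.5,
}
score = hotness(example)
print(round(score, 3))
```

With equal weights the score is simply the mean of the five components, so no single component can move hotness by more than 0.20.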
Part 5 · GBB bundle trees
12 recommendation trees · Hero anchor + staircase tiers · anti-overlap + quality scoring
GBB trees | 12 | one per archetype
Unique heroes | 4 | Bureau Data Feed leads
Largest archetype | 550 | Credit Core cluster
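The hero anchor with staircase tiers can be pictured as nested bundles: each tier keeps everything in the tier below and adds the next-ranked add-ons. The sketch below assumes the hero and the ranked add-on pool are already known; how they are scored, and how the anti-overlap and quality checks work, is not described in the dashboard and is not modelled here. `build_gbb`, its `step` parameter, and the example product ranking are hypothetical.

```python
def build_gbb(hero, ranked_addons, step=1):
    """Staircase tiers: each tier keeps the previous one and adds `step` add-ons."""
    # The hero anchors every tier, so it never repeats as an add-on.
    pool = [product for product in ranked_addons if product != hero]
    tiers = {}
    for level, name in enumerate(("Good", "Better", "Best"), start=1):
        tiers[name] = [hero] + pool[: level * step]
    return tiers


# Illustrative ranking only; not taken from part5_gbb_recommendations.xlsx.
tree = build_gbb(
    hero="Bureau Data Feed",
    ranked_addons=[
        "Credit Score Report",
        "Decision Engine",
        "Bureau Data Feed",
        "Affordability Check",
        "SME Credit Model",
    ],
)
for tier, products in tree.items():
    print(tier, "->", ", ".join(products))
```

Because the tiers are nested, moving a customer from Good to Better or Best only ever adds products, which keeps the upgrade path simple to explain.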
GBB summary — all archetypes part5_gbb_recommendations.xlsx
Archetype | Accounts | Hero | Good | Better | Best | Quality