§ Product

Customer Clustering & LTV Engine

An unsupervised clustering layer over the client's own customer-and-transaction data — surfaces behavioral segments using RFM and broader signal sets, attaches a lifetime-value forecast per segment, feeds marketing and retention decisions.

Engagement
8–12 week build · quarterly model refresh
Built for
Heads of growth · CMOs · Commercial leads
§ Problem

Customer segmentation at most established businesses is either coarse (demographics) or absent. The business knows what it sold to whom, but the segment-level structure of the base — how customers cluster on behavior, what each cluster's lifetime value looks like — is unmodeled.

What this is

Customer segmentation and lifetime-value modeling built on the client's own data, not third-party panels. Three layers:

  • Feature engineering. RFM baseline plus behavioral signals (product-category affinity, channel patterns, support interactions, campaign response), calibrated per engagement to the data available.
  • Clustering. HDBSCAN or another hierarchical method as the default; K-means where the data structure justifies it. Cluster count is selected by silhouette score plus interpretation quality, not by the elbow method alone.
  • Lifetime-value forecasting. A per-segment LTV model: BG/NBD-class for non-contractual relationships, hazard-model-based for contractual (subscription) relationships. Documented uncertainty bands.
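Cluster-count selection by silhouette can be sketched without dependencies. In the stack described here this would be `sklearn.cluster.KMeans` plus `sklearn.metrics.silhouette_score`. The pure-Python `kmeans`, `silhouette` and `silhouette_curve` below are hypothetical stand-ins that make the logic visible. The sketch assumes the features are already scaled. It uses plain Lloyd's k-means with a deterministic farthest-first initialization, which is a simplification of the k-means++ seeding scikit-learn uses by default. It only produces the silhouette curve; the interpretation-quality check stays a human step. The data points are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


def _mean_dist(points: list[Point], labels: list[int], i: int, c: int) -> float:
    """Mean distance from point i to every other point labelled c."""
    ds = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == c and j != i]
    return sum(ds) / len(ds)


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters score 0 (Rousseeuw's convention)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)
            continue
        a = _mean_dist(points, labels, i, labels[i])  # cohesion
        b = min(_mean_dist(points, labels, i, c) for c in clusters if c != labels[i])  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def silhouette_curve(points: list[Point], ks: Sequence[int]) -> dict[int, float]:
    """Silhouette score for each candidate cluster count."""
    return {k: silhouette(points, kmeans(points, k)) for k in ks}


# Hypothetical, already-scaled feature vectors: three well-separated groups.
customers = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
    (0.0, 5.0), (0.1, 5.2), (0.2, 4.8),
]
curve = silhouette_curve(customers, range(2, 5))
print({k: round(s, 3) for k, s in curve.items()}, "best k:", max(curve, key=curve.get))
```

The highest silhouette is only a candidate. Whether the resulting segments are decision-relevant is judged separately.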
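For the contractual (subscription) case, a hazard model feeds LTV through the survival curve. The probability of still being subscribed after period t is the product of (1 - h_k) over the periods so far. The sketch below assumes a discrete-time (for example monthly) churn hazard per period for one segment. It also assumes a constant margin earned at the end of each period the subscriber survives, and a per-period discount rate. `survival_curve` and `contractual_ltv` are hypothetical helper names. The hazard values are typed in for illustration; in an engagement they would come from a fitted hazard model.

```python
from typing import Sequence


def survival_curve(hazards: Sequence[float]) -> list[float]:
    """S(t) = product of (1 - h_k) for k <= t: share of a cohort still subscribed after each period."""
    survival, alive = [], 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must be a probability in [0, 1]")
        alive *= 1.0 - h
        survival.append(alive)
    return survival


def contractual_ltv(hazards: Sequence[float], margin: float, discount_rate: float) -> float:
    """Expected discounted margin over len(hazards) periods.

    Assumes the margin is earned at the end of each period the subscriber survives
    and is discounted at `discount_rate` per period.
    """
    return sum(
        margin * s / (1.0 + discount_rate) ** (t + 1)
        for t, s in enumerate(survival_curve(hazards))
    )


# Hypothetical segment: monthly churn hazard that falls with tenure, five-year (60-period) horizon.
segment_hazards = [0.08] * 3 + [0.04] * 9 + [0.02] * 48
print(round(contractual_ltv(segment_hazards, margin=30.0, discount_rate=0.01), 2))
```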

How it's built

Python (Polars / Pandas) for feature engineering, scikit-learn for the clustering layer, and the lifetimes library plus custom hazard models for the LTV layer. Output deliverable: segment definitions, per-segment LTV forecasts, and integration into the client's CRM (HubSpot, Salesforce, Klaviyo, custom).
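The feature-engineering step can be sketched in a few lines. This is a dependency-free illustration; the stack named above would use Polars or Pandas. It assumes an order log of `(customer_id, order_date, amount, category)` tuples. It builds the RFM baseline plus one behavioral signal: product-category affinity, expressed as each customer's share of spend per category. The row layout, the function name `customer_features` and the category names are hypothetical. A real engagement adds the other signals and scales the features before clustering.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable

Order = tuple[str, date, float, str]  # (customer_id, order_date, amount, category)


def customer_features(orders: Iterable[Order], as_of: date, categories: list[str]) -> dict[str, dict[str, float]]:
    """RFM baseline plus per-category spend share, one feature dict per customer."""
    last_order: dict[str, date] = {}
    order_count: dict[str, int] = defaultdict(int)
    spend: dict[str, float] = defaultdict(float)
    spend_by_cat: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    for customer_id, order_date, amount, category in orders:
        if order_date > as_of:
            continue  # only use history available at the scoring date
        last_order[customer_id] = max(order_date, last_order.get(customer_id, order_date))
        order_count[customer_id] += 1
        spend[customer_id] += amount
        spend_by_cat[customer_id][category] += amount

    features: dict[str, dict[str, float]] = {}
    for customer_id, last in last_order.items():
        total = spend[customer_id]
        row = {
            "recency_days": float((as_of - last).days),  # R: days since last order
            "frequency": float(order_count[customer_id]),  # F: order count
            "monetary": total,  # M: total spend
        }
        # Affinity: share of spend per listed category; unlisted categories still count in the total.
        for cat in categories:
            row[f"affinity_{cat}"] = spend_by_cat[customer_id][cat] / total if total else 0.0
        features[customer_id] = row
    return features


orders = [
    ("c1", date(2024, 5, 1), 40.0, "apparel"),
    ("c1", date(2024, 6, 20), 60.0, "home"),
    ("c2", date(2023, 11, 3), 25.0, "apparel"),
]
for cid, row in customer_features(orders, date(2024, 6, 30), ["apparel", "home"]).items():
    print(cid, row)
```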

What you get

  • The segmentation model with per-segment definitions.
  • The LTV forecast per segment.
  • The CRM integration for downstream marketing use.
  • Documentation of the segmentation methodology — defensible to the CMO's reviewer.
  • Quarterly model refresh as the customer base evolves.
§ How we engage

Engagement is shape, not list.

Length and price are functions of the data and the destination. The shape below is the typical engagement.

Length
8–12 week build · quarterly model refresh

Scoped during the discovery call against the actual data and the operation it integrates with.

Lead
Bogdan

Principal engineer. Architecture and most of the code ship through one keyboard.

Cadence
Async, weekly

Written updates in between; calls when a decision needs the room.

Bar
Production

Async correctness, capacity under burst, observability at every boundary.

§ Questions

What buyers ask about this one.

  • Why custom rather than a SaaS like Klaviyo's predictive segments?

    SaaS predictive segments are excellent for the customers that fit the SaaS's assumed data shape (e-commerce purchase-driven, mostly transactional). For businesses with richer or different data — B2B with multi-stakeholder buying processes, subscription with cohort complexity, marketplace with two-sided dynamics — a custom clustering layer captures structure SaaS misses. The deliverable is the model, not a SaaS subscription.

  • What signals does the clustering use?

    RFM as the baseline (recency, frequency, monetary). Plus behavioral signals — product-category affinity, channel preference, support-ticket pattern, response-to-prior-campaign signal, tenure trajectory. Per-engagement, the signal set is calibrated to what the client's data actually contains.

  • Doesn't K-means just produce 'high-value, medium, low' segments?

    K-means produces what you ask of it. The work is in the feature engineering and the cluster-count selection: finding the segment structure that's actually decision-relevant, rather than the structure that's mathematically clean. Density-based hierarchical methods such as HDBSCAN often work better than K-means in the real-world case where some clusters are dense and others are sparse.

  • How does the LTV forecast tie in?

    Per-segment lifetime-value model (BG/NBD-class for non-contractual relationships, hazard-model-based for contractual). The forecast attaches to each segment, so the marketing team sees not just 'these customers cluster together' but 'this cluster is worth $X per acquisition on a five-year horizon.'

  • Pricing?

    Scoped to data complexity and deployment depth (one-time delivery vs. integration into the client's CRM). The discovery call covers both.
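For the non-contractual case, the per-segment LTV model described above rests on the BG/NBD conditional expectation. This is the expected number of repeat purchases over the next t periods for a customer with x repeat purchases, a last purchase at age t_x, and T periods of observation. The sketch uses the closed form from Fader, Hardie and Lee's BG/NBD paper, which the lifetimes library exposes as `BetaGeoFitter.conditional_expected_number_of_purchases_up_to_time`. It is a dependency-free sketch that assumes daily periods and counts same-day purchases once. The helper names `summarize`, `hyp2f1` and `expected_purchases` are hypothetical. The parameters r, alpha, a and b are illustrative rather than fitted, and `margin_per_purchase` is an assumed figure. Multiplying expected purchases by that margin gives an undiscounted value over the horizon.

```python
from datetime import date
from typing import Iterable


def summarize(transactions: Iterable[tuple[str, date]], observation_end: date) -> dict[str, tuple[int, int, int]]:
    """Reduce a (customer_id, purchase_date) log to BG/NBD inputs (x, t_x, T), in days."""
    days: dict[str, set[date]] = {}
    for customer_id, day in transactions:
        if day <= observation_end:  # ignore purchases after the calibration cutoff
            days.setdefault(customer_id, set()).add(day)  # same-day purchases count once
    return {
        cid: (len(ds) - 1, (max(ds) - min(ds)).days, (observation_end - min(ds)).days)
        for cid, ds in days.items()
    }


def hyp2f1(a: float, b: float, c: float, z: float, tol: float = 1e-12, max_terms: int = 100_000) -> float:
    """Gauss hypergeometric 2F1(a, b; c; z) summed as a power series; needs |z| < 1."""
    if not abs(z) < 1.0:
        raise ValueError("power series only converges for |z| < 1")
    term = total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def expected_purchases(t: float, x: int, t_x: float, T: float, r: float, alpha: float, a: float, b: float) -> float:
    """BG/NBD E[Y(t) | x, t_x, T]: expected repeat purchases in the next t periods (requires a != 1)."""
    z = t / (alpha + T + t)
    first = (a + b + x - 1) / (a - 1)
    second = 1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(r + x, b + x, a + b + x - 1, z)
    # The denominator is 1 / P(alive | x, t_x, T); BG/NBD treats customers with no repeats as alive.
    inv_p_alive = 1 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x) if x > 0 else 1.0
    return first * second / inv_p_alive


# Illustrative, unfitted daily-scale parameters for one segment; the margin is an assumed figure.
params = {"r": 0.5, "alpha": 50.0, "a": 1.5, "b": 3.0}
margin_per_purchase = 25.0
log = [
    ("c1", date(2024, 1, 5)), ("c1", date(2024, 1, 5)), ("c1", date(2024, 3, 1)),
    ("c1", date(2024, 6, 2)), ("c2", date(2024, 1, 20)), ("c2", date(2024, 2, 3)),
]
segment = summarize(log, observation_end=date(2024, 6, 30))
values = {
    cid: margin_per_purchase * expected_purchases(365, x, t_x, T, **params)
    for cid, (x, t_x, T) in segment.items()
}
print({cid: round(v, 2) for cid, v in sorted(values.items())})
print("segment mean:", round(sum(values.values()) / len(values), 2))
```

A customer whose last purchase is recent relative to their observation window keeps a high probability of being alive. That is why recency, not just frequency, drives the forecast.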

§ The next step

If the deliverable matches the gap, the next step is one call.

We'll scope length and price against your data and the operation it integrates with. No retainer, no fishing.

Bogdan and team · async-first · OP—2026