TL;DR

A customer who cancels rarely gives notice. They simply stop using the product and, one day, the contract isn’t renewed. For software-as-a-service (SaaS) companies, this phenomenon has a name — churn — and it’s expensive: increasing customer retention by just 5% can boost profits by 25% to 95%. But most companies only discover the risk when it’s already too late to act.

This project started as an answer to one question: what if you could predict churn before it happened? It grew from an experiment into an MVP with a REST API, its own CLI, and a visual dashboard to track everything. The core is an XGBoost model trained on behavioral data (days since last login, usage frequency, engagement volume). It generates risk scores that trigger automated actions or alerts for Customer Success. It also calculates the ROI of win-back campaigns for lost customers.

Along the way, I faced three classic problems:

  1. Misleading data: initially, all customers seemed healthy because the synthetic data lacked sufficient variance. Adjusting the data to reflect distinct customer profiles (healthy, at-risk, churned) was what made the model learn.

  2. Invisible value: once the model was working, the results were stuck in the terminal. Building a simple dashboard — with no heavy frameworks, just HTML/JS/Tailwind via CDN — made the value visible to non-technical people and accelerated adoption.

  3. A “good enough” model is not good enough: with XGBoost in place, the next step was hyperparameter optimization. I used Optuna to explore the search space and achieved a model 15% better in AUC. It’s not a revolutionary improvement, but it’s the difference between an experiment and a product that can generate real revenue.

The project is open source. Every technical decision, from the FastAPI + PostgreSQL + Alembic stack to the pipeline architecture, is documented in the cacaprog/churn-prediction-mrr repository. The biggest lesson isn't about algorithms but about management: building a churn prediction engine isn't just a machine learning challenge. It's a product, business, and culture problem, and the technology only works when it is connected to strategy and to the company's real processes.


What I Learned Building a Churn Prediction Engine from Scratch

The Hidden Cost of Churn

A customer who’s about to leave rarely says goodbye. They simply stop using the product — fewer logins, unopened emails, declining engagement — until one day the contract isn’t renewed. For software-as-a-service (SaaS) companies, this phenomenon has a name: churn. And it’s expensive.

Research from Bain & Company shows that increasing customer retention by just 5% can boost profits by 25% to 95%. The problem is that most companies only find out a customer is at risk when it’s already too late to act. The decision to cancel is made internally weeks before it ever shows up on a financial dashboard.

That tension is what gave rise to this project: what if you could predict churn before it happened? Not through guesswork, but with data and machine learning. What started as a question became a working MVP — and the path to get there taught lessons that go far beyond the code.


What Was Built: the Churn Risk & Win-Back Engine

The system has three core responsibilities.

Score the risk. Every customer receives a score from 0 to 100, classified into four categories: low, medium, high, and critical. The score is generated by an XGBoost model trained on behavioral data — days since last login, usage frequency, product engagement volume.
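A minimal sketch of the score-to-category step follows. It assumes the 0 to 100 score is the model's churn probability scaled by 100 (for example from `XGBClassifier.predict_proba(X)[:, 1]`). The cut-offs of 25, 50 and 75 are hypothetical, because the article does not state the real thresholds.

```python
from enum import Enum


class RiskCategory(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical cut-offs, checked from the highest band down.
THRESHOLDS = [
    (75, RiskCategory.CRITICAL),
    (50, RiskCategory.HIGH),
    (25, RiskCategory.MEDIUM),
    (0, RiskCategory.LOW),
]


def churn_probability_to_score(probability: float) -> int:
    """Map a churn probability in [0, 1] to an integer score in [0, 100]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return round(probability * 100)


def categorize(score: int) -> RiskCategory:
    """Return the first band whose lower bound the score reaches."""
    for lower_bound, category in THRESHOLDS:
        if score >= lower_bound:
            return category
    raise ValueError(f"score out of range: {score}")
```

Because `RiskCategory` subclasses `str`, the categories serialize as plain strings in JSON responses without extra conversion.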

Trigger interventions. High-risk customers are automatically routed into a triage queue. Those with the highest MRR (monthly recurring revenue) are placed on the priority track, which fires an alert to the Customer Success team. The rest receive automated email outreach.
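The triage rule can be sketched as below. The cut-offs are assumptions for illustration: a score of 50 or more (the "high" and "critical" bands) enters the queue, and an MRR of 1,000 or more marks the priority track. Actual alert and email dispatch is left out. The function only decides which track each customer lands on.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, not taken from the project.
HIGH_RISK_MIN_SCORE = 50          # "high" and "critical" customers enter triage
PRIORITY_MRR_THRESHOLD = 1_000.0  # at or above this MRR, alert Customer Success


@dataclass
class Customer:
    customer_id: str
    risk_score: int
    mrr: float


def triage(customers: list[Customer]) -> dict[str, list[Customer]]:
    """Split high-risk customers into a priority track and an email track."""
    queue = [c for c in customers if c.risk_score >= HIGH_RISK_MIN_SCORE]
    # Highest MRR first, so the most valuable accounts surface at the top.
    queue.sort(key=lambda c: c.mrr, reverse=True)
    routes: dict[str, list[Customer]] = {"priority": [], "email": []}
    for customer in queue:
        track = "priority" if customer.mrr >= PRIORITY_MRR_THRESHOLD else "email"
        routes[track].append(customer)
    return routes
```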

Win back those who already left. The win-back module evaluates churned customers, calculates their reactivation probability, and generates personalized offers — measuring the ROI of each reactivation in recovered revenue.
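One simple way to measure win-back ROI is sketched below. It is a hypothetical formula, not necessarily the one the project uses. Expected recovered revenue is the reactivation probability times the previous MRR times an assumed retention horizon of 12 months, and ROI compares that revenue with the cost of the offer. The reactivation probability is assumed to come from elsewhere, such as a separate model.

```python
from dataclasses import dataclass


@dataclass
class WinBackCandidate:
    customer_id: str
    previous_mrr: float
    reactivation_probability: float  # assumed input, e.g. from a separate model


def expected_roi(candidate: WinBackCandidate, offer_cost: float,
                 months_retained: int = 12) -> float:
    """ROI = (expected recovered revenue - offer cost) / offer cost."""
    if offer_cost <= 0:
        raise ValueError("offer_cost must be positive")
    expected_revenue = (candidate.reactivation_probability
                        * candidate.previous_mrr
                        * months_retained)
    return (expected_revenue - offer_cost) / offer_cost


def rank_by_roi(candidates: list[WinBackCandidate], offer_cost: float,
                months_retained: int = 12) -> list[tuple[WinBackCandidate, float]]:
    """Order churned customers by expected ROI, best first."""
    scored = [(c, expected_roi(c, offer_cost, months_retained)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A negative ROI marks a candidate whose offer is expected to cost more than it recovers.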

All of this is accessible through a REST API (FastAPI), a full CLI, and a visual browser dashboard. The database is PostgreSQL, migrations are managed by Alembic, and the ML core uses XGBoost with scikit-learn. Simple on paper, complex in execution — and that’s precisely where the most important lessons happened.


Problem 1 — Data That Lies

The model was built, the pipeline ran, the logs showed success. But something was wrong: every single customer had low risk. No exceptions. The dashboard showed zero MRR at risk, the action list was empty, and the win-back module had no reactivations to display.

The bug wasn’t in the model. It was in the demo data.

The function that generated synthetic customers created all of them with similar profiles — login values, engagement metrics, and MRR all falling within close ranges. For XGBoost, there was no pattern to learn. The algorithm tried to separate “at-risk customers” from “healthy customers,” but the features were too homogeneous for any decision boundary to make sense. The model simply predicted the majority class for every case.

The solution was to redesign the synthetic data around three distinct profiles:

  • Healthy customers: frequent logins, high engagement, stable MRR
  • At-risk customers: 15–45 days without a login, minimal access in the past month, fluctuating MRR
  • Churned customers: cancellation recorded, no recent activity
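The three profiles can be turned into a generator like the one below. This is a sketch, not the project's generator. The profile mix (60/25/15), the value ranges outside the stated 15 to 45 days, and the `mrr_change_30d_pct` field standing in for "fluctuating MRR" are all assumptions. Here the `churned` flag serves as the training label.

```python
import random
from dataclasses import dataclass

PROFILES = ("healthy", "at_risk", "churned")


@dataclass
class SyntheticCustomer:
    profile: str
    days_since_last_login: int
    logins_last_30d: int
    mrr: float
    mrr_change_30d_pct: float  # hypothetical month-over-month MRR change
    churned: bool              # training label in this sketch


def make_customer(profile: str, rng: random.Random) -> SyntheticCustomer:
    mrr = round(rng.uniform(100, 2_000), 2)
    if profile == "healthy":
        # Frequent logins, high engagement, stable MRR.
        return SyntheticCustomer(profile, rng.randint(0, 7), rng.randint(12, 40),
                                 mrr, round(rng.uniform(-2, 5), 1), False)
    if profile == "at_risk":
        # 15-45 days without a login, minimal recent access, fluctuating MRR.
        return SyntheticCustomer(profile, rng.randint(15, 45), rng.randint(0, 2),
                                 mrr, round(rng.uniform(-30, 10), 1), False)
    if profile == "churned":
        # Cancellation recorded, no recent activity; keep last MRR for win-back.
        return SyntheticCustomer(profile, rng.randint(60, 180), 0, mrr, -100.0, True)
    raise ValueError(f"unknown profile: {profile}")


def generate_customers(n: int, weights=(0.6, 0.25, 0.15),
                       seed: int = 42) -> list[SyntheticCustomer]:
    """Draw n customers from the three profiles with a seeded RNG."""
    rng = random.Random(seed)
    profiles = rng.choices(PROFILES, weights=weights, k=n)
    return [make_customer(p, rng) for p in profiles]
```

Seeding a private `random.Random` keeps demo runs reproducible without touching global random state.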

With that variance in place, the model finally learned something useful. In the next training run, high-risk customers started appearing and the dashboard came to life.

The lesson: machine learning models don’t work in a vacuum — they learn from real variance in data. Data quality matters more than algorithm sophistication. A simple XGBoost with well-structured data outperforms any elaborate architecture fed with poor data. Before blaming the model, investigate the data.
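A cheap way to investigate the data before training is sketched below, assuming rows are dicts with a boolean label. For each feature it measures the gap between the churned and non-churned means in units of the feature's overall standard deviation, and it flags features below an arbitrary 0.2 cut-off. This univariate check would have caught the homogeneous demo data, but it cannot see interactions between features.

```python
from statistics import mean, pstdev


def feature_separation(rows: list[dict], feature: str, label_key: str = "churned") -> float:
    """Gap between class means, in units of the feature's overall std dev."""
    positives = [r[feature] for r in rows if r[label_key]]
    negatives = [r[feature] for r in rows if not r[label_key]]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    spread = pstdev([r[feature] for r in rows])
    if spread == 0:
        return 0.0  # constant feature: nothing to learn from it
    return abs(mean(positives) - mean(negatives)) / spread


def audit_dataset(rows: list[dict], features: list[str], label_key: str = "churned",
                  min_separation: float = 0.2) -> dict:
    """Report the positive rate and the features that barely separate the classes."""
    positives = sum(1 for r in rows if r[label_key])
    weak = [f for f in features
            if feature_separation(rows, f, label_key) < min_separation]
    return {"positive_rate": positives / len(rows), "weak_features": weak}
```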


Problem 2 — Invisible Value Convinces No One

With the model working, the system was producing correct results. But a new problem emerged: those results only lived in the terminal. To check a customer’s risk score, you had to run a CLI command and parse JSON. To see win-back results, another command. To understand trends, yet another.

An ML product that only works in a terminal isn’t a product — it’s an experiment.

The next step was to build a visual dashboard that could tell the story in under a minute. And this is where an important decision was made: no build tooling.

The dashboard was built with plain HTML, uncompiled JavaScript, Chart.js via CDN, and Tailwind CSS via CDN. No Node.js, no npm install, no Webpack or Vite. Static files are served directly by FastAPI. Running the demo takes a single command.

Why that choice? Because the complexity of a modern frontend setup would be a barrier for anyone trying to see the system in action quickly. Demonstrability is part of the MVP — and adding friction to the demo process is the same as sabotaging it.

The dashboard now shows: weekly MRR at risk, customer distribution by risk category with color coding, an urgency-sorted action list, and a win-back panel with calculated ROI. The story the system told was finally visible — and understandable to someone with no technical context.
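The numbers behind those panels reduce to a few aggregations, sketched below. The assumptions are that "at risk" means the high and critical bands, that urgency is risk score multiplied by MRR, and that a weekly job stores one snapshot of this summary to build the weekly trend.

```python
from collections import Counter
from dataclasses import dataclass

AT_RISK_CATEGORIES = {"high", "critical"}  # assumption


@dataclass
class ScoredCustomer:
    customer_id: str
    risk_category: str  # "low" | "medium" | "high" | "critical"
    risk_score: int
    mrr: float


def dashboard_summary(customers: list[ScoredCustomer]) -> dict:
    """One snapshot of the dashboard numbers (store weekly for the trend)."""
    at_risk = [c for c in customers if c.risk_category in AT_RISK_CATEGORIES]
    # Hypothetical urgency: risk weighted by revenue, most urgent first.
    by_urgency = sorted(at_risk, key=lambda c: c.risk_score * c.mrr, reverse=True)
    return {
        "mrr_at_risk": round(sum(c.mrr for c in at_risk), 2),
        "distribution": dict(Counter(c.risk_category for c in customers)),
        "action_list": [c.customer_id for c in by_urgency],
    }
```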

The lesson: in ML, a well-crafted visualization often does more for buy-in than pages of accuracy metrics. A chart showing the system detected a customer’s risk six weeks before they churned is worth more than a Brier Score report. Make the value visible, or it doesn’t exist for the people who make decisions.


Problem 3 — A Good Enough Model Is Not Good Enough

With clean data and a working dashboard, the inevitable question arrived: is the model performing at its full potential?

XGBoost has dozens of hyperparameters — learning rate, maximum tree depth, subsampling, L1 and L2 regularization, and more. The defaults are reasonable, but they’re rarely optimal for any specific dataset. Tuning them manually is slow, tedious, and biased: you end up testing what you already know.

The solution was Optuna, an automatic hyperparameter optimization library that uses Bayesian optimization (its default sampler is the Tree-structured Parzen Estimator, TPE). Grid search exhaustively tests every combination on a fixed grid, and random search samples combinations blindly. Optuna instead learns from each completed trial and steers the next ones toward more promising regions of the parameter space. That makes it an efficient way to explore a large space with a limited trial budget.

The pipeline works like this:

  1. run-tuning --tenant-id $TENANT --trials 50 — Optuna evaluates 50 parameter combinations, each scored using 5-fold stratified cross-validation
  2. The best parameters are saved to the database, isolated per tenant
  3. On the next run-retraining call, the system automatically loads the optimized parameters — no additional flags needed
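Steps 2 and 3 can be sketched with a small per-tenant table. This version uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `ON CONFLICT ... DO UPDATE` upsert works in both engines, although psycopg uses `%s` placeholders instead of `?`. The table name, columns and default parameters are hypothetical. Tenants without a stored row fall back to defaults, which keeps tuning opt-in.

```python
import json
import sqlite3

# Hypothetical defaults used by tenants that never ran tuning.
DEFAULT_PARAMS = {"learning_rate": 0.1, "max_depth": 6}


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tuned_params (
               tenant_id   TEXT PRIMARY KEY,
               params_json TEXT NOT NULL,
               best_score  REAL NOT NULL
           )"""
    )


def save_best_params(conn: sqlite3.Connection, tenant_id: str,
                     params: dict, best_score: float) -> None:
    """Insert or overwrite the tuned parameters for one tenant."""
    conn.execute(
        """INSERT INTO tuned_params (tenant_id, params_json, best_score)
           VALUES (?, ?, ?)
           ON CONFLICT(tenant_id) DO UPDATE SET
               params_json = excluded.params_json,
               best_score  = excluded.best_score""",
        (tenant_id, json.dumps(params), best_score),
    )
    conn.commit()


def load_training_params(conn: sqlite3.Connection, tenant_id: str) -> dict:
    """Tuned params override defaults; untuned tenants get defaults unchanged."""
    row = conn.execute(
        "SELECT params_json FROM tuned_params WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()
    if row is None:
        return dict(DEFAULT_PARAMS)
    return {**DEFAULT_PARAMS, **json.loads(row[0])}
```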

For the demo dataset (~300 customers), 50 trials take under two minutes and produce a measurably better model in Brier Score. The Brier Score is the mean squared error between predicted probabilities and actual outcomes (lower is better), so it rewards probabilities that are both well calibrated and sharp.
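For N customers with outcomes y_i (1 = churned) and predicted probabilities p_i, the Brier Score is (1/N) Σ (p_i − y_i)². The short function below computes it directly. It matches what scikit-learn's `brier_score_loss` returns for binary labels. The toy inputs show why it separates a confident, correct model from one that always predicts 50%.

```python
def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


outcomes = [1, 0, 1, 0]
sharp = brier_score(outcomes, [0.9, 0.1, 0.8, 0.2])  # confident and correct
coin_flip = brier_score(outcomes, [0.5] * 4)         # uninformative
```

Here `sharp` is 0.025 and `coin_flip` is 0.25, so the first model is clearly better.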

The lesson: hyperparameter optimization isn’t a last-minute detail — it’s a legitimate stage of the ML pipeline. And the decision to make it fully opt-in was equally important: tenants that never run tuning continue working normally with default parameters, with no breaking changes. Adding capability without altering existing behavior is a principle that applies to any software system.


What an ML MVP Really Is

Looking at what was built, it becomes clear that machine learning is not synonymous with a predictive model. A model is just one piece of a larger system — and often not the most important one.

The real MVP of this project is the complete chain: well-structured data → trained model → generated score → automatic triage → triggered intervention → measured result. Every link matters. Break any one of them — homogeneous data, a model with no variance to learn from, output with no visualization — and the system stops working as a product.

But there's a more fundamental point: an ML project that doesn't change anyone's behavior is not a product. It's a well-documented experiment. The ultimate goal was never an XGBoost model with a strong cross-validation score. It was to reduce real churn, recover real revenue, and give the Customer Success team an action list they actually open on Monday morning.

That distinction — between a model that works and a system that generates business value — is what separates academic ML from applied ML. The metric that matters isn’t the Brier Score: it’s the MRR saved.

The opportunities this kind of technology opens up are broad. Any business with recurring customers, behavioral data, and a team that needs to prioritize attention can benefit from an engine like this. The barrier to entry has never been lower — the tools are open source, the documentation is rich, and the iteration cycle is fast.

The difference between companies that capture this potential and those that don’t rarely comes down to technology. It comes down to the willingness to face the three problems in this project — and learn from each one of them.