Customer Churn Prediction
In fintech, churn is rarely a surprise in hindsight, and it is often preventable with foresight. Wallet balance, failed top-up events, and relationship tenure are stronger predictors than demographics. Random Forest achieves a test AUC of 0.876; recall is prioritized deliberately because missing an at-risk customer costs more than one wasted outreach.
Causal AI frameworks are the maturation path here: moving beyond who will churn to which intervention, at what cost, actually changes the outcome for an individual customer.
Customer Churn
As of September 2021, a rough estimate for Customer Acquisition Cost (CAC) ranged from $50 to $300 or more per customer. A common guideline for an acceptable attrition rate in subscription-based online businesses was around 5% per month. What counts as a "good" attrition rate, however, varies significantly with industry, business model, engagement activities, competitive actions, pricing, value proposition, and service quality. Predicting and reducing customer churn (attrition) is an interesting challenge.
Causality-related issues are key considerations when predicting customer churn. Some of them are:
- Correlation vs. Causation: Distinguishing between correlated factors and actual causes of churn.
- Confounding Variables: Identifying and accounting for variables that may distort causal relationships.
- Time Lag: Dealing with time delays between causes and the manifestation of churn.
- Reverse Causality: Recognizing situations where churn itself influences the identified causes.
- Hidden Causes: Discovering latent or unobservable factors contributing to churn.
- Feedback Loops: Handling situations where churn and its causes create feedback loops.
- Interventions: Understanding how interventions to reduce churn may impact causal relationships.
- Data Quality: Ensuring that data used to establish causal links is accurate and comprehensive.
At an aggregate level, the churn (attrition) rate is an end-of-period fact: the percentage of customers who went from active to inactive during the analysis period. The denominator is the number of customers active at the beginning of the period, and the numerator is the number of those customers who became inactive by the end of it.
A predictive model links seemingly disparate events in the observation period, such as a low wallet balance or app errors while loading money, to the probability that a customer never returns because of a degraded experience. Strictly speaking, such a model learns associations from historical data rather than proving causation; it predicts the probability of churn in the prediction period on the assumption that the relationships seen in the past still hold. As long as that assumption holds, a churn model is a valuable asset for designing interventions that improve retention.
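As a minimal sketch of the definition above, the function below computes the churn rate from two sets of customer IDs. The function name and the toy IDs are illustrative assumptions, not part of the original analysis.

```python
def churn_rate(active_at_start, active_at_end):
    """Share of customers active at the start of the period who are inactive at its end.

    Customers acquired during the period are ignored, because the denominator
    is fixed at the start of the period.
    """
    start = set(active_at_start)
    if not start:
        raise ValueError("no active customers at the start of the period")
    churned = start - set(active_at_end)
    return len(churned) / len(start)


# Toy example (made-up IDs): 4 active customers at the start, 1 of them gone by the end.
# "c9" joined during the period and does not affect the rate.
rate = churn_rate({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c3", "c9"})
print(f"{rate:.0%}")  # 25%
```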
Introducing the Problem
The problem at hand comes from the fintech industry: a payments app used in a hyperlocal market. While fingerprinting consumers is easy, it is often hard to understand what leads early adopters to disengage. By the nature of the problem, a plethora of streaming data can be captured in a big data system, including (but not limited to) app-related issues, wallet top-up issues, wallet balance, the time frame of the first transaction, and open marketing offers.
Overall, the approach looks at the history of consumer experiences and behavior and uses it to forecast future churn. For example, if low wallet balances in the past were followed by zero transactions in more recent periods, the model generalizes that a current low balance increases the probability of future churn. Since the market is hyperlocal, consumer attributes such as age, gender, and purchase profile explain little of the variability in churn.
The goal is to build a machine learning classification model that predicts the probability of churn for current customers based on past attrition behavior, where churn means "No future interaction."
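One way to turn "No future interaction" into a training label is to split each customer's history at a cutoff date: features come from the observation period up to the cutoff, and the label is 1 if no transaction occurs in the prediction period after it. The sketch below assumes a hypothetical transaction log of (customer_id, date) pairs; the function name, the cutoff, and the 30-day horizon are illustrative assumptions rather than the setup used in the original analysis.

```python
from datetime import date, timedelta


def label_churn(transactions, cutoff, horizon_days=30):
    """Return {customer_id: 1 if churned else 0}.

    A customer is labeled only if they transacted on or before the cutoff
    (i.e. were active in the observation period). They are labeled 1 when
    they have no transaction in (cutoff, cutoff + horizon_days].
    """
    horizon_end = cutoff + timedelta(days=horizon_days)
    observed = {cid for cid, day in transactions if day <= cutoff}
    returned = {cid for cid, day in transactions if cutoff < day <= horizon_end}
    return {cid: int(cid not in returned) for cid in observed}


# Made-up log for illustration.
log = [
    ("a", date(2023, 1, 5)),
    ("a", date(2023, 2, 10)),  # returns within the prediction period
    ("b", date(2023, 1, 20)),  # never returns
    ("c", date(2023, 2, 3)),   # first seen after the cutoff, so not labeled
]
labels = label_churn(log, cutoff=date(2023, 1, 31))
print(labels)
```

Keeping the feature window strictly before the cutoff avoids leaking information from the prediction period into the features.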
Exploratory Analysis
Univariate visualization shows some useful patterns. For example, customers with a low balance (<= $5) show a higher proportion of churn. Similarly, consumers with a Length of Relationship of more than 30 days tend to churn less.
The bivariate joint plot shows an elliptical region with a large concentration of churned customers. This is perhaps due to the interaction of two bimodal distributions.
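The kind of univariate comparison described above can be sketched as a churn share per group. The records, field names, and numbers below are made up for illustration; only the $5 cut point comes from the text.

```python
from collections import defaultdict


def churn_share_by_group(rows, group_fn):
    """Return {group: fraction of rows in that group with churned == 1}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        group = group_fn(row)
        totals[group] += 1
        churned[group] += row["churned"]
    return {group: churned[group] / totals[group] for group in totals}


# Made-up customers: wallet balance in dollars and a churn flag.
customers = [
    {"balance": 2.0, "churned": 1},
    {"balance": 4.5, "churned": 1},
    {"balance": 5.0, "churned": 0},
    {"balance": 12.0, "churned": 0},
    {"balance": 40.0, "churned": 1},
    {"balance": 75.0, "churned": 0},
]
shares = churn_share_by_group(
    customers, lambda r: "low (<= $5)" if r["balance"] <= 5 else "higher (> $5)"
)
print(shares)
```

The same helper works for tenure by swapping the grouping function, for example splitting at 30 days of Length of Relationship.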
Predicting Churn
Three machine learning models for binary classification (logistic regression, random forest, and XGBoost) are developed, with hyperparameters tuned using the F1 score as the quality metric. The following are the ROC curves:
The following are the classification metrics:
| | Scorer | Train F1 | Train AUC | Train Precision | Train Recall | Test F1 | Test AUC | Test Precision | Test Recall |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.729809 | 0.724550 | 0.716133 | 0.74405 | 0.203343 | 0.800579 | 0.118699 | 0.118699 |
| 1 | Random Forest | 0.990381 | 0.990283 | 0.980957 | 1.00000 | 0.556150 | 0.875905 | 0.619048 | 0.619048 |
| 2 | XGBoost | 0.985542 | 0.985325 | 0.971506 | 1.00000 | 0.553191 | 0.882587 | 0.492424 | 0.492424 |
Recall is the key metric because it determines what share of at-risk customers can be reached with retention engagement. The choice of model also depends on factors such as the outreach budget. In this case, Random Forest has the highest test recall (0.619), so it is the model to use.
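A comparison like the one in the table above could be produced roughly as follows. This is a sketch under stated assumptions, not the original pipeline: it uses scikit-learn with a synthetic, imbalanced dataset as a stand-in for the churn data, small illustrative hyperparameter grids (the original search spaces are not given), and it leaves out XGBoost to stay within one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the churn data (roughly 15% positives).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small, illustrative grids (assumed, not taken from the original study).
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [4, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Tune hyperparameters with cross-validated F1, then refit on all training data.
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    proba = search.predict_proba(X_test)[:, 1]  # probability of the churn class
    pred = search.predict(X_test)
    results[name] = {
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
    }

for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Reporting the same metrics on the training split as well makes overfitting visible, as with the near-perfect training scores of the tree-based models in the table.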
Recall Rate
Recall clearly needs to be maximized in the current context. The following are possible next steps:
- Data Augmentation: Increase the representation of the minority class by generating synthetic data points or using techniques like SMOTE (Synthetic Minority Over-sampling Technique).
- Feature Engineering: Carefully select and engineer features that highlight important patterns and characteristics of both classes, aiding the model's ability to distinguish between them.
- Algorithm Selection: Choose algorithms that cope well with imbalanced datasets, such as Random Forests, Gradient Boosting, or Support Vector Machines with class weighting, as they can capture complex relationships.
- Threshold Adjustment: Modify the classification threshold to prioritize recall over precision, particularly when false negatives are more costly than false positives.
- Ensemble Methods: Combine multiple models through techniques such as bagging or boosting to improve overall performance and recall by leveraging the strengths of different algorithms or subsets of the data.
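Of the steps above, threshold adjustment is the cheapest to try because it needs no retraining. The sketch below picks the highest score threshold that still reaches a target recall and reports the precision paid for it. The function name, labels, and scores are made up for illustration; in practice the scores would be the predicted churn probabilities on a validation set.

```python
def threshold_for_recall(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is at least target_recall.

    A customer is flagged as at risk when score >= threshold.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("y_true contains no positive (churned) examples")
    # Try thresholds from strictest to loosest; recall can only grow as we go.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [label for label, s in zip(y_true, scores) if s >= threshold]
        true_pos = sum(flagged)
        recall = true_pos / positives
        if recall >= target_recall:
            return threshold, true_pos / len(flagged), recall
    raise ValueError("target_recall must be at most 1.0")


# Made-up validation labels (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

result = threshold_for_recall(y_true, scores, target_recall=1.0)
print(result)  # (0.35, 0.5, 1.0)
```

In this toy data, a default cut at 0.5 would miss the churner scored 0.35; lowering the threshold to 0.35 catches all three churners at the cost of flagging three customers who would have stayed. Whether that trade is worthwhile depends on the relative cost of a missed churner and a wasted outreach.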
