Evaluation of Research Findings
Author: Carlos Del Carpio
On April 30th, an exciting new piece of research was published on SSRN: “On the Rise of Fintechs – Credit Scoring Using Digital Footprints.” It was as if we had commissioned it, as this piece helps to validate what our clients around the world already know: how someone behaves online is a reflection of their character, and can be used to measure risk and boost lending. We are grateful to the authors for their work, as this paper provides great insight into the predictive power that credit scoring models can achieve using alternative data.
However, the methodology used to build and assess the models is as important as the data used as input. In this post we share a couple of observations about the methodology used in this paper based on over 10 years of experience building and deploying alternative data credit scoring models in production environments around the world.
#1 Using cross-validation alone to assess credit scoring models can inflate predictive power. We recommend using out-of-time, out-of-sample hold-outs to set more realistic performance expectations.
Credit scoring’s main objective is to predict credit repayment behavior in order to make credit risk decisions. In that sense, it is a forward-looking predictive modeling exercise.
As in many applications of predictive modeling, in a credit scoring setting both the context and the behavior of the population being studied tend to evolve over time. The macroeconomic environment changes, credit policies change, origination and collection processes change, and therefore the population itself changes. Combined, these changes often introduce bias and systematic differences between the training and validation sets used to build credit scoring models. Other economic and financial applications of predictive modeling face similar challenges: if a model for predicting stock values is trained on data from a certain five-year period, it is unrealistic to treat the subsequent five-year period as a draw from the same population. In the same way, relying on cross-validation alone, where testing samples are drawn from the same time period as the training sample, can produce predictive power expectations that differ greatly from the predictive performance actually achieved.
To address this, it is preferable to use an out-of-time, out-of-sample hold-out set that is representative of the most recent time period, which yields more realistic performance estimates. Since the model in this paper is validated only with out-of-sample cross-validation, its results may be optimistic compared with the results the model would actually deliver once implemented.
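To make the distinction concrete, here is a minimal sketch of an out-of-time hold-out split. The loan records, their field layout, and the cutoff date are all hypothetical, chosen only to illustrate the idea:

```python
from datetime import date

# Hypothetical loan records: (application_date, features, defaulted)
loans = [
    (date(2019, 1, 15), {"x1": 0.2}, 0),
    (date(2019, 6, 3),  {"x1": 0.7}, 1),
    (date(2020, 2, 20), {"x1": 0.4}, 0),
    (date(2020, 9, 8),  {"x1": 0.9}, 1),
]

cutoff = date(2020, 1, 1)  # assumed boundary between "past" and "recent"

# Out-of-time split: train only on loans originated before the cutoff and
# evaluate on the most recent period -- unlike random cross-validation,
# which mixes both periods into every fold.
train = [r for r in loans if r[0] < cutoff]
holdout = [r for r in loans if r[0] >= cutoff]
```

A model would be fit only on `train` and scored on `holdout`, so the evaluation period post-dates everything the model has seen, mirroring how the model would actually be used after deployment.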
#2 The paper omits feature selection, an important part of the model-building process. This omission could lead to completely different results.
Dimensionality reduction methods, such as feature extraction (FE) and feature selection (FS), are important components of the process of building credit scoring models. Depending on the classification technique used to estimate the functional form of the final model, however, FS can either be done independently of model estimation or be embedded in it (i.e., built into the classifier so that it occurs naturally as part of model estimation).
Logistic regression, for example, does not perform FS as part of its estimation, so FS has to be carried out as a separate step before each estimation. When FS is performed independently in this way, it must be repeated for every training set. This creates a different “optimal” model, with a different set of variables, for each iteration of the cross-validation, which in turn adds an extra step: choosing the final set of variables to go into the final model.
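As a toy illustration of why this matters, the sketch below re-runs feature selection inside every cross-validation fold. The data are synthetic and the univariate filter is deliberately simple, not the paper’s method; the point is only that each fold’s training split produces its own “optimal” feature subset:

```python
import random

random.seed(0)

# Synthetic dataset: 20 candidate features, only the first two carry signal.
n, p = 200, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if row[0] + row[1] + random.gauss(0, 1) > 0 else 0 for row in X]

def select_features(rows, labels, k=3):
    """Toy univariate filter: rank features by |mean difference between classes|."""
    scores = []
    for j in range(p):
        pos = [r[j] for r, t in zip(rows, labels) if t == 1]
        neg = [r[j] for r, t in zip(rows, labels) if t == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

# FS must be repeated inside every cross-validation fold: each training
# split can yield a different "optimal" subset of variables.
folds = 5
selected_per_fold = []
for i in range(folds):
    train_rows = [r for idx, r in enumerate(X) if idx % folds != i]
    train_y = [t for idx, t in enumerate(y) if idx % folds != i]
    selected_per_fold.append(select_features(train_rows, train_y))

print(selected_per_fold)  # the chosen subsets can differ across folds
```

The truly informative features tend to survive in every fold, but the remaining slots are filled differently from fold to fold, which is exactly the extra model-selection step described above.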
The paper sidesteps this problem entirely by forcing all variables into the model every time, without any feature selection criterion. In other words, the authors arbitrarily “select” all the features, something that is rare and unrealistic in most credit scoring settings, where there are hundreds or thousands of candidate variables that cannot all fit into a logistic regression model because of the curse of dimensionality (Donoho, 2000). This is an intrinsic problem for credit scoring models that include big data sources such as digital footprints and social data, and it is why feature extraction and feature selection methods can play a key role, sometimes as important as the techniques used to estimate the final functional forms. Had the authors included a feature selection step in each iteration of the cross-validation, it could have yielded very different results.
In conclusion, putting these modeling observations aside, we consider this paper important because it offers a clear example of the signal that can be found in digital footprint data, and of the possibilities for improving current methods and data sources. We have merely provided a few examples of how methodological choices can affect how results are estimated and assessed, and why they are important for assessing the full predictive potential of this particular type of data.
Reference: Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, 1–32.