PSD2 and Machine Learning: a marriage made in heaven
The Revised Payment Services Directive (PSD2) is an EU directive implemented in 2018 to drive greater transparency, security, innovation and market competition within the financial services industry. It enables bank customers to use third-party service providers to carry out various activities based on their financial data. This requires banks to give these providers access to customer account data and/or allow them to initiate payments on the customer’s behalf. As a result, customers should benefit from access to new and innovative products – and experience improved service levels.
This paper describes a recent project undertaken by the authors, which has demonstrated – for a large European bank – that PSD2 data alone can be used to derive an extremely powerful predictor of consumer loan default, which can be used to make profitable lending decisions.
PSD2 will fundamentally change the payments value chain, the profitability of some retail banking business models, and customer expectations.
There are three main drivers for the PSD2 directive:
1.     Improving consumer rights and transparency
2.     Enhancing security through SCA (Strong Customer Authentication)
3.     Enabling consumers to easily share their account information with third parties
With express agreement from customers, third parties will be able to access and use granular transaction information available on bank accounts. This will enable a greater level of tailoring of products and services to the customers’ specific circumstances; but the key to success is using powerful analytical tools to extract valuable insights from the data available.
Lending is a significant part of banks’ revenues, and an important use case for the application of PSD2 data. Our aim with this project was to understand how PSD2 will change lending and what can be done to maximise this opportunity.
Our research shows that the data available now through PSD2-compliant interfaces enables lenders to have a far greater understanding of the level of risk of each customer. This in turn enables lending decisions to be more precise, resulting in a more personalised experience for the customer, and higher relationship value for the bank. This increased accuracy provides a true win-win situation for both lenders and borrowers.
This improved modelling of retail credit risk requires a new and different approach, as it is based on different information than traditional models. Variables that banks currently use in their models are interpretable and easy to understand. For instance, data on credit history is often used:
Was the credit applicant ever late on a previous credit payment? If yes, how many times? How many consecutive months?
The available information varies from country to country, and credit history can encompass a variety of useful information – including mortgages, car loans, revolving loans and mobile phone and utility payments – but the reasoning built into the models is that future behaviour, with respect to creditworthiness, is mostly determined by credit history. Information on salary and some types of current account transactions is occasionally taken into account, but these variables are usually not the main drivers of the model, as they frequently require additional effort to extract (e.g. from paper statements).
PSD2 provides a significant shift in the data available for credit decisions. In particular, it provides a granular view of the applicant’s income and expenditure over time, in an electronic format. There are, however, three key challenges that need to be addressed.
Having access to ‘raw’ current account transaction data over a period of time creates the first challenge: the creation of meaningful variables (or ‘features’) that can be used for credit decisions. The data sets which will be available through PSD2 interfaces differ substantially from the ones used in traditional credit scoring. Traditional models use high-level features, such as credit history or average salary. They are high level because they aggregate information from multiple transactions in a structured way. However, the main difficulty when developing a credit scoring model based on PSD2-generated transactional data is that only low-level features, such as individual transactions, are available. This is comparable to the challenge that machine learning algorithms are now successfully addressing when trained to recognise the content of an image: their input consists of low-level features – in this case, the values of the pixels of an image, rather than a high-level description of shapes and colours. This is the approach we took during our project: basing the model on thousands of low-level inputs, rather than a moderate number of high-level features.
The second challenge is that – as things currently stand – there is no agreed standard across the EU on the data to be shared. For example, the UK has one with Open Banking, and the Berlin Group has created the NextGenPSD2 API framework, but these have not been adopted across the Union yet. Even within some of these standards, such as the Berlin Group framework, there is considerable room for differing interpretations. Many countries are still in the process of defining the actual technical standards that will be applicable in a specific market. This means that the credit scoring models built for any specific market need to be flexible enough to cope with multiple future scenarios with regard to the available information. Our approach here was to define different scenarios and train algorithms for each of them. This process can be scaled by automating the entire machine learning procedure, from construction of composite features to actual training, validation and testing.
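The scenario-based approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scenario names, the field names (`date`, `amount`, `balance`, `overdraft_limit`, `age`, `address`) and the `train` callable are hypothetical placeholders, loosely mirroring the data sets discussed in this article rather than any actual PSD2 API schema or the pipeline used in the project.

```python
# Hypothetical data-availability scenarios: each maps to the set of fields
# assumed to be exposed by the PSD2 interface in that scenario.
SCENARIOS = {
    "base": {"date", "amount", "balance"},
    "with_overdraft": {"date", "amount", "balance", "overdraft_limit"},
    "with_demographics": {"date", "amount", "balance", "overdraft_limit",
                          "age", "address"},
}


def restrict(record, scenario):
    """Keep only the fields that the given scenario assumes are available."""
    allowed = SCENARIOS[scenario]
    return {key: value for key, value in record.items() if key in allowed}


def run_all(records, train):
    """Run the same training routine once per scenario.

    `train` is any callable taking a list of restricted records and returning
    a fitted model (or, as in the test, any summary object).
    """
    return {
        name: train([restrict(record, name) for record in records])
        for name in SCENARIOS
    }
```

Because every scenario goes through the same function, adding a new scenario is a matter of adding one entry to the dictionary, which is what makes the procedure straightforward to automate.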
The third challenge is that current account transactions are typically noisy and irregular over time, which makes it difficult to identify specific information such as salary. In addition, the features or variables extracted from current account transactions are highly correlated with one another. This is a significant problem for traditional approaches to credit scoring modelling (more specifically for regression modelling) and requires new approaches to balance discriminatory power with stability over time.
Modern machine learning techniques offer a good solution to manage all of these challenges.
We have developed and thoroughly tested supervised machine learning models for behavioural credit scoring which predict defaults amongst retail customers based only on data available through PSD2 APIs.
The models are extremely predictive – obtaining a Gini coefficient of 70% – thanks to a combination of the data we used and best-in-class machine learning technology. Furthermore, our team worked closely with the client’s data engineering team to ensure legal and regulatory compliance, computational performance and implementation best practices.
To develop our models, we worked with one of the leading banks in Central Europe, with a portfolio of several million customers. The models were developed on all consumer loan applications between May 2016 and May 2017. Six months of transactional history prior to the loan application (the observation window) was used to extract the financial behaviour of the applicants. A one-year window after the application was used to detect whether the customer defaulted or not (the outcome window). Our sample contained over 100,000 loan applications, with a bad rate of over 7%, which meant we had an ample number of observations to carry out detailed analyses – and also to have meaningful hold-out samples for validation purposes.
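The window construction described above can be sketched as follows. This is a minimal illustration, not the project’s code: transactions are assumed to be `(date, amount)` tuples, a default is assumed to be recorded as a date, and the helper names `shift_months` and `build_sample` are hypothetical. How a default is actually defined (for example, days past due) is not specified in this article.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def build_sample(application_date, transactions, default_dates):
    """Return (observation-window transactions, default label) for one application.

    Observation window: the six months before the application date.
    Outcome window: the twelve months from the application date onwards.
    """
    obs_start = shift_months(application_date, -6)
    outcome_end = shift_months(application_date, 12)
    observed = [(d, amount) for d, amount in transactions
                if obs_start <= d < application_date]
    defaulted = any(application_date <= d <= outcome_end for d in default_dates)
    return observed, int(defaulted)
```

Only transactions dated strictly before the application enter the feature set, so nothing from the outcome window can leak into the model inputs.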
To address the uncertainty with respect to which data will be available through PSD2, we developed multiple models, each using a different set of inputs. The base model uses only the transactions from the observation window (amount and date only) and the current account balance at the time of application, because this information is guaranteed to be available through the PSD2-mandated interface. It does not include the amount of the authorised overdraft limit on the account. The Gini of the base model is, on average, 70%.
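For readers less familiar with the metric, the sketch below shows one common way to compute a Gini coefficient for a scoring model, assuming the usual credit-scoring definition Gini = 2 × AUC − 1 (the article does not spell out the definition it uses). Under that assumption, a Gini of 70% corresponds to an AUC of 0.85. The function name `gini` and the pairwise AUC computation are illustrative choices only.

```python
def gini(scores, labels):
    """Gini coefficient, computed as 2 * AUC - 1.

    scores: model outputs, where a higher score means higher predicted risk.
    labels: 1 for a defaulted ("bad") customer, 0 for a non-defaulted one.
    AUC is the share of (bad, good) pairs in which the bad customer has the
    higher score, with ties counted as half.
    """
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good observation")
    concordant = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                concordant += 1.0
            elif b == g:
                concordant += 0.5
    auc = concordant / (len(bads) * len(goods))
    return 2.0 * auc - 1.0
```

The pairwise loop is quadratic in the sample size, so it is suitable for illustration only; on a sample of the size described here, a rank-based AUC computation would be the practical choice.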
Our best model (still likely to be compatible with most PSD2-based data sources) achieves a Gini of 76% on average. This model includes information on any available current account overdraft at the time of application (likely to be available through the PSD2 interface), and basic demographic information (address and age).
All the analysis, data sampling, cross-validation and reject inference were carried out rigorously to ensure that no bias was introduced. We ran multiple tests to ensure that the results were sound and valid. The results for the 10 modelled scenarios, together with a sensitivity analysis of accuracy, are shown in Figure 1.
We tried several machine learning algorithms to see which would perform best. At first, we developed simple models such as linear, logistic and ridge regression. This was done to better understand the problem and to give us a solid benchmark on which to improve. These simpler models had a Gini in the range of 55%-59%. From this relatively solid performance of the simpler models, we concluded that the variables we created had captured the underlying nature of the customer. We also tested more complex algorithms, such as artificial neural networks and support vector machines. However, we decided not to use them, as they are significantly more sensitive to correlated features than tree-based algorithms.
The results showed that a highly optimised neural network was barely able to match even the most basic random forest models. There was no improvement in the performance of neural networks even after we used dimensionality reduction techniques (such as PCA and autoencoders) to tackle the problem of high correlation. This is no surprise, because these algorithms usually perform better on unstructured data than on tabular data. In the end, ensembles of models proved to be the best predictor of probability of default.
Another important issue to address was the identification of specific segments with distinct behaviours, given the large population we had at our disposal and the large number of connected features. This is the traditional feature interaction problem, which, in traditional credit scoring, is illustrated by the well-known example of marital status vs. age, whereby the relationship between marital status and credit risk often changes with the age of the applicant. A more relevant example in our project was that volatility of spending was a good predictor of risk, but only up to a certain level of income, beyond which it was no longer predictive. Traditional credit scoring manages this by segmenting the portfolio before building models; for example, building different models for different age groups, or different levels of income. The machine learning approach based on gradient boosting that we used considers the combined impact of multiple features in a systematic way, across combinations of thousands of features – thus resulting in much better predictive power.
One of the key strengths of our approach was the methodology by which we extracted information from the transactional data to be fed into the machine learning models. We created approximately 3,000 composite features over different time periods and time spans within the observation window, which capture different trends and events. We used multiple mathematical and statistical techniques to generate these features, without discriminating with respect to their possible interpretability.
Features included the following examples:
- Various statistical measures of the data series of transactions in certain time periods (such as the last 4 weeks, or the last 6 months), including minima, maxima, averages, standard deviations, etc.;
- Advanced analysis of the time series, such as Fourier transforms;
- Features calculated by tailored secondary algorithms to estimate salary, stability of salary, whether there had been missed credit payments, how many payments were missed (if any), and so on;
- Variables measuring frequency of certain events, such as payments in a certain range.
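The kinds of features listed above can be illustrated with a short sketch using only the Python standard library. It assumes transactions are `(date, amount)` tuples with negative amounts for debits. The function names, the window convention and the normalisation of the Fourier magnitudes are choices made for this example, not a description of the roughly 3,000 features built in the project.

```python
import cmath
import statistics
from datetime import date, timedelta


def window_stats(transactions, as_of, days):
    """Summary statistics of transaction amounts in the `days` before `as_of`."""
    start = as_of - timedelta(days=days)
    amounts = [amount for d, amount in transactions if start <= d < as_of]
    if not amounts:
        return {"count": 0, "min": 0.0, "max": 0.0, "mean": 0.0, "std": 0.0}
    return {
        "count": len(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "mean": statistics.fmean(amounts),
        "std": statistics.pstdev(amounts),
    }


def dft_magnitudes(series):
    """Normalised magnitudes of the discrete Fourier transform of a series.

    Applied to, say, daily net cash flow, a peak at a given frequency hints at
    a regular cycle such as monthly salary or weekly spending.
    """
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series))) / n
        for k in range(n // 2 + 1)
    ]


def count_in_range(transactions, low, high):
    """Number of transactions whose amount falls in [low, high)."""
    return sum(1 for _, amount in transactions if low <= amount < high)
```

Calling `window_stats` with several window lengths (for example 28 and 182 days) and `count_in_range` with several amount bands quickly multiplies a handful of functions into a large number of candidate features.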
A good illustration of our approach to feature design is how we estimated salary – and how we subsequently used it. We developed a relatively simple, low-level algorithm (based on a decision tree), which calculated the customer’s salary based on the analysis of credit transactions over time. Once we were satisfied that our salary calculator was accurate, we developed a new feature called salary stability, which is our interpretation of the output of the salary algorithm, giving the confidence level of the salary estimate. The salary stability feature reflected oscillations in income. Perhaps not surprisingly, the salary stability feature was a more important feature in our models than the salary itself.
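To make the idea concrete, here is a deliberately simplified stand-in for a salary estimator and a stability score. It is not the decision-tree-based algorithm described above: as an assumption for this sketch, salary is taken to be the median of the largest incoming credit in each calendar month, and stability is one minus the coefficient of variation of those monthly values, floored at zero, with months that have no credit counted as zero. All function names are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date


def monthly_largest_credit(transactions):
    """Largest incoming (positive) amount per calendar month."""
    best = defaultdict(float)
    for d, amount in transactions:
        if amount > 0:
            key = (d.year, d.month)
            best[key] = max(best[key], amount)
    return dict(best)


def estimate_salary(transactions):
    """Stand-in heuristic: median of the largest monthly credit."""
    largest = monthly_largest_credit(transactions)
    return statistics.median(largest.values()) if largest else 0.0


def salary_stability(transactions, months_observed):
    """Score in [0, 1]: 1 means identical income every month, 0 means erratic.

    Months in the observation window without any credit count as zero income.
    """
    largest = list(monthly_largest_credit(transactions).values())
    largest += [0.0] * max(0, months_observed - len(largest))
    mean = statistics.fmean(largest) if largest else 0.0
    if mean == 0:
        return 0.0
    variation = statistics.pstdev(largest) / mean
    return max(0.0, 1.0 - variation)
```

Even in this toy form, the two outputs carry different information: two applicants with the same estimated salary can have very different stability scores.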
Another illustration is our approach to measuring spending patterns. The amount and type of spending was of course an important feature, but the trends and variability of spending over time provided additional information that proved to be a strong predictor of credit risk.
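As a rough sketch of what trend and variability features can look like, the example below takes a list of monthly spending totals (oldest first) and computes a least-squares slope and a coefficient of variation. These two measures and the function names are assumptions chosen for illustration. The article does not state which trend or variability measures were actually used.

```python
import statistics


def spending_trend(monthly_spend):
    """Least-squares slope of spending against month index (currency per month)."""
    n = len(monthly_spend)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(monthly_spend)
    numerator = sum((i - x_mean) * (y - y_mean)
                    for i, y in enumerate(monthly_spend))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def spending_volatility(monthly_spend):
    """Coefficient of variation: standard deviation relative to the mean."""
    if not monthly_spend:
        return 0.0
    mean = statistics.fmean(monthly_spend)
    return statistics.pstdev(monthly_spend) / mean if mean else 0.0
```

A positive slope indicates spending that grows month on month, while the volatility measure is scale-free, so it can be compared across customers with very different spending levels.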
Such composite features were fed to a supervised machine learning model as variables to increase its predictive power. We analysed several different scenarios, developed multiple different models on randomised data, and tested various machine learning algorithms to ensure that we achieved the best possible results.
Finally, we worked with the client to implement the developed credit scoring model within their own IT systems and processes, as shown in Figure 3.
Up until now, banks have had exclusive access to their customers’ data, but with PSD2, this is changing. Banks and financial institutions will have access to a full profile of both their customers (including accounts in other banks) and non-customers. As a result, they will be able to better personalise products to fit customer needs through modern technology. They could even offer products tailored to each individual customer.
We believe that to capture this opportunity, businesses need to use disruptive technologies such as machine learning – or more generally, data science – and utilise state-of-the-art algorithms to remain competitive in a rapidly-changing market.
We were able to create such predictive models for credit scoring using only the data available through PSD2. We showed that it is possible to extract spending habits and to understand overall financial behaviour from transactional data, ultimately calculating the probability of default. We also showed that a modern machine learning approach is significantly more predictive than the traditional approach (based on regression techniques), on the same data set, as shown in Figure 2.
Customers will benefit from the use of more predictive models, as the risk of default can be identified more accurately. More good-risk customers will be eligible for credit, but at the same time, the number of defaults will decrease thanks to fewer customers finding themselves in financial difficulty.
For the banks, benefits come not only from the opportunity to increase market share, but also from increased conversion rates from application to loan, decreased waiting times for approval, and lower processing costs. This, along with access to better data enabling the bank to react promptly to other customer needs and expectations, will contribute to a better customer experience overall. Market dynamics and customer attitudes may favour banks that can capture such opportunities quickly and effectively.
About the authors
Marc Gaudart is the founder and CEO of Trent Advisory Services. Marc was previously a Senior Vice-President at Experian Decision Analytics, and also a McKinsey consultant.
Sinisa Slijepcevic is the founder and CEO of Cantab Predictive Intelligence. He was previously a McKinsey consultant, and has a PhD in Applied Mathematics from the University of Cambridge.
Domjan Baric and Toni Vlaic are Associates in Cantab Predictive Intelligence.