By Garvin Li
Scorecards are a common modeling method in the credit risk assessment and Internet finance industries. A scorecard is not a specific machine learning algorithm but a general modeling framework: it divides the original data into bins as a form of feature engineering, and then fits a linear model on the binned features.
The scorecard modeling principle is applied in various credit assessment fields, such as credit card risk assessment and loan issuance. Scorecards are also often used in other scoring scenarios, such as customer service scoring and Zhima Credit scoring (Alipay credit scoring). This article uses a specific case to explain how to build a scorecard model with the finance components of the Alibaba Cloud Machine Learning Platform for AI.
Click Load More to establish the scorecard experiment directly from a template, as shown in the following screenshot. This template contains the processes and data of the whole experiment.
The preceding screenshot shows an open source dataset from a foreign institution that contains 30,000 records. The dataset includes user properties such as gender, education level, marital status, and age, as well as each user's credit card consumption records and bills over a past period of time. payment_next_month is the target column, indicating whether a user repays the credit card bill (1 indicates that the bill has been repaid; 0 indicates that it has not). The Kaggle copy of this dataset names the corresponding column default.payment.next.month and uses 1 to mark a default, so check the direction of the label before training on a downloaded copy.
The dataset can be downloaded from https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
The following diagram shows the experiment process.
Split the input dataset into two parts: one part is used to train the model, and the other is used for prediction and evaluation.
The binning component is similar to one-hot encoding: it maps each field into higher-dimensional features according to the distribution of its values. Take the "age" field as an example. The binning component divides ages into intervals based on how the values are distributed. The following screenshot shows the binning results.
The final output of the binning component is shown in the following screenshot. Each field is binned into multiple intervals.
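The split step can be sketched outside the platform as follows. This is a minimal illustration rather than the platform's split component: the function name `split_rows`, the 80/20 ratio, and the fixed seed are assumptions, since the article does not state the ratio used in the template.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    train_fraction and seed are illustrative assumptions, not values
    taken from the experiment template.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(30000))  # stand-in for the 30,000 records
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that follows the original row order, which matters if the file happens to be sorted by some field.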
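To make the comparison with one-hot encoding concrete, here is a small sketch of equal-frequency binning on an "age" field. It is only one possible binning strategy and is not taken from the binning component itself; the helper names, the sample ages, and the choice of four bins are all assumptions made for illustration.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return interior cut points so that bins hold roughly equal counts."""
    ordered = sorted(values)
    edges = []
    for i in range(1, n_bins):
        edge = ordered[i * len(ordered) // n_bins]
        if not edges or edge > edges[-1]:  # skip duplicate cut points
            edges.append(edge)
    return edges


def assign_bin(value, edges):
    """Index of the interval that contains value.

    Bin 0 is (-inf, edges[0]), bin k is [edges[k-1], edges[k]),
    and the last bin is [edges[-1], +inf).
    """
    return bisect_right(edges, value)


def one_hot(bin_index, n_bins):
    """Encode a bin index as a 0/1 indicator vector."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


ages = [21, 23, 25, 28, 30, 33, 35, 38, 41, 45, 50, 58]  # made-up sample
edges = quantile_edges(ages, 4)
print(edges)                               # [28, 35, 45]
print(assign_bin(30, edges))               # 1
print(one_hot(assign_bin(30, edges), 4))   # [0, 1, 0, 0]
```

Each bin becomes its own feature, which is why a single numeric field turns into several dimensions after binning.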
The population stability index (PSI) is an important metric for measuring the shift caused by changes in the sample. It is usually used to evaluate sample stability, for example, whether the sample distribution stays stable from one month to the next. In general, a PSI value lower than 0.1 for a variable indicates insignificant changes; a value between 0.1 and 0.25 indicates significant changes; and a value greater than 0.25 indicates exceptionally significant changes that may require special attention.
In this case, the PSI component compares the datasets before and after splitting, using the binning results, and returns the PSI value of each feature, as shown in the following screenshot.
The results of scorecard training are shown in the following screenshot.
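For reference, PSI is commonly computed per feature as the sum over bins of (actual share - expected share) × ln(actual share / expected share). The sketch below follows that common definition and the thresholds quoted above; the function names, the `eps` floor for empty bins, and the sample counts are assumptions, and the platform's PSI component may differ in details.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    expected_counts and actual_counts hold the number of records per bin
    for the baseline sample and the comparison sample. eps is an assumed
    floor that avoids log(0) when a bin is empty.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


def describe_psi(value):
    """Map a PSI value to the rule-of-thumb levels described in the text."""
    if value < 0.1:
        return "insignificant"
    if value <= 0.25:
        return "significant"
    return "exceptionally significant"


stable = psi([100, 200, 300], [10, 20, 30])  # same shares, different size
shifted = psi([50, 50], [80, 20])            # shares moved between bins
print(describe_psi(stable), describe_psi(shifted))
```

Because PSI works on shares rather than raw counts, two samples of different sizes can be compared directly as long as they use the same bin boundaries.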
In essence, a scorecard expresses the weights of a complex model as scores that meet business standards.
The scorecard prediction component outputs the final score for each prediction result, which in this case is each user's credit score.
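One widely used way to turn model weights into scores is a linear scaling of the log-odds: score = offset + factor × ln(odds), where factor = PDO / ln 2 and PDO is the number of points that doubles the odds. The sketch below assumes a logistic regression over one-hot bin features whose output is the log-odds of repayment. The base score of 600, base odds of 50:1, PDO of 20, and all weights are hypothetical values chosen for illustration, not the platform's defaults.

```python
import math


def scaling_constants(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Return (factor, offset) so that base_odds maps to base_score
    and doubling the odds adds pdo points. All defaults are assumptions."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def to_points(intercept, bin_weights, factor, offset):
    """Convert log-odds weights into scorecard points.

    bin_weights maps feature name -> {bin label -> model weight}.
    """
    base_points = offset + factor * intercept
    points = {
        feature: {label: factor * weight for label, weight in bins.items()}
        for feature, bins in bin_weights.items()
    }
    return base_points, points


factor, offset = scaling_constants()
# Hypothetical model: the intercept alone corresponds to odds of 50:1,
# and the second age bin doubles the odds.
weights = {"age": {"<28": 0.0, "28-35": math.log(2)}}
base_points, points = to_points(math.log(50), weights, factor, offset)
print(round(base_points, 6), round(points["age"]["28-35"], 6))
```

Since the transformation is linear, adding the points of a user's bins to the base points gives the same ranking as the underlying model, only on a scale that is easier to read.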
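Prediction with a trained scorecard amounts to looking up the points of the bin each field falls into and adding them up. The following sketch shows that lookup with a made-up points table: the field names, bin edges, point values, and base points are all hypothetical and do not come from the experiment.

```python
from bisect import bisect_right

BASE_POINTS = 500  # hypothetical starting score

# Hypothetical scorecard: for each field, sorted bin edges and one
# point value per resulting interval (len(points) == len(edges) + 1).
SCORECARD = {
    "age": {"edges": [28, 35, 45], "points": [-12, 4, 9, 6]},
    "limit_bal": {"edges": [50000, 200000], "points": [-20, 5, 15]},
}


def credit_score(user, scorecard=SCORECARD, base_points=BASE_POINTS):
    """Sum the base points and the points of the bin each field falls in."""
    total = base_points
    for field, card in scorecard.items():
        bin_index = bisect_right(card["edges"], user[field])
        total += card["points"][bin_index]
    return total


user = {"age": 30, "limit_bal": 120000}
print(credit_score(user))  # 500 + 4 + 5 = 509
```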
Based on users' credit card consumption records, each user's final credit score is obtained by using scorecard model training and scorecard prediction. These final credit scores can be applied in credit investigation fields related to loans or finances.
Visit the Alibaba Cloud Machine Learning Platform for AI page to experience Alibaba Cloud's machine learning capabilities today!