Use the linear regression algorithm template in Machine Learning Designer to predict the repayment ability of agricultural loan applicants - Platform For AI

Linear regression is a common regression analysis method in mathematical statistics. You can use this method to find the quantitative relationships between two or more variables. Machine Learning Designer provides a preset linear regression template to help you build a model to predict the repayment ability of agricultural loan applicants based on historical loan records. This topic describes how to use the preset linear regression template.

Background information

Predicting the repayment ability of agricultural loan applicants is a typical data mining task. Lenders can build an empirical model from the historical data of applicants, such as annual income, crop type, and loan records, and then use the model to predict the repayment ability of new loan applicants.

Note

The datasets that are used in this topic are only for experimental use.

Prerequisites

A workspace is created. For more information, see Create a workspace.
MaxCompute resources are associated with the workspace. For more information, see Manage workspaces.

Datasets

The datasets that are used in this topic contain the following fields:

Field	Type	Description
id	STRING	The unique ID of the applicant.
name	STRING	The name of the applicant.
region	STRING	The geographic region where the applicant resides. Valid values: north, middle, and south.
farmsize	DOUBLE	The farmland size.
rainfall	DOUBLE	The rainfall in the region.
landquality	DOUBLE	The farmland quality. A greater value indicates better quality.
farmincome	DOUBLE	The annual income of the applicant.
maincrop	STRING	The crop type.
claimtype	STRING	The loan type.
claimvalue	DOUBLE	The loan amount.
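
The field list above can be sketched as a typed record. This is a minimal illustration, not part of the template: the class name `LoanRecord`, the mapping of STRING and DOUBLE to Python `str` and `float`, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass

# Valid values of the region field, as listed in the table above.
VALID_REGIONS = {"north", "middle", "south"}


@dataclass
class LoanRecord:
    """Hypothetical in-memory view of one dataset row (STRING -> str, DOUBLE -> float)."""
    id: str
    name: str
    region: str
    farmsize: float
    rainfall: float
    landquality: float
    farmincome: float
    maincrop: str
    claimtype: str
    claimvalue: float

    def __post_init__(self):
        # Reject rows whose region is outside the documented value set.
        if self.region not in VALID_REGIONS:
            raise ValueError(f"unexpected region: {self.region!r}")


# All values below are invented placeholders, not rows from the real datasets.
record = LoanRecord(
    id="id_001", name="applicant_1", region="north",
    farmsize=100.0, rainfall=30.0, landquality=7.0, farmincome=50.0,
    maincrop="crop_a", claimtype="type_a", claimvalue=40.0,
)
print(record.region, record.claimvalue)
```

Validating `region` at construction time is only one possible design; the template itself does not document any such check.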

Procedure

Go to the Machine Learning Designer page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
3. In the left-side navigation pane of the workspace page, choose Model Development and Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

Create a pipeline.

On the Visualized Modeling (Designer) page, click the Preset Templates tab.
On the Preset Templates tab, find the Agricultural Loan Prediction template and click Create.
In the Create Pipeline dialog box, configure the required parameters. You can use the default values.
The value of the Pipeline Data Path parameter indicates the Object Storage Service (OSS) path of the temporary data and models that are generated during the runtime of the pipeline.
Click OK.
Pipeline creation takes about 10 seconds.
On the Pipelines tab, select the created pipeline and click Open.

View the components of the pipeline on the canvas. The following figure shows the pipeline that is automatically created based on the preset template.

Section	Description
1	The components in this section read the two datasets that are used in the pipeline. Training dataset: contains 100 historical records that are used to train the linear regression model. The dataset contains fields such as farmsize, rainfall, and claimvalue. In this dataset, the claimvalue field indicates the recovered loan amount. Prediction dataset: contains information about the 71 applicants who applied for agricultural loans this year. In this dataset, the claimvalue field indicates the requested loan amount. The pipeline predicts the repayment ability of the applicants in the prediction dataset based on the historical records in the training dataset.
2	The components in this section convert field values of the STRING type to the DOUBLE type. For example, the valid values of the region field are north, middle, and south. The components in this section map these values to numerical values (0, 1, and 2, respectively) and convert the numerical values to the DOUBLE type.
3	The linear regression component trains a regression model on the historical records in the training dataset. The prediction component uses the regression model to predict the loan amount that each applicant can repay. The Append Columns component merges the id, prediction_score, and claimvalue columns in the prediction results, as shown in the following figure. The prediction_score field indicates the predicted amount that the applicant can repay.
4	The Evaluation component evaluates the prediction performance of the model. For information about the evaluation metrics, see Table 1 (Evaluation metrics).
5	The Sql Mapping component identifies eligible loan applicants by comparing the predicted repayment amounts to the requested loan amounts. If the predicted repayment amount is higher than the requested loan amount, the applicant is considered eligible.
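
The logic of sections 2, 3, and 5 can be sketched outside Designer as follows. This is a minimal sketch under stated assumptions, not the implementation of the Designer components: it uses only two features (region and farmincome), solves ordinary least squares through the normal equations, and runs on invented toy rows. The helper names `fit_linear`, `predict`, and `features` are hypothetical.

```python
# Section 2: map the STRING region values to DOUBLE codes (0, 1, and 2).
REGION_CODES = {"north": 0.0, "middle": 1.0, "south": 2.0}


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, w2, ...]. Assumes the feature columns are
    linearly independent so that every pivot is nonzero.
    """
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(design[0])
    a = [[sum(r[i] * r[j] for r in design) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= factor * a[col][j]
            b[k] -= factor * b[col]
    # Back substitution.
    weights = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (b[i] - tail) / a[i][i]
    return weights


def predict(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def features(record):
    return [REGION_CODES[record["region"]], record["farmincome"]]


# Invented toy rows. In the training rows, claimvalue is the recovered amount.
training = [
    {"region": "north", "farmincome": 10.0, "claimvalue": 10.0},
    {"region": "middle", "farmincome": 20.0, "claimvalue": 16.0},
    {"region": "south", "farmincome": 30.0, "claimvalue": 22.0},
    {"region": "north", "farmincome": 40.0, "claimvalue": 25.0},
    {"region": "south", "farmincome": 10.0, "claimvalue": 12.0},
]
# In the prediction rows, claimvalue is the requested amount.
applicants = [
    {"id": "a1", "region": "north", "farmincome": 20.0, "claimvalue": 12.0},
    {"id": "a2", "region": "south", "farmincome": 10.0, "claimvalue": 20.0},
    {"id": "a3", "region": "middle", "farmincome": 30.0, "claimvalue": 25.0},
]

# Section 3: train, predict, and keep id, prediction_score, and claimvalue.
weights = fit_linear([features(r) for r in training],
                     [r["claimvalue"] for r in training])
scored = [
    {"id": r["id"],
     "prediction_score": predict(weights, features(r)),
     "claimvalue": r["claimvalue"]}
    for r in applicants
]

# Section 5: an applicant is eligible if the predicted repayment amount
# is higher than the requested loan amount.
eligible_ids = [r["id"] for r in scored if r["prediction_score"] > r["claimvalue"]]
print(eligible_ids)
```

The toy training rows were constructed to lie exactly on a linear function, so the fit recovers it; real historical records would not fit this cleanly.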

Table 1. Evaluation metrics
Metric	Description
MAE	The mean absolute error.
MAPE	The mean absolute percentage error.
MSE	The mean squared error.
R	The multiple correlation coefficient.
R2	The coefficient of determination.
RMSE	The root-mean-square error.
SAE	The sum of absolute errors.
SSE	The sum of squared errors.
SSR	The sum of squares due to regression.
SST	The total sum of squares.
count	The number of rows.
predictionMean	The mean of prediction results.
yMean	The mean of the original values of the dependent variable.
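
The metrics in Table 1 can be illustrated with their common textbook definitions. The following sketch is an assumption about how such metrics are usually computed, not a description of the Evaluation component: the component may differ in details, for example by reporting MAPE as a percentage. R is computed here as the Pearson correlation between the actual and predicted values. The function name `regression_metrics` and the sample numbers are invented.

```python
import math


def regression_metrics(y_true, y_pred):
    """Textbook regression metrics. Assumes nonzero, non-constant y_true and non-constant y_pred."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    pred_mean = sum(y_pred) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    sae = sum(abs(e) for e in errors)                # sum of absolute errors
    sse = sum(e * e for e in errors)                 # sum of squared errors
    sst = sum((t - y_mean) ** 2 for t in y_true)     # total sum of squares
    ssr = sum((p - y_mean) ** 2 for p in y_pred)     # sum of squares due to regression
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # as a fraction

    # Pearson correlation between actual and predicted values.
    cov = sum((t - y_mean) * (p - pred_mean) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - pred_mean) ** 2 for p in y_pred)
    r = cov / math.sqrt(sst * var_pred)

    return {
        "MAE": sae / n,
        "MAPE": mape,
        "MSE": sse / n,
        "R": r,
        "R2": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "SAE": sae,
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        "count": n,
        "predictionMean": pred_mean,
        "yMean": y_mean,
    }


# Invented sample values for illustration only.
metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
for name, value in metrics.items():
    print(name, value)
```

Because the sample predictions do not come from a least-squares fit, SSR + SSE does not equal SST in this example; that identity holds only for an ordinary least squares fit with an intercept.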

Run the pipeline and view the prediction results.
1. In the upper-left corner of the canvas, click the Run icon.
2. After the pipeline completes, right-click the Sql Mapping component on the canvas and choose View Data > Output Port. On the tab that appears, you can view the eligible loan applicants.

References

For more information about algorithm components, see the following topics: