Linear regression is a common regression analysis method in mathematical statistics. You can use this method to find the quantitative relationships between two or more variables. Machine Learning Designer provides a preset linear regression template to help you build a model to predict the repayment ability of agricultural loan applicants based on historical loan records. This topic describes how to use the preset linear regression template.
Background information
Predicting the repayment ability of agricultural loan applicants is a typical data mining task. Lenders can build an empirical model from the historical data of applicants, such as annual income, crop type, and loan records, and use the model to predict the repayment ability of new loan applicants.
The datasets that are used in this topic are only for experimental use.
Prerequisites
A workspace is created. For more information, see Create a workspace.
MaxCompute resources are associated with the workspace. For more information, see Manage workspaces.
Datasets
The datasets that are used in this topic contain the following fields:
Field | Type | Description |
id | STRING | The unique ID of the applicant. |
name | STRING | The name of the applicant. |
region | STRING | The geographic region where the applicant resides. Valid values: north, middle, and south. |
farmsize | DOUBLE | The farmland size. |
rainfall | DOUBLE | The rainfall in the region. |
landquality | DOUBLE | The farmland quality. A greater value indicates better quality. |
farmincome | DOUBLE | The annual income of the applicant. |
maincrop | STRING | The crop type. |
claimtype | STRING | The loan type. |
claimvalue | DOUBLE | The loan amount. |
Procedure
Go to the Machine Learning Designer page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the workspace page, choose to go to the Machine Learning Designer page.
Create a pipeline.
On the Visualized Modeling (Designer) page, click the Preset Templates tab.
On the Preset Templates tab, find the Agricultural Loan Prediction template and click Create.
In the Create Pipeline dialog box, configure the required parameters. You can use the default values.
The Pipeline Data Path parameter specifies the Object Storage Service (OSS) path that stores the temporary data and models that are generated when the pipeline runs.
Click OK.
Creating the pipeline takes about 10 seconds.
On the Pipelines tab, select the created pipeline and click Open.
View the components of the pipeline on the canvas. The following figure shows the pipeline that is automatically created based on the preset template.
Section | Description |
1 | The components in this section read the following datasets that are used in the pipeline. Training dataset: contains 100 historical records that are used to train the linear regression model. The dataset contains fields such as farmsize, rainfall, and claimvalue. The claimvalue field indicates the recovered loan amount. Prediction dataset: contains information about the 71 loan applicants who apply for agricultural loans this year. The claimvalue field indicates the requested loan amount. The pipeline predicts the repayment ability of the applicants in the prediction dataset based on the historical records in the training dataset. |
2 | The components in this section convert field values of the STRING type to the DOUBLE type. For example, the valid values of the region field are north, middle, and south. The components in this section map these values to numerical values (0, 1, and 2, respectively) and convert the numerical values to the DOUBLE type. |
3 | The linear regression component trains and generates a regression model by using historical records in the training dataset. The prediction component uses the regression model to predict the loan amount that applicants can repay. The Append Columns component merges the id, prediction_score, and claimvalue columns in the prediction results, as shown in the following figure. The prediction_score field indicates the predicted amount that the applicants can repay. |
4 | The Evaluation component evaluates the prediction performance of the model. For information about the evaluation metrics, see Table 1 (Evaluation metrics). |
5 | The Sql Mapping component identifies eligible loan applicants by comparing the predicted repayment amounts to the requested loan amounts. If the predicted repayment amount is higher than the requested loan amount, the applicant is considered eligible. |
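The logic described for sections 2, 3, and 5 can be sketched outside Designer in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the implementation that the preset template uses. The function names, the choice of feature fields, and the use of ordinary least squares through `numpy.linalg.lstsq` are assumptions. The field names, the mapping of north, middle, and south to 0, 1, and 2, and the eligibility rule come from the descriptions above.

```python
import numpy as np

# Mapping described for the region field: north -> 0, middle -> 1, south -> 2.
REGION_CODES = {"north": 0.0, "middle": 1.0, "south": 2.0}

# Assumption: a subset of the dataset fields is used as features in this sketch.
FEATURES = ["region", "farmsize", "rainfall", "landquality", "farmincome"]


def encode(record):
    """Convert one record (a dict) to a list of floats, encoding region."""
    row = dict(record)
    row["region"] = REGION_CODES[row["region"]]
    return [float(row[name]) for name in FEATURES]


def fit_linear_regression(records, target="claimvalue"):
    """Fit ordinary least squares; the last weight is the intercept."""
    x = np.array([encode(r) for r in records])
    x = np.hstack([x, np.ones((len(x), 1))])  # append the intercept column
    y = np.array([float(r[target]) for r in records])
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def predict(weights, record):
    """Return the predicted repayment amount (prediction_score) for a record."""
    return float(np.dot(encode(record) + [1.0], weights))


def is_eligible(prediction_score, claimvalue):
    """An applicant is eligible if the predicted repayment exceeds the request."""
    return prediction_score > claimvalue
```

A caller would fit the weights on the training records, call `predict` for each record in the prediction dataset, and pass the result together with that record's `claimvalue` to `is_eligible`.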
Table 1. Evaluation metrics
Metric | Description |
MAE | The mean absolute error. |
MAPE | The mean absolute percentage error. |
MSE | The mean squared error. |
R | The multiple correlation coefficient. |
R2 | The coefficient of determination. |
RMSE | The root-mean-square error. |
SAE | The sum of absolute errors. |
SSE | The sum of squared errors. |
SSR | The sum of squares due to regression. |
SST | The total sum of squares. |
count | The number of rows. |
predictionMean | The mean of the prediction results. |
yMean | The mean of the original dependent variable. |
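To make the metrics in Table 1 concrete, the following sketch computes them from actual and predicted values by using the standard textbook definitions. The function name is hypothetical, and the Evaluation component may differ in details. In particular, reporting MAPE as a percentage and computing R as the correlation between actual and predicted values are assumptions of this sketch.

```python
import numpy as np


def regression_metrics(y, pred):
    """Compute common regression metrics from actual and predicted values."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = y - pred
    y_mean = float(y.mean())

    sse = float(np.sum(err ** 2))              # sum of squared errors
    sst = float(np.sum((y - y_mean) ** 2))     # total sum of squares
    ssr = float(np.sum((pred - y_mean) ** 2))  # sum of squares due to regression
    mse = sse / len(y)

    return {
        "MAE": float(np.mean(np.abs(err))),
        # Assumption: MAPE is a percentage; undefined if any actual value is 0.
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        # Assumption: R is the correlation between actual and predicted values.
        "R": float(np.corrcoef(y, pred)[0, 1]),
        "R2": 1.0 - sse / sst,
        "SAE": float(np.sum(np.abs(err))),
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        "count": len(y),
        "predictionMean": float(pred.mean()),
        "yMean": y_mean,
    }
```

For example, `regression_metrics(claimvalues, prediction_scores)` returns a dictionary keyed by the metric names in Table 1, where both arguments are equal-length sequences of numbers.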
Run the pipeline and view the prediction results.
In the upper-left corner of the canvas, click the Run icon.
After the pipeline completes, right-click the Sql Mapping component on the canvas and choose . On the tab that appears, you can view the eligible loan applicants.
References
For more information about algorithm components, see the following topics: