This article uses data on middle school students and machine learning algorithms to identify the key factors affecting their academic performance. The data includes information such as parents' occupation, parents' education, and Internet connectivity at home. A logistic regression algorithm is used to train an offline model and generate an evaluation report, with the goal of predicting students' final examination results. The trained offline model is then exposed as an online prediction API so that it can be applied to online scenarios.
We will be building our predictor using the Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI) service.
The dataset consists of 25 feature columns and 1 target column. The detailed fields are as follows.
The following is a screenshot of the data.
The following diagram shows the experiment process.
Data flows through the experiment from top to bottom, passing through preprocessing, splitting, training, prediction, and evaluation in turn.
The SQL script is provided as follows.
select (case sex when 'F' then 1 else 0 end) as sex,
    (case address when 'U' then 1 else 0 end) as address,
    (case famsize when 'LE3' then 1 else 0 end) as famsize,
    (case Pstatus when 'T' then 1 else 0 end) as Pstatus,
    Medu,
    Fedu,
    (case Mjob when 'teacher' then 1 else 0 end) as Mjob,
    (case Fjob when 'teacher' then 1 else 0 end) as Fjob,
    (case guardian when 'mother' then 0 when 'father' then 1 else 2 end) as guardian,
    traveltime,
    studytime,
    failures,
    (case schoolsup when 'yes' then 1 else 0 end) as schoolsup,
    (case fumsup when 'yes' then 1 else 0 end) as fumsup,
    (case paid when 'yes' then 1 else 0 end) as paid,
    (case activities when 'yes' then 1 else 0 end) as activities,
    (case higher when 'yes' then 1 else 0 end) as higher,
    (case internet when 'yes' then 1 else 0 end) as internet,
    famrel,
    freetime,
    goout,
    Dalc,
    Walc,
    health,
    absences,
    (case when G3>14 then 1 else 0 end) as finalScore
from ${t1};
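The SQL script component structures the text data by mapping string fields to numeric values. For example, 'yes' becomes 1 and any other value becomes 0, and a final grade G3 above 14 is labeled 1 in the finalScore target column.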
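For readers who want to try the same encoding outside PAI, the CASE expressions in the script can be approximated in plain Python. This is only a sketch: `encode_row` and the lookup tables are hypothetical names introduced here, and the sample record is made up for illustration.

```python
# Value that maps to 1 for each binary column (anything else maps to 0),
# mirroring the CASE expressions in the SQL script.
BINARY_MAPPINGS = {
    "sex": "F", "address": "U", "famsize": "LE3", "Pstatus": "T",
    "Mjob": "teacher", "Fjob": "teacher", "schoolsup": "yes",
    "fumsup": "yes", "paid": "yes", "activities": "yes",
    "higher": "yes", "internet": "yes",
}
# guardian: mother -> 0, father -> 1, anything else -> 2
GUARDIAN_CODES = {"mother": 0, "father": 1}


def encode_row(row):
    """Return a numeric copy of one student record (hypothetical helper)."""
    encoded = {}
    for name, value in row.items():
        if name in BINARY_MAPPINGS:
            encoded[name] = 1 if value == BINARY_MAPPINGS[name] else 0
        elif name == "guardian":
            encoded[name] = GUARDIAN_CODES.get(value, 2)
        elif name == "G3":
            # The target is 1 when the final grade is above 14.
            encoded["finalScore"] = 1 if value > 14 else 0
        else:
            # Columns that are already numeric pass through unchanged.
            encoded[name] = value
    return encoded


# Made-up record, not taken from the article's dataset.
sample = {"sex": "F", "guardian": "other", "internet": "no", "studytime": 2, "G3": 15}
encoded_sample = encode_row(sample)
print(encoded_sample)
```

Unlike the SQL script, which selects a fixed list of columns, this helper encodes whatever keys the record contains.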
The normalization component removes the effect of units and scale by transforming every field into the range 0 to 1, so that fields with large numeric ranges do not dominate fields with small ones. The result is shown in the figure below.
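As an illustration of what scaling a field into the 0 to 1 range means, here is a minimal sketch that assumes min-max scaling. Whether the PAI component uses exactly this formula is an assumption, and `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers into [0, 1] with (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for the absences column.
absences = [0, 4, 10, 20]
normalized_absences = min_max_normalize(absences)
print(normalized_absences)
```

After scaling, a column such as absences and a column such as studytime share the same range, which also makes the learned model weights easier to compare.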
The data set is split in a ratio of 8:2, in which 80% is used for model training, and 20% is used for prediction.
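The 8:2 split can be sketched as a shuffled partition. This assumes a random split with a fixed seed, which may differ from how the PAI split component selects rows. `split_rows` is a hypothetical helper and the row count is illustrative.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in row ids; the real rows would be the normalized records.
rows = list(range(100))
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Keeping the test rows out of training is what makes the later accuracy figure a fair estimate of performance on unseen students.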
The offline model is generated by training through a logistic regression algorithm. If you are new to this algorithm, you can read more about logistic regression on Wikipedia.
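To make the training step more concrete, the sketch below fits a binary logistic regression with plain batch gradient descent. It is not the PAI component's implementation, which may use a different optimizer and regularization. The function names, hyperparameters, and toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Minimize the average log loss with batch gradient descent."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 when the predicted probability is at least 0.5."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Made-up, already normalized rows: [studytime, failures].
toy_features = [[0.0, 1.0], [0.2, 0.8], [0.3, 0.6], [0.7, 0.2], [0.9, 0.1], [1.0, 0.0]]
toy_labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(toy_features, toy_labels)
predictions = [predict(weights, bias, x) for x in toy_features]
print(weights, bias, predictions)
```

On this toy data the first weight comes out positive and the second negative, which matches the intuition that more study time helps and more past failures hurt.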
View the accuracy of model predictions through the confusion matrix. As can be seen from the figure below, the prediction accuracy of this experiment is 82.911%.
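The accuracy reported by a confusion matrix is the share of correct predictions, (TP + TN) / total. A minimal sketch with made-up labels (not the experiment's actual predictions) could look like this; `confusion_matrix` and `accuracy` are hypothetical helper names.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels 0 and 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that match the actual label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Made-up labels for eight students.
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
matrix = confusion_matrix(actual, predicted)
print(matrix, accuracy(matrix))
```

Because only students with G3 above 14 are labeled 1, the classes are likely imbalanced, so it is worth reading the individual cells of the matrix and not just the overall accuracy.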
According to the characteristics of the logistic regression algorithm, some valuable information can be mined through the model coefficients. Right click on the Binary Logistic Regression component to view the model. The results are shown below.
In a logistic regression model, the larger the absolute value of a weight, the greater the impact of that feature on the result. Because the features were normalized to the same range, their weights can be compared directly. A positive weight indicates a positive correlation with result 1 (a high score in the final exam), and a negative weight indicates a negative correlation. Several features with large weights are analyzed in the following table.
Due to the small dataset in this experiment, the above analysis results are not necessarily accurate and are for reference only.
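The rule for reading the weights described above can be sketched as a ranking by absolute value. The weights below are invented purely for illustration and are not the coefficients from this experiment. `rank_features` is a hypothetical helper.

```python
import math


def rank_features(weights_by_name):
    """Sort features by the absolute value of their weight, largest first."""
    return sorted(weights_by_name.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented weights, NOT the model coefficients reported in the article.
example_weights = {"failures": -1.2, "higher": 0.9, "goout": -0.4, "Medu": 0.6}
ranked = rank_features(example_weights)

for name, weight in ranked:
    direction = "raises" if weight > 0 else "lowers"
    # exp(weight) is the factor by which the odds of result 1 change
    # when the feature increases by one unit.
    odds_factor = math.exp(weight)
    print(f"{name}: weight={weight:+.2f}, {direction} the odds, factor={odds_factor:.3f}")
```

Reading weights as odds factors is a standard property of logistic regression, but it describes correlation in the training data and not a causal effect.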
Once generated, the offline model can be deployed online, and online prediction can be performed by calling a RESTful API.
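The exact request format depends on how the model is deployed, so the following is only a generic sketch of calling a JSON prediction endpoint over HTTP. The URL, the Authorization header, and the `inputs` payload shape are placeholders and assumptions, not PAI's documented API. Consult the service's own documentation for the real values.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, token, feature_row):
    """Build (but do not send) a POST request carrying one feature row as JSON."""
    # The {"inputs": [...]} shape is a placeholder, not a documented format.
    body = json.dumps({"inputs": [feature_row]}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


# Placeholder endpoint and token; the feature values are made up.
request = build_prediction_request(
    "https://example.invalid/predict",
    "YOUR_TOKEN",
    {"studytime": 0.5, "failures": 0.0, "internet": 1},
)
print(request.get_method(), request.full_url)

# Sending the request would look like this once real values are filled in:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

The feature row sent to the service should go through the same encoding and normalization as the training data, otherwise the model's weights will not apply correctly.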
To learn more about Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI), visit www.alibabacloud.com/product/machine-learning
GarvinLi - January 18, 2019