The x13_auto_arima component automatically selects an autoregressive integrated moving average (ARIMA) model. The selection procedure is based on the programs of Gomez and Maravall (1998), as implemented in TRAMO (1996) and its later revisions. This topic describes how to configure the x13_auto_arima component provided by Platform for AI (PAI).
Background information
The x13_auto_arima component selects a model by using the following process:
1. Default model estimation
If frequency = 1, the default model is (0,1,1). If frequency > 1, the default model is (0,1,1)(0,1,1).
2. Identification of differencing orders
If you specify the diff and seasonalDiff parameters, this step is skipped. Otherwise, unit root tests are used to determine the difference d and the seasonal difference D.
3. Identification of ARMA model orders
The most appropriate model is selected based on the Bayesian information criterion (BIC). The maxOrder and maxSeasonalOrder parameters take effect in this step.
4. Comparison of the identified model with the default model
The Ljung-Box Q statistic is used to compare the models. If both models are unacceptable, the (3,d,1)(0,D,1) model is used.
5. Final model checks
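The default model rule and the Ljung-Box Q statistic named in the process above can be sketched in Python. This is an illustrative sketch, not the component's implementation: the function names are hypothetical, the statistic follows the standard textbook definition Q = n(n+2) Σ ρ_k²/(n−k), and the acceptance threshold that the component applies to Q is not stated in this topic and is therefore left out.

```python
from typing import Sequence


def default_model(frequency: int) -> str:
    """Return the default model orders described in the first step."""
    return "(0,1,1)" if frequency == 1 else "(0,1,1)(0,1,1)"


def ljung_box_q(residuals: Sequence[float], max_lag: int) -> float:
    """Ljung-Box Q statistic over lags 1..max_lag (standard definition)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)  # n times the sample variance
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total
```

A large Q relative to a chi-squared reference distribution suggests that the model residuals are still autocorrelated, which is why the statistic is useful for judging whether a candidate model is acceptable.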
For more information about ARIMA, see wiki.
Algorithm usage notes:
Supported scale
Row: a maximum of 1,200 data records in a group
Column: one numeric column
Resource calculation method
Default calculation method if the groupColNames parameter is not specified:
coreNum=1 memSizePerCore=4096
Default calculation method if the groupColNames parameter is specified:
coreNum = floor(Total number of rows/120,000) memSizePerCore=4096
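The two default rules above can be written as a small helper. This is a minimal sketch that assumes only what the formulas state; the function name and constants are hypothetical, and the topic does not say how the platform handles the case where the formula yields 0 cores.

```python
from typing import Optional, Tuple

ROWS_PER_CORE = 120_000      # divisor from the documented formula
MEM_SIZE_PER_CORE_MB = 4096  # documented default memSizePerCore


def default_resources(total_rows: int, group_col_names: Optional[str]) -> Tuple[int, int]:
    """Return (coreNum, memSizePerCore) according to the documented defaults."""
    if not group_col_names:
        # groupColNames is not specified: a single core is used.
        return 1, MEM_SIZE_PER_CORE_MB
    # groupColNames is specified: floor(total rows / 120,000).
    # Note: this is 0 for fewer than 120,000 rows; the behavior in that
    # case is not described in this topic.
    return total_rows // ROWS_PER_CORE, MEM_SIZE_PER_CORE_MB
```

For example, a grouped input table with 250,000 rows gives 2 cores with 4,096 MB each under this rule.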
Limits
You can use the x13_auto_arima component based only on the computing resources of MaxCompute.
Configure the component
Method 1: Configure the component in the PAI console
You can configure the parameters of the x13_auto_arima component on the pipeline page of Machine Learning Designer. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Time Series Column | Required. This parameter is used only to sort values in the value column. |
Fields Setting | Value Column | Required. |
Fields Setting | Stratification Column | Optional. You can separate multiple columns with commas (,), such as col0,col1. A time series is created for each group. |
Parameters Setting | Start Date | The supported format is year.seasonal. Example: 1986.1. |
Parameters Setting | Series Frequency | The value must be a positive integer in the range of (0,12]. |
Parameters Setting | Maximum of p and q | The value must be a positive integer in the range of (0,4]. |
Parameters Setting | Maximum of Seasonal p and q | The value must be a positive integer in the range of (0,2]. |
Parameters Setting | Maximum of Difference d | The value must be a positive integer in the range of (0,2]. |
Parameters Setting | Maximum of Seasonal Difference d | The value must be a positive integer in the range of (0,1]. |
Parameters Setting | Difference d | The value must be a positive integer in the range of (0,2]. If both the diff and maxDiff parameters are specified, the maxDiff parameter does not take effect. The diff parameter must be used together with the seasonalDiff parameter. |
Parameters Setting | Seasonal Difference d | The value must be a positive integer in the range of (0,1]. If both the seasonalDiff and maxSeasonalDiff parameters are specified, the maxSeasonalDiff parameter does not take effect. |
Parameters Setting | predictNum | The number of predictions. For example, if you use the daily sales of the last month to predict the sales of the next week, the number of predictions is 7. If Stratification Column is selected, each group has 7 predictions. The value must be a positive integer in the range of (0,120]. |
Parameters Setting | Predicted Confidence Interval | Default value: 0.95. |
Parameters Setting | Tolerance | Optional. Default value: 1e-5. |
Parameters Setting | Maximum Iterations | The value must be a positive integer. Default value: 1500. |
Execution Tuning | Cores | The number of cores. By default, the system determines the value. |
Execution Tuning | Memory | The memory size per core. Unit: MB. |
Method 2: Configure the component by using PAI commands
You can configure the component parameters in a PAI command. You can use the SQL Script component to run PAI commands. For more information, see SQL Script.
PAI -name x13_auto_arima
-project algo_public
-DinputTableName=pai_ft_x13_arima_input
-DseqColName=id
-DvalueColName=number
-Dstart=1949.1
-Dfrequency=12
-DpredictStep=12
-DoutputPredictTableName=pai_ft_x13_arima_out_predict2
-DoutputDetailTableName=pai_ft_x13_arima_out_detail2;
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | N/A |
inputTablePartitions | No | The partitions that are selected from the input table for model training. | Full table |
seqColName | Yes | The time series column. This parameter is used only to sort values in the value column. | N/A |
valueColName | Yes | The value column. | N/A |
groupColNames | No | The stratification columns. You can separate multiple columns with commas (,), such as col0,col1. A time series is created for each group. | N/A |
start | No | The start time of a time series. The value must be a string in the year.seasonal format, such as 1986.1. For more information, see Time series format. | 1.1 |
frequency | No | The time series frequency. The value must be a positive integer in the range of (0,12]. For more information, see Time series format. | 12 (a value of 12 indicates 12 months per year) |
maxOrder | No | The maximum values of p and q. The value must be an integer in the range of [0,4]. | 2 |
maxSeasonalOrder | No | The maximum values of seasonal p and q. The value must be an integer in the range of [0,2]. | 1 |
maxDiff | No | The maximum value of the difference d. The value must be an integer in the range of [0,2]. | 2 |
maxSeasonalDiff | No | The maximum value of the seasonal difference d. The value must be an integer in the range of [0,1]. | 1 |
diff | No | The difference d. The value must be an integer in the range of [0,2]. If both the diff and maxDiff parameters are specified, the maxDiff parameter does not take effect. The diff parameter must be used together with the seasonalDiff parameter. | -1 (a value of -1 indicates that the difference d is not specified) |
seasonalDiff | No | The seasonal difference d. The value must be an integer in the range of [0,1]. If both the seasonalDiff and maxSeasonalDiff parameters are specified, the maxSeasonalDiff parameter does not take effect. | -1 (a value of -1 indicates that the seasonal difference d is not specified) |
maxiter | No | The maximum number of iterations. The value must be a positive integer. | 1500 |
tol | No | The tolerance. The value is of DOUBLE type. | 1e-5 |
predictStep | No | The number of prediction entries. The value must be a positive integer in the range of (0,365]. | 12 |
confidenceLevel | No | The confidence level. The value must be a number in the range of (0,1). | 0.95 |
outputPredictTableName | Yes | The output table. | N/A |
outputDetailTableName | Yes | The details table. | N/A |
outputTablePartition | No | Specifies whether to export data to a partition. | Does not export data to partitions |
coreNum | No | The number of cores. The value must be a positive integer. This parameter is used together with memSizePerCore. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer in the range of [1024,64 × 1024]. | Determined by the system |
lifecycle | No | The lifecycle of the output table. | N/A |
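When the command is generated by a script, the required parameters and a few of the documented ranges can be checked before the command is submitted. The following Python sketch is illustrative only: build_pai_command is a hypothetical helper, not part of PAI, and it validates only frequency, predictStep, and confidenceLevel as listed in the table above. The trailing semicolon is an assumption based on how other MaxCompute statements are terminated.

```python
from typing import Dict, Union

Value = Union[str, int, float]

REQUIRED = ("inputTableName", "seqColName", "valueColName",
            "outputPredictTableName", "outputDetailTableName")


def build_pai_command(params: Dict[str, Value]) -> str:
    """Assemble an x13_auto_arima PAI command from a parameter dictionary."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    # Range checks copied from the parameter table; defaults as documented.
    frequency = params.get("frequency", 12)
    if not (isinstance(frequency, int) and 0 < frequency <= 12):
        raise ValueError("frequency must be an integer in (0,12]")
    predict_step = params.get("predictStep", 12)
    if not (isinstance(predict_step, int) and 0 < predict_step <= 365):
        raise ValueError("predictStep must be an integer in (0,365]")
    if not 0 < float(params.get("confidenceLevel", 0.95)) < 1:
        raise ValueError("confidenceLevel must be in (0,1)")

    lines = ["PAI -name x13_auto_arima", "-project algo_public"]
    lines += [f"-D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"  # assumed statement terminator


print(build_pai_command({
    "inputTableName": "pai_ft_x13_arima_input",
    "seqColName": "id",
    "valueColName": "number",
    "start": "1949.1",
    "frequency": 12,
    "predictStep": 12,
    "outputPredictTableName": "pai_ft_x13_arima_out_predict2",
    "outputDetailTableName": "pai_ft_x13_arima_out_detail2",
}))
```

Passing start as a string keeps values such as 1949.10 from being shortened the way a floating-point number would be.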
Time series format
The start and frequency parameters specify the time dimensions ts1 and ts2 for the numeric column.
The frequency parameter specifies the data frequency within a unit period, which means the frequency of ts2 in each ts1.
The value of the start parameter is in the n1.n2 format. This indicates that the start date is the n2th ts2 in the n1th ts1.
Time Unit | ts1 | ts2 | frequency | start |
12 months/year | Year | Month | 12 | 1949.2 indicates the second month of the year 1949. |
4 quarters/year | Year | Quarter | 4 | 1949.2 indicates the second quarter of the year 1949. |
7 days/week | Week | Day | 7 | 1949.2 indicates the second day of the 1949th week. |
1 | Any time unit | 1 | 1 | 1949.1 indicates the year 1949, or the 1949th day or hour. |
Example: value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
start=1949.3, frequency=12 indicates that the unit time is 12 months per year, and the prediction starts from June of the year 1950.
year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1949 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1950 | 11 | 12 | 13 | 14 | 15 | | | | | | | |
start=1949.3, frequency=4 indicates that the unit time is four quarters per year, and the prediction starts from the second quarter of the year 1953.
year | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
1949 | | | 1 | 2 |
1950 | 3 | 4 | 5 | 6 |
1951 | 7 | 8 | 9 | 10 |
1952 | 11 | 12 | 13 | 14 |
1953 | 15 | | | |
start=1949.3, frequency=7 indicates that the unit time is seven days per week, and the prediction starts from the fourth day of the 1951st week.
week | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
1949 | | | 1 | 2 | 3 | 4 | 5 |
1950 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1951 | 13 | 14 | 15 | | | | |
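start=1949.1, frequency=1 indicates that the series covers the cycles 1949 through 1963, and the prediction starts in the next cycle, 1964.
cycle | p1 |
1949 | 1 |
1950 | 2 |
1951 | 3 |
1952 | 4 |
1953 | 5 |
1954 | 6 |
1955 | 7 |
1956 | 8 |
1957 | 9 |
1958 | 10 |
1959 | 11 |
1960 | 12 |
1961 | 13 |
1962 | 14 |
1963 | 15 |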
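The arithmetic behind the examples above can be sketched in Python. This is a minimal illustration, and first_forecast_period is a hypothetical helper name: it takes the start string in the n1.n2 format, the frequency, and the number of observed values, and returns the ts1 and ts2 of the first predicted point.

```python
from typing import Tuple


def first_forecast_period(start: str, frequency: int, n_values: int) -> Tuple[int, int]:
    """Return (ts1, ts2) of the first predicted point.

    start is the "n1.n2" string, frequency is the number of ts2 units per ts1,
    and n_values is the number of observed values in the series.
    """
    n1_text, n2_text = start.split(".")
    n1, n2 = int(n1_text), int(n2_text)
    if not 1 <= n2 <= frequency:
        raise ValueError("n2 must be in the range [1, frequency]")
    # Zero-based position of the first forecast, counted from ts2 = 1 of ts1 = n1.
    offset = (n2 - 1) + n_values
    return n1 + offset // frequency, offset % frequency + 1


# The three seasonal examples above, each with 15 observed values.
print(first_forecast_period("1949.3", 12, 15))  # (1950, 6): June 1950
print(first_forecast_period("1949.3", 4, 15))   # (1953, 2): second quarter of 1953
print(first_forecast_period("1949.3", 7, 15))   # (1951, 4): fourth day of week 1951
```

The start value is parsed as a string rather than as a number so that n2 values with a trailing zero, such as 10 in 1949.10, are preserved.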
Examples
Prepare test data
This example uses the AirPassengers.csv dataset, which records the number of international airline passengers each month from the year 1949 to the year 1960. For more information about the dataset, see AirPassengers.
id | number |
1 | 112 |
2 | 118 |
3 | 132 |
4 | 129 |
5 | 121 |
... | ... |
Run the following commands on the MaxCompute client to create a table and upload the data by using Tunnel. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For more information about Tunnel commands, see Tunnel commands.
create table pai_ft_x13_arima_input(id bigint,number bigint);
tunnel upload xxx/airpassengers.csv pai_ft_x13_arima_input -h true;
Run PAI commands
You can use the SQL Script component or the ODPS SQL component to run the following PAI command:
PAI -name x13_auto_arima
-project algo_public
-DinputTableName=pai_ft_x13_arima_input
-DseqColName=id
-DvalueColName=number
-Dstart=1949.1
-Dfrequency=12
-DmaxOrder=4
-DmaxSeasonalOrder=2
-DmaxDiff=2
-DmaxSeasonalDiff=1
-DpredictStep=12
-DoutputPredictTableName=pai_ft_x13_arima_auto_out_predict
-DoutputDetailTableName=pai_ft_x13_arima_auto_out_detail;
Output description
Output table outputPredictTableName
Field description
Column name | Comment |
pdate | The date of the prediction. |
forecast | The predicted value. |
lower | The lower bound of the prediction at the specified confidence level. The default confidence level is 0.95. |
upper | The upper bound of the prediction at the specified confidence level. The default confidence level is 0.95. |
Generated data
Output table outputDetailTableName
Field description
Column name | Comment |
key | Valid values: model (the model in use), evaluation (the evaluation result), parameters (the training parameters), and log (the training logs). |
summary | The details stored for the key. |
Generated data
FAQ
Why are prediction results the same?
If an exception occurs during model training, the component falls back to the mean model. In this case, all prediction results are the mean of the training data. Common exceptions include a series that is still non-stationary after differencing, training that does not converge, and a variance of 0. You can view the stderr file of individual nodes in Logview to obtain specific information about the exception.
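The fallback behavior described in this answer can be illustrated with a short Python sketch. Both function names are hypothetical; the sketch only mirrors the statement that every prediction equals the mean of the training data, and it can be used to check whether a prediction table shows this pattern.

```python
import math
from statistics import fmean
from typing import List, Sequence


def mean_model_forecast(values: Sequence[float], predict_step: int) -> List[float]:
    """Forecast every future step as the mean of the training data."""
    return [fmean(values)] * predict_step


def looks_like_mean_fallback(values: Sequence[float], forecasts: Sequence[float]) -> bool:
    """Return True if all forecasts equal the mean of the training data."""
    mean = fmean(values)
    return all(math.isclose(f, mean, rel_tol=1e-9) for f in forecasts)
```

If the check returns True for a group, the stderr output in Logview is the place to look for the underlying exception.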
How do I configure the component parameters?
The x13_arima component requires you to configure the p, d, q, sp, sd, and sq parameters. If you are not sure which values to use, we recommend that you use the x13_auto_arima component.
For the x13_auto_arima component, you only need to set the upper limits. The component automatically tunes the parameters.
Error message:
ERROR: Number of observations after differencing and/or conditional AR estimation is 9, which is less than the minimum series length required for the model estimated, 24
The training data is insufficient. Modify the frequency parameter or add more training data.
Error message:
ERROR: Order of the MA operator is too large
In most cases, this error occurs because the training data is insufficient.
Error message:
ERROR: Series to be modelled and/or seasonally adjusted must have at least 3 complete years of data
If you specify the seasonal parameters, at least three complete years of data are required.
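The data-size errors above can often be caught before a job is submitted. The following Python sketch is a rough pre-check under stated assumptions: precheck_series is a hypothetical helper, "three complete years" is approximated as 3 × frequency observations (a series that starts in the middle of a year may need more), and the upper limit is the 1,200 records per group listed under Supported scale.

```python
from typing import List

MAX_ROWS_PER_GROUP = 1200  # "Supported scale" limit in this topic
MIN_COMPLETE_YEARS = 3     # taken from the error message above


def precheck_series(n_obs: int, frequency: int) -> List[str]:
    """Return a list of likely problems for one group; empty if none are found."""
    problems = []
    if n_obs > MAX_ROWS_PER_GROUP:
        problems.append(f"{n_obs} rows exceed the limit of {MAX_ROWS_PER_GROUP} per group")
    # Assumption: a seasonal series needs at least 3 * frequency observations.
    if frequency > 1 and n_obs < MIN_COMPLETE_YEARS * frequency:
        problems.append(
            f"{n_obs} rows are fewer than {MIN_COMPLETE_YEARS * frequency} "
            f"({MIN_COMPLETE_YEARS} periods of frequency {frequency})"
        )
    return problems
```

For the AirPassengers example, 144 monthly observations with frequency=12 pass both checks.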
References
X-13-ARIMA is an autoregressive integrated moving average (ARIMA) algorithm with seasonal adjustment that is based on the open source X-13ARIMA-SEATS algorithm. You can use the x13_arima component to process data. For more information, see x13_arima.