Platform For AI:x13_auto_arima

Last updated: Dec 18, 2024

The x13_auto_arima component automatically selects the orders of an autoregressive integrated moving average (ARIMA) model. The selection procedure is based on the program of Gomez and Maravall (1998), as implemented in TRAMO (1996) and its later revisions. This topic describes how to configure the x13_auto_arima component provided by Platform for AI (PAI).

Background information

The x13_auto_arima component selects a model through the following process:

  • default model estimation

    In the case of frequency = 1, the default model is (0,1,1).

    In the case of frequency > 1, the default model is (0,1,1)(0,1,1).

  • identification of differencing orders

    If you specify the diff and seasonalDiff parameters, skip this step.

    Use unit root tests to determine the difference d and the seasonal difference D.

  • identification of ARMA model orders

    Select the most appropriate model based on the Bayesian information criterion (BIC). The maxOrder and maxSeasonalOrder parameters take effect in this step.

  • comparison of identified model with default model

    Use the Ljung-Box Q statistic to compare the models. If both models are unacceptable, use the (3,d,1)(0,D,1) model.

  • final model checks
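
The model choices named in this process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's implementation: the function names default_model, fallback_model, and ljung_box_q are hypothetical, and the Q statistic uses the standard textbook formula Q = n(n+2) * sum(r_k^2 / (n-k)) over residual autocorrelations r_k.

```python
from typing import List, Optional, Tuple

Order = Tuple[int, int, int]


def default_model(frequency: int) -> Tuple[Order, Optional[Order]]:
    """Return the default (p,d,q) and seasonal (P,D,Q) orders for a frequency."""
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    if frequency == 1:
        return (0, 1, 1), None        # non-seasonal default: (0,1,1)
    return (0, 1, 1), (0, 1, 1)       # seasonal default: (0,1,1)(0,1,1)


def fallback_model(d: int, seasonal_d: int) -> Tuple[Order, Order]:
    """Model used when both candidates are unacceptable: (3,d,1)(0,D,1)."""
    return (3, d, 1), (0, seasonal_d, 1)


def ljung_box_q(residuals: List[float], lags: int) -> float:
    """Textbook Ljung-Box Q statistic over the first `lags` autocorrelations."""
    n = len(residuals)
    if not 0 < lags < n:
        raise ValueError("lags must be in the range [1, n-1]")
    mean = sum(residuals) / n
    centered = [x - mean for x in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("residuals have zero variance")
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total
```

A smaller Q value indicates less autocorrelation left in the residuals. How the component turns Q into an accept or reject decision is not described here, so no threshold is assumed.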

For more information about ARIMA, see wiki.

Algorithm usage notes:

  • Supported scale

    • Row: a maximum of 1,200 data records in a group

    • Column: one numeric column

  • Resource calculation method

    • Default calculation method if the groupColNames parameter is not specified:

      coreNum=1
      memSizePerCore=4096
    • Default calculation method if the groupColNames parameter is specified:

      coreNum = floor(Total number of rows/120,000)
      memSizePerCore=4096
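
The default resource calculation above can be written out as a short Python sketch. The function name default_resources is hypothetical, and the code applies the formula literally. The page does not state a lower bound for coreNum, so none is assumed here.

```python
from typing import Dict, Optional


def default_resources(total_rows: int,
                      group_col_names: Optional[str] = None) -> Dict[str, int]:
    """Apply the default coreNum and memSizePerCore rules literally."""
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if not group_col_names:
        core_num = 1                       # groupColNames not specified
    else:
        core_num = total_rows // 120_000   # floor(total rows / 120,000)
    return {"coreNum": core_num, "memSizePerCore": 4096}
```

For example, a grouped table with 1,000,000 rows yields coreNum=8. Applied literally, the formula gives 0 for grouped tables with fewer than 120,000 rows, in which case setting coreNum explicitly may be the safer option.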

Limits

The x13_auto_arima component can run only on the computing resources of MaxCompute.

Configure the component

Method 1: Configure the component in the PAI console

You can configure the parameters of the x13_auto_arima component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Time Series Column | Required. This parameter is used only to sort values in the value column. |
| Fields Setting | Value Column | Required. |
| Fields Setting | Stratification Column | Optional. You can separate multiple columns with commas (,), such as col0,col1. A time series is created for each group. |
| Parameters Setting | Start Date | The supported format is year.seasonal. Example: 1986.1. |
| Parameters Setting | Series Frequency | The value must be a positive integer in the range of (0,12]. |
| Parameters Setting | Maximum of p and q | The value must be a positive integer in the range of (0,4]. |
| Parameters Setting | Maximum of Seasonal p and q | The value must be a number in the range of (0,2]. |
| Parameters Setting | Maximum of Difference d | The value must be a positive integer in the range of (0,2]. |
| Parameters Setting | Maximum of Seasonal Difference d | The value must be a positive integer in the range of (0,1]. |
| Parameters Setting | Difference d | The value must be a positive integer in the range of (0,2]. If both the diff and maxDiff parameters are specified, the maxDiff parameter does not take effect. The diff parameter must be used together with the seasonalDiff parameter. |
| Parameters Setting | Seasonal Difference d | The value must be a positive integer in the range of (0,1]. If both the seasonalDiff and maxSeasonalDiff parameters are specified, the maxSeasonalDiff parameter does not take effect. |
| Parameters Setting | predictNum | The number of predictions. For example, if you use the daily sales of the last month to predict the sales of the next week, the number of predictions is 7. If Stratification Column is selected, each group has 7 predictions. The value must be a positive integer in the range of (0,120]. |
| Parameters Setting | Predicted Confidence Interval | Default value: 0.95. |
| Parameters Setting | Tolerance | Optional. Default value: 1e-5. |
| Parameters Setting | Maximum Iterations | The value must be a positive integer. Default value: 1500. |
| Execution Tuning | Cores | The number of cores. By default, the system determines the value. |
| Execution Tuning | Memory | The memory size per core. Unit: MB. |

Method 2: Configure the component by using PAI commands

You can configure the component parameters by using PAI commands, which you can call from SQL scripts. For more information, see SQL Script.

PAI -name x13_auto_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dstart=1949.1
    -Dfrequency=12
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_out_predict2
    -DoutputDetailTableName=pai_ft_x13_arima_out_detail2

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | N/A |
| inputTablePartitions | No | The partitions that are selected from the input table for model training. | Full table |
| seqColName | Yes | The time series column. This parameter is used only to sort values in the value column. | N/A |
| valueColName | Yes | The value column. | N/A |
| groupColNames | No | The stratification columns. You can separate multiple columns with commas (,), such as col0,col1. A time series is created for each group. | N/A |
| start | No | The start time of a time series. The value must be a string in the format of year.seasonal, such as 1986.1. For more information, see Time series format. | 1.1 |
| frequency | No | The time series frequency. The value must be a positive integer in the range of (0,12]. For more information, see Time series format. | 12. Note: A value of 12 indicates 12 months (one year). |
| maxOrder | No | The maximum values of p and q. The value must be an integer in the range of [0,4]. | 2 |
| maxSeasonalOrder | No | The maximum values of seasonal p and q. The value must be an integer in the range of [0,2]. | 1 |
| maxDiff | No | The maximum value of the difference d. The value must be an integer in the range of [0,2]. | 2 |
| maxSeasonalDiff | No | The maximum value of the seasonal difference d. The value must be an integer in the range of [0,1]. | 1 |
| diff | No | The difference d. The value must be an integer in the range of [0,2]. If both the diff and maxDiff parameters are specified, the maxDiff parameter does not take effect. The diff parameter must be used together with the seasonalDiff parameter. | -1. Note: A value of -1 indicates that the difference d is not specified. |
| seasonalDiff | No | The seasonal difference d. The value must be an integer in the range of [0,1]. If both the seasonalDiff and maxSeasonalDiff parameters are specified, the maxSeasonalDiff parameter does not take effect. | -1. Note: A value of -1 indicates that the seasonal difference d is not specified. |
| maxiter | No | The maximum number of iterations. The value must be a positive integer. | 1500 |
| tol | No | The tolerance. The value is of the DOUBLE type. | 1e-5 |
| predictStep | No | The number of prediction entries. The value must be a positive integer in the range of (0,365]. | 12 |
| confidenceLevel | No | The confidence level. The value must be a number in the range of (0,1). | 0.95 |
| outputPredictTableName | Yes | The output table. | N/A |
| outputDetailTableName | Yes | The details table. | N/A |
| outputTablePartition | No | Specifies whether to export data to a partition. | Does not export data to partitions |
| coreNum | No | The number of cores. The value must be a positive integer. This parameter is used together with memSizePerCore. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer in the range of [1024,64 × 1024]. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. | N/A |
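
As a rough illustration of how these parameters fit together, the following Python sketch checks a few of the documented ranges and assembles the PAI command text. The helper build_pai_command and the INT_RANGES and REQUIRED constants are hypothetical names introduced for this example. They are not part of PAI, and the checks cover only the rules listed in the table above.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]

# Inclusive integer ranges taken from the parameter table.
INT_RANGES: Dict[str, Tuple[int, int]] = {
    "frequency": (1, 12),
    "maxOrder": (0, 4),
    "maxSeasonalOrder": (0, 2),
    "maxDiff": (0, 2),
    "maxSeasonalDiff": (0, 1),
    "diff": (0, 2),
    "seasonalDiff": (0, 1),
    "predictStep": (1, 365),
}
REQUIRED = ("inputTableName", "seqColName", "valueColName",
            "outputPredictTableName", "outputDetailTableName")


def build_pai_command(params: Dict[str, Value]) -> str:
    """Validate documented ranges, then render the PAI command as text.

    Leave out unspecified parameters instead of passing -1.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for name, (low, high) in INT_RANGES.items():
        if name in params:
            value = params[name]
            if not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in [{low},{high}]")
    if "diff" in params and "seasonalDiff" not in params:
        raise ValueError("diff must be used together with seasonalDiff")
    level = params.get("confidenceLevel")
    if level is not None and not 0 < float(level) < 1:
        raise ValueError("confidenceLevel must be in the range (0,1)")
    lines = ["PAI -name x13_auto_arima", "    -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines)
```

Passing start as a string such as "1949.1" keeps the year.seasonal format intact. The returned text can then be pasted into an SQL script.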

Time series format

The start and frequency parameters specify the time dimensions ts1 and ts2 for the numeric column.

  • The frequency parameter specifies the data frequency within a unit period, which means the frequency of ts2 in each ts1.

  • The value of the start parameter is in the n1.n2 format. This indicates that the start date is the n2th ts2 in the n1th ts1.

| Time unit | ts1 | ts2 | frequency | start |
| --- | --- | --- | --- | --- |
| 12 months/year | Year | Month | 12 | 1949.2 indicates the second month of the year 1949. |
| 4 quarters/year | Year | Quarter | 4 | 1949.2 indicates the second quarter of the year 1949. |
| 7 days/week | Week | Day | 7 | 1949.2 indicates the second day of the 1949th week. |
| 1 | Any time unit | 1 | 1 | 1949.1 indicates the year 1949, or the 1949th day or hour. |

Example: value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

  • start=1949.3,frequency=12 indicates that the unit time is 12 months per year, and the prediction starts from June of the year 1950.

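    | year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
    | 1950 | 11 | 12 | 13 | 14 | 15 |  |  |  |  |  |  |  |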

  • start=1949.3,frequency=4 indicates that the unit time is four quarters per year, and the prediction starts from the second quarter of the year 1953.

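    | year | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
    | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 |
    | 1950 | 3 | 4 | 5 | 6 |
    | 1951 | 7 | 8 | 9 | 10 |
    | 1952 | 11 | 12 | 13 | 14 |
    | 1953 | 15 |  |  |  |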

  • start=1949.3,frequency=7 indicates that the unit time is seven days per week, and the prediction starts from the fourth day of the 1951st week.

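    | week | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 | 3 | 4 | 5 |
    | 1950 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
    | 1951 | 13 | 14 | 15 |  |  |  |  |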

  • start=1949.1,frequency=1 indicates that each cycle contains a single value, and the prediction starts in 1964, after the 15th value in 1963.

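    | cycle | p1 |
    | --- | --- |
    | 1949 | 1 |
    | 1950 | 2 |
    | 1951 | 3 |
    | 1952 | 4 |
    | 1953 | 5 |
    | 1954 | 6 |
    | 1955 | 7 |
    | 1956 | 8 |
    | 1957 | 9 |
    | 1958 | 10 |
    | 1959 | 11 |
    | 1960 | 12 |
    | 1961 | 13 |
    | 1962 | 14 |
    | 1963 | 15 |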
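
The mapping from start and frequency to positions in these grids is simple modular arithmetic. The following Python sketch reproduces it. The function names parse_start, position, and forecast_start are hypothetical and are used only for illustration.

```python
from typing import Tuple


def parse_start(start: str) -> Tuple[int, int]:
    """Split 'n1.n2' into (n1, n2).

    The value is handled as a string so that '1949.10' is not read as 1949.1.
    """
    n1, n2 = start.split(".")
    return int(n1), int(n2)


def position(start: str, frequency: int, index: int) -> Tuple[int, int]:
    """Return (ts1, ts2) for the observation at a 0-based index."""
    n1, n2 = parse_start(start)
    if not 1 <= n2 <= frequency:
        raise ValueError("n2 must be in the range [1, frequency]")
    offset = (n2 - 1) + index
    return n1 + offset // frequency, offset % frequency + 1


def forecast_start(start: str, frequency: int, n_obs: int) -> Tuple[int, int]:
    """Return (ts1, ts2) of the first predicted point after n_obs observations."""
    return position(start, frequency, n_obs)
```

With 15 observations, forecast_start("1949.3", 12, 15) returns (1950, 6), which is June 1950. forecast_start("1949.3", 4, 15) returns (1953, 2), and forecast_start("1949.3", 7, 15) returns (1951, 4). These results match the first three examples above.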

Examples

Prepare test data

This example uses the AirPassengers.csv dataset, which records the number of international airline passengers each month from the year 1949 to the year 1960. For more information about the dataset, see AirPassengers.

| id | number |
| --- | --- |
| 1 | 112 |
| 2 | 118 |
| 3 | 132 |
| 4 | 129 |
| 5 | 121 |
| ... | ... |

Run the following Tunnel commands on the MaxCompute client to upload data. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For more information about Tunnel commands, see Tunnel commands.

create table pai_ft_x13_arima_input(id bigint,number bigint);
tunnel upload xxx/airpassengers.csv pai_ft_x13_arima_input -h true;
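
If the series is available only as a list of values, a CSV file in the two-column layout shown above can be produced with a few lines of Python. This is a minimal sketch: the helper name write_series_csv and the output file name are hypothetical, and the header row is written on the assumption that the file is uploaded with -h true, as in the command above.

```python
import csv
from typing import Iterable


def write_series_csv(path: str, values: Iterable[int]) -> int:
    """Write an (id, number) CSV with a header row and return the row count."""
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["id", "number"])   # header row
        for i, value in enumerate(values, start=1):
            writer.writerow([i, value])     # id is the 1-based sequence number
            count += 1
    return count


if __name__ == "__main__":
    # Only the first five values shown in the sample table are used here.
    write_series_csv("airpassengers_head.csv", [112, 118, 132, 129, 121])
```

The id column exists only to preserve ordering, which matches how seqColName is used by the component.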

Run PAI commands

You can use the SQL script or ODPS SQL component to run the following PAI commands:

PAI -name x13_auto_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dstart=1949.1
    -Dfrequency=12
    -DmaxOrder=4
    -DmaxSeasonalOrder=2
    -DmaxDiff=2
    -DmaxSeasonalDiff=1
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_auto_out_predict
    -DoutputDetailTableName=pai_ft_x13_arima_auto_out_detail

Output description

  • Output table outputPredictTableName

    • Field description

      | column name | comment |
      | --- | --- |
      | pdate | The date of the prediction. |
      | forecast | The predicted value. |
      | lower | The lower bound of the prediction at the specified confidence level. The default confidence level is 0.95. |
      | upper | The upper bound of the prediction at the specified confidence level. The default confidence level is 0.95. |

    • Generated data

      [Image: sample rows of the outputPredictTableName table]

  • Output table outputDetailTableName

    • Field description

      | column name | comment |
      | --- | --- |
      | key | One of the following keys: model (the model in use), evaluation (the evaluation result), parameters (the training parameters), log (the training logs). |
      | summary | The details stored for the key. |

    • Generated data

      [Image: sample rows of the outputDetailTableName table]
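
To give a feel for how the lower and upper columns relate to the confidence level, the following Python sketch computes a symmetric interval around a forecast. It rests on assumptions that this page does not confirm: the interval is taken to be normal-based, and std_error is a hypothetical forecast standard error that is not a column of the output table.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(forecast: float, std_error: float,
                        confidence_level: float = 0.95) -> Tuple[float, float]:
    """Return (lower, upper) for a normal-based interval (an assumption)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be in the range (0,1)")
    if std_error < 0:
        raise ValueError("std_error must not be negative")
    # Two-sided quantile of the standard normal, about 1.96 for 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return forecast - z * std_error, forecast + z * std_error
```

Under this assumption, raising confidenceLevel widens the gap between lower and upper, while the forecast stays at the midpoint.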

FAQ

  • Why are prediction results the same?

    If an exception occurs during model training, the component falls back to the mean model. In this case, every prediction equals the mean of the training data. Common exceptions include a series that is still unstable after differencing, training that does not converge, and zero variance. To find the specific exception, view the stderr file of individual nodes in Logview.

  • How do I configure the component parameters?

    The x13_arima component requires you to configure the p, d, q, sp, sd, and sq parameters yourself. If you are not confident about these settings, we recommend that you use the x13_auto_arima component.

    For x13_auto_arima, you only need to set the upper limits, such as maxOrder, maxSeasonalOrder, maxDiff, and maxSeasonalDiff. The component automatically tunes the parameters within these limits.

  • Error message: ERROR: Number of observations after differencing and/or conditional AR estimation is 9, which is less than the minimum series length required for the model estimated, 24

    The training data is insufficient. Modify the frequency parameter or add more training data.

  • Error message: ERROR: Order of the MA operator is too large

    In most cases, this error occurs because the training data is insufficient.

  • Error message: ERROR: Series to be modelled and/or seasonally adjusted must have at least 3 complete years of data

    If you have specified the seasonal parameters, at least three complete years of data are required.
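
Two of the situations above can be checked before or after a run with a short Python sketch. All function names are hypothetical. The mean fallback follows the description in the first FAQ item, and the length check is only a rough necessary condition that reads "three complete years" as at least 3 × frequency observations.

```python
from typing import List, Sequence


def mean_model_forecast(values: Sequence[float], predict_step: int) -> List[float]:
    """Mimic the mean-model fallback: every prediction is the training mean."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return [mean] * predict_step


def looks_like_mean_model(forecasts: Sequence[float], tol: float = 1e-9) -> bool:
    """Flag output whose predictions are all (nearly) identical."""
    return len(forecasts) > 1 and max(forecasts) - min(forecasts) <= tol


def has_three_years_of_data(n_obs: int, frequency: int) -> bool:
    """Rough pre-check: at least 3 * frequency observations (an assumption)."""
    return n_obs >= 3 * frequency
```

If looks_like_mean_model returns True for the forecast column, the stderr files in Logview are the place to look for the underlying exception.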

References

X-13-ARIMA is a seasonally adjusted autoregressive integrated moving average (ARIMA) algorithm based on the open source X-13ARIMA-SEATS algorithm. To set the model orders yourself, you can use the x13_arima component. For more information, see x13_arima.