Platform For AI:x13_arima

Last updated: Feb 20, 2024

X-13-ARIMA is a seasonally adjusted autoregressive integrated moving average (ARIMA) model algorithm based on the open source X-13ARIMA-SEATS algorithm. This topic describes how to configure the x13_arima component provided by Platform for AI (PAI).

Background information

ARIMA was proposed in the Box-Jenkins method by Box and Jenkins in the early 1970s for time series forecasting. ARIMA models are also known as Box-Jenkins models.

Algorithm description:

  • Supported scale

    • Row: a maximum of 1,200 data records in a group

    • Column: one numeric column

  • Resource calculation method

    • Default calculation method if the groupColNames parameter is not specified:

      coreNum=1
      memSizePerCore=4096
    • Default calculation method if the groupColNames parameter is specified:

      coreNum = floor(Total number of rows/120,000)
      memSizePerCore=4096
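The default resource rules above can be written out as a minimal Python sketch. The default_resources helper is hypothetical and only restates the two formulas. It is not part of PAI, and it does not show how the component treats a result of 0 when fewer than 120,000 rows are supplied.

```python
import math


def default_resources(total_rows, group_col_names=None):
    """Restate the documented default coreNum and memSizePerCore values."""
    if not group_col_names:
        # groupColNames is not specified: a single core is used.
        return {"coreNum": 1, "memSizePerCore": 4096}
    # groupColNames is specified: one core per 120,000 rows, rounded down.
    return {
        "coreNum": math.floor(total_rows / 120_000),
        "memSizePerCore": 4096,
    }


print(default_resources(1_000))
print(default_resources(600_000, "col0,col1"))
```

You can override both values with the coreNum and memSizePerCore parameters.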

Configure the component

You can use one of the following methods to configure the x13_arima component:

Method 1: Configure the component in the PAI console

You can configure the parameters of the x13_arima component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Time Series Column | Required. This column is used only to sort the values in the numeric column. |
| Fields Setting | Value Column | Required. The numeric column. |
| Fields Setting | Stratification Column | Optional. You can separate multiple columns with commas (,), such as col0,col1. A time series is created for each group. |
| Parameters Setting | Format | The non-seasonal order in the p,d,q format. p, d, and q are non-negative integers in the range of [0,36]. p: the autoregressive (AR) order. d: the degree of differencing. q: the moving average (MA) order. |
| Parameters Setting | Start Date | The supported format is year.seasonal. Example: 1986.1. |
| Parameters Setting | Series Frequency | The value must be a positive integer in the range of [1,12]. |
| Parameters Setting | Format | The seasonal order in the sp,sd,sq format. sp, sd, and sq are non-negative integers in the range of [0,36]. sp: the seasonal autoregressive (AR) order. sd: the degree of seasonal differencing. sq: the seasonal moving average (MA) order. |
| Parameters Setting | Seasonal Cycle | The value must be a number in the range of (0,12]. Default value: 12. |
| Parameters Setting | Prediction Entries | The value must be a positive integer in the range of (0,120]. Default value: 12. |
| Parameters Setting | Prediction Confidence Level | The value must be a number in the range of (0,1). Default value: 0.95. |
| Tuning | Cores | The number of cores. By default, the system determines the value. |
| Tuning | Memory | The memory size per core. Unit: MB. |
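The p,d,q and sp,sd,sq formats can be checked before a job is submitted. The following is a minimal Python sketch of such a check. The parse_order helper is hypothetical and is not part of PAI. It only applies the documented rule that each of the three values is a non-negative integer in the range of [0,36].

```python
def parse_order(text):
    """Parse a 'p,d,q' or 'sp,sd,sq' string into a tuple of three integers."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {text!r}")
    values = []
    for part in parts:
        # Reject negative numbers, decimals, and empty fields.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"{part!r} is not a non-negative integer")
        value = int(part)
        if value > 36:
            raise ValueError(f"{value} is outside the range [0,36]")
        values.append(value)
    return tuple(values)


print(parse_order("3,1,1"))
print(parse_order("0,1,1"))
```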

Method 2: Configure the component by using PAI commands

You can use SQL scripts to call PAI commands. For more information, see SQL Script.

PAI -name x13_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dorder=3,1,1
    -Dstart=1949.1
    -Dfrequency=12
    -Dseasonal=0,1,1
    -Dperiod=12
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_out_predict
    -DoutputDetailTableName=pai_ft_x13_arima_out_detail

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | N/A |
| inputTablePartitions | No | The partitions that are selected from the input table for training. | Full table |
| seqColName | Yes | The time series column. This column is used only to sort values in the valueColName column. | N/A |
| valueColName | Yes | The numeric column. | N/A |
| groupColNames | No | The grouping columns. You can separate multiple columns with commas (,). Example: col0,col1. A time series is created for each group. | N/A |
| order | Yes | The non-seasonal order in the p,d,q format. p indicates the autoregressive (AR) order. d indicates the degree of differencing. q indicates the moving average (MA) order. Each value must be a non-negative integer in the range of [0,36]. | N/A |
| start | No | The start date of a time series. The value must be a string in the year.seasonal format, such as 1986.1. For more information, see Time series format. | 1.1 |
| frequency | No | The time series frequency. The value must be a positive integer in the range of (0,12]. For more information, see Time series format. Note: A value of 12 indicates 12 months per year. | 12 |
| seasonal | No | The seasonal order in the sp,sd,sq format. sp indicates the seasonal autoregressive (AR) order. sd indicates the degree of seasonal differencing. sq indicates the seasonal moving average (MA) order. Each value must be a non-negative integer in the range of [0,36]. | Non-seasonal |
| period | No | The seasonal cycle. The value must be a number in the range of (0,100]. | The value of frequency |
| maxiter | No | The maximum number of iterations. The value must be a positive integer. | 1500 |
| tol | No | The tolerance. The value is of the DOUBLE type. | 1e-5 |
| predictStep | No | The number of prediction entries. The value must be a positive integer in the range of (0,365]. | 12 |
| confidenceLevel | No | The confidence level. The value must be a number in the range of (0,1). | 0.95 |
| outputPredictTableName | Yes | The prediction output table. | N/A |
| outputDetailTableName | Yes | The details table. | N/A |
| outputTablePartition | No | Specifies whether to export data to partitions. | Does not export data to partitions |
| coreNum | No | The number of cores. The value must be a positive integer. This parameter is used together with memSizePerCore. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer in the range of [1024,64 × 1024]. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. | N/A |
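When the same command is generated for several tables, it can help to assemble the command text from a dictionary and to check the required parameters first. The following Python sketch is one possible way to do that. The build_x13_arima_command helper is hypothetical. It only produces text in the same layout as the sample command and does not submit anything to PAI.

```python
# Parameters marked as required in the parameter table.
REQUIRED = (
    "inputTableName",
    "seqColName",
    "valueColName",
    "order",
    "outputPredictTableName",
    "outputDetailTableName",
)


def build_x13_arima_command(params):
    """Return the PAI command text for the given parameter dictionary."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name x13_arima", "    -project algo_public"]
    # Each parameter becomes one -Dname=value line, in insertion order.
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines)


print(build_x13_arima_command({
    "inputTableName": "pai_ft_x13_arima_input",
    "seqColName": "id",
    "valueColName": "number",
    "order": "3,1,1",
    "seasonal": "0,1,1",
    "outputPredictTableName": "pai_ft_x13_arima_out_predict",
    "outputDetailTableName": "pai_ft_x13_arima_out_detail",
}))
```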

Time series format

The start and frequency parameters specify the time dimensions ts1 and ts2 for the numeric column.

  • The frequency parameter specifies the data frequency within a unit period, which means the frequency of ts2 in each ts1.

  • The value of the start parameter is in the n1.n2 format. This indicates that the start date is the n2th ts2 in the n1th ts1.

| Time unit | ts1 | ts2 | frequency | start |
| --- | --- | --- | --- | --- |
| 12 months/year | Year | Month | 12 | 1949.2 indicates the second month of the year 1949. |
| 4 quarters/year | Year | Quarter | 4 | 1949.2 indicates the second quarter of the year 1949. |
| 7 days/week | Week | Day | 7 | 1949.2 indicates the second day of the 1949th week. |
| 1 | Any time unit | 1 | 1 | 1949.1 indicates the year 1949, or the 1949th day or hour. |

Example: value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

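  • start=1949.3,frequency=12 indicates that the unit time is 12 months per year and that the series starts in March 1949. The values end in May 1950, so the prediction starts from June of the year 1950.

    | year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
    | 1950 | 11 | 12 | 13 | 14 | 15 |  |  |  |  |  |  |  |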

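  • start=1949.3,frequency=4 indicates that the unit time is four quarters per year and that the series starts in the third quarter of 1949. The values end in the first quarter of 1953, so the prediction starts from the second quarter of the year 1953.

    | year | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
    | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 |
    | 1950 | 3 | 4 | 5 | 6 |
    | 1951 | 7 | 8 | 9 | 10 |
    | 1952 | 11 | 12 | 13 | 14 |
    | 1953 | 15 |  |  |  |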

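  • start=1949.3,frequency=7 indicates that the unit time is seven days per week and that the series starts on the third day of the 1949th week. The values end on the third day of the 1951st week, so the prediction starts from the fourth day of the 1951st week.

    | week | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | 1949 |  |  | 1 | 2 | 3 | 4 | 5 |
    | 1950 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
    | 1951 | 13 | 14 | 15 |  |  |  |  |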

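  • start=1949.1,frequency=1 indicates that each cycle contains a single value. The values end in cycle 1963, so the prediction starts in 1964.

    | cycle | p1 |
    | --- | --- |
    | 1949 | 1 |
    | 1950 | 2 |
    | 1951 | 3 |
    | 1952 | 4 |
    | 1953 | 5 |
    | 1954 | 6 |
    | 1955 | 7 |
    | 1956 | 8 |
    | 1957 | 9 |
    | 1958 | 10 |
    | 1959 | 11 |
    | 1960 | 12 |
    | 1961 | 13 |
    | 1962 | 14 |
    | 1963 | 15 |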
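The way start and frequency place each value on the ts1/ts2 grid can be sketched in a few lines of Python. The first_forecast_position helper below is hypothetical and is not part of PAI. It assumes that start is given as a string in the n1.n2 format and that the values are consecutive with no gaps. The start value 1949.3 in the first call is an illustrative assumption.

```python
def first_forecast_position(start, frequency, n_values):
    """Return (ts1, ts2) of the first predicted point.

    start is kept as a string so that "1949.10" is not confused with "1949.1".
    """
    n1_text, n2_text = start.split(".")
    n1, n2 = int(n1_text), int(n2_text)
    if not 1 <= n2 <= frequency:
        raise ValueError("n2 must be between 1 and frequency")
    # Zero-based slot of the first forecast, counted from slot 1 of ts1 = n1.
    offset = (n2 - 1) + n_values
    return n1 + offset // frequency, offset % frequency + 1


# 15 monthly values that start in March 1949: the forecast starts in June 1950.
print(first_forecast_position("1949.3", 12, 15))   # (1950, 6)
# 144 monthly values that start in January 1949: the forecast starts in January 1961.
print(first_forecast_position("1949.1", 12, 144))  # (1961, 1)
```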

Example

Prepare test data

This example uses the AirPassengers.csv dataset, which records the number of international airline passengers each month from the year 1949 to the year 1960. For more information about the dataset, see AirPassengers.

| id | number |
| --- | --- |
| 1 | 112 |
| 2 | 118 |
| 3 | 132 |
| 4 | 129 |
| 5 | 121 |
| ... | ... |

Run the following Tunnel commands on the MaxCompute client to upload data. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For more information about Tunnel commands, see Tunnel commands.

create table pai_ft_x13_arima_input(id bigint,number bigint);
tunnel upload xxxx/airpassengers.csv pai_ft_x13_arima_input -h true;
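If your source data contains only the passenger counts, you can generate the two-column CSV file before you upload it. The following Python sketch writes an id column that preserves the order of the values, followed by the number column. The write_series_csv helper is hypothetical, and the sketch assumes that -h true in the upload command means that the first line of the file is a header row. Only the first five values shown in the table are used here.

```python
import csv
import io


def write_series_csv(values, out):
    """Write a header row and one 'id,number' row per value to out."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "number"])
    # id starts at 1 and follows the order of the values.
    for index, value in enumerate(values, start=1):
        writer.writerow([index, value])


buffer = io.StringIO()
write_series_csv([112, 118, 132, 129, 121], buffer)
print(buffer.getvalue())
```

To create a real file, pass a file object opened with open(path, "w", newline="") instead of the in-memory buffer.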

Run PAI commands

You can use the SQL script or ODPS SQL component to run the following PAI commands:

PAI -name x13_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dorder=3,1,1
    -Dseasonal=0,1,1
    -Dstart=1949.1
    -Dfrequency=12
    -Dperiod=12
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_out_predict
    -DoutputDetailTableName=pai_ft_x13_arima_out_detail

Output description

  • Output table outputPredictTableName

    • Field description

      | Column name | Comment |
      | --- | --- |
      | pdate | The date of the prediction. |
      | forecast | The predicted value. |
      | lower | The lower bound of the prediction at the specified confidence level. The default confidence level is 0.95. |
      | upper | The upper bound of the prediction at the specified confidence level. The default confidence level is 0.95. |

    • The following figure shows the data.

      [Figure: sample data in the prediction output table]

  • Output table outputDetailTableName

    • Field description

      | Column name | Comment |
      | --- | --- |
      | key | The type of the entry. Valid values: model (the model in use), evaluation (the evaluation result), parameters (the training parameters), and log (the training logs). |
      | summary | The details that are stored for the key. |

    • The following figure shows the data.

      [Figure: sample data in the details output table]

FAQ

  • Why are prediction results the same?

    If an exception occurs during model training, the component falls back to the mean model. In this case, every prediction result is the mean of the training data. Common exceptions include a series that is still unstable (non-stationary) after differencing, training that does not converge, and a variance of 0. To find the specific exception, view the stderr file of the individual nodes in Logview.

  • How do I configure the component parameters?

    You must configure the p, d, q, sp, sd, and sq parameters for the x13_arima component. If you are not sure which values to use, we recommend that you use the x13_auto_arima component instead. For that component, you only need to set the upper limits of the parameters, and the component tunes the parameters automatically.

  • Error message: ERROR: Number of observations after differencing and/or conditional AR estimation is 9, which is less than the minimum series length required for the model estimated, 24

    The training data is insufficient. Modify the frequency parameter or add more training data.

  • Error message: ERROR: Order of the MA operator is too large

    In most cases, this error occurs because the training data is insufficient.

  • Error message: ERROR: Series to be modelled and/or seasonally adjusted must have at least 3 complete years of data

    If you have specified the seasonal parameters, at least three years of data is required.
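Some of the problems above can be caught before or after a run with simple checks. The following Python sketch is one possible approach. Both helpers are hypothetical and are not part of PAI. The seasonal check assumes that "3 complete years" means at least 3 × frequency rows, and the row limit is taken from the supported scale of 1,200 records per group.

```python
def preflight_issues(n_rows, frequency=12, seasonal=False):
    """Return a list of likely problems for one group of training data."""
    issues = []
    if n_rows > 1200:
        issues.append("more than 1,200 rows in one group")
    # Assumption: three complete years correspond to 3 * frequency rows.
    if seasonal and n_rows < 3 * frequency:
        issues.append("fewer than 3 complete cycles for a seasonal model")
    return issues


def looks_like_mean_fallback(forecasts):
    """Return True if all forecast values are identical."""
    return len(forecasts) > 1 and len(set(forecasts)) == 1


print(preflight_issues(144, frequency=12, seasonal=True))
print(preflight_issues(30, frequency=12, seasonal=True))
print(looks_like_mean_fallback([280.3] * 12))
```

If looks_like_mean_fallback returns True for the forecast column of the prediction output table, check the stderr file in Logview for the underlying exception.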

References

You need to configure parameters such as p, d, q, sp, sd, and sq for the x13_arima component. If you are not sure how to configure the parameters, we recommend that you use the x13_auto_arima component. You only need to specify the upper limit for the parameters for this component, and the system automatically specifies the optimal parameter values. For more information, see x13_auto_arima.