
Platform for AI: Prophet

Last Updated: Feb 22, 2024

Prophet is an open source time series forecasting algorithm developed by Facebook for data that follows specific patterns. The Prophet component of Platform for AI (PAI) forecasts the time series stored in each row of MTable data and returns the predicted values for the subsequent time period. This topic describes how to configure the Prophet component.

Limits

The Prophet component can run only on MaxCompute computing resources.

Configure the component in the PAI console

  • Input ports

    Input port (left-to-right)

    Data type

    Recommended upstream component

    Required

    Input Data

    N/A

    Yes

  • Component parameters

    Tab

    Parameter

    Description

    Field Setting

    valueCol

    The data type is STRING, and the data format is MTable. You can use the MTable aggregation component to construct the data. Use a column of the Datetime type as the time axis of the series. Example of MTable: {"data":{"ds":["2019-05-07 00:00:00.0","2019-05-08 00:00:00.0"],"val":[8588.0,8521.0]},"schema":"ds TIMESTAMP,val DOUBLE"}.

    reservedCols

    The columns that are reserved for the algorithm.

    Parameter Setting

    predictionCol

    The name of the prediction result column.

    cap

    The upper limit of the predicted value.

    changepoint_prior_scale

    Default value: 0.05.

    change_point_range

    The proportion of the history, counted from the start of the series, in which trend change points are estimated. Default value: 0.8.

    changepoints

    The list of change points. Separate multiple change points with commas (,). Example: 2021-05-02,2021-05-07.

    daily_seasonality

    Specifies whether to fit seasonality by day. Default value: auto.

    floor

    The lower limit of the predicted value.

    growth

    The type of the trend. Valid values:

    • LINEAR (default value)

    • Logistic

    • Flat

    holidays

    The list of holidays. Specify each holiday in the name:date,date format, and separate multiple holidays with spaces. Example: playoff:2021-05-03,2021-01-03 superbowl:2021-02-07,2021-11-02.

    holidays_prior_scale

    The parameter of the holiday model. Default value: 10.0.

    include_history

    Specifies whether to predict the value that corresponds to the date in the original data.

    interval_width

    The width of the uncertainty interval. Default value: 0.8.

    mcmc_samples

    The number of samples used for Bayesian inference. The value must be an integer. If this parameter is set to 0, maximum a posteriori (MAP) estimation is performed instead. Default value: 100.

    n_change_point

    The maximum number of change points. Default value: 25.

    predictNum

    The number of future time points to predict. Valid values: (0, inf). Default value: 12.

    predictionDetailCol

    The name of the prediction details column.

    seasonality_mode

    The seasonality mode. Valid values:

    • ADDITIVE (default value)

    • MULTIPLICATIVE

    seasonality_prior_scale

    The parameter of the seasonality model. Default value: 10.0.

    stanInit

    The initial value. This parameter is empty by default.

    uncertaintySamples

    The number of samples used to calculate statistical metrics. Default value: 1000. If you do not need statistical metrics and want to accelerate the prediction, set this parameter to 0.

    weekly_seasonality

    Specifies whether to fit seasonality by week. Default value: auto.

    yearly_seasonality

    Specifies whether to fit seasonality by year. Default value: auto.

    numThreads

    The number of threads of the component.

    Execution Tuning

    Number of Workers

    The number of cores. This parameter must be used together with the Memory per worker, unit MB parameter. The value of this parameter must be a positive integer. Valid values: [1,9999].

    Memory per worker, unit MB

    The memory size of each worker node. Valid values: 1024 to 64 × 1024. Unit: MB.
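
The changepoints and holidays parameters take plain strings in the formats described above. The following is a minimal sketch, using only the Python standard library, of how such strings could be assembled from date objects. The helper names format_changepoints and format_holidays are hypothetical and are not part of PAI or Alink.

```python
from datetime import date


def format_changepoints(dates):
    """Join dates into the comma-separated form shown for `changepoints`."""
    return ",".join(d.isoformat() for d in dates)


def format_holidays(holidays):
    """Build a `holidays` string: name:date,date entries separated by spaces."""
    parts = []
    for name, dates in holidays.items():
        # A space or colon inside a name would make the string ambiguous.
        if " " in name or ":" in name:
            raise ValueError(f"invalid holiday name: {name!r}")
        parts.append(name + ":" + ",".join(d.isoformat() for d in dates))
    return " ".join(parts)


changepoints = format_changepoints([date(2021, 5, 2), date(2021, 5, 7)])
holidays = format_holidays({
    "playoff": [date(2021, 5, 3), date(2021, 1, 3)],
    "superbowl": [date(2021, 2, 7), date(2021, 11, 2)],
})

print(changepoints)
print(holidays)
```

The two printed strings reproduce the examples given for changepoints and holidays in the parameter descriptions.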

Configure the component by using code

You can copy the following code to the PyAlink Script component to make it function in the same manner as the Prophet component.

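import datetime

import pandas as pd

# AlinkGlobalConfiguration, dataframeToOperator, and the *BatchOp classes are
# assumed to be available in the PyAlink Script component. Outside of it, add:
# from pyalink.alink import *

# Download the Python environment plug-in before running the Prophet operator.
downloader = AlinkGlobalConfiguration.getPluginDownloader()
downloader.downloadPlugin('tf115_python_env_linux')

# Sample data: one series (id = 1) with ten observations, one second apart.
data = pd.DataFrame([
    [1, datetime.datetime.fromtimestamp(1), 10.0],
    [1, datetime.datetime.fromtimestamp(2), 11.0],
    [1, datetime.datetime.fromtimestamp(3), 12.0],
    [1, datetime.datetime.fromtimestamp(4), 13.0],
    [1, datetime.datetime.fromtimestamp(5), 14.0],
    [1, datetime.datetime.fromtimestamp(6), 15.0],
    [1, datetime.datetime.fromtimestamp(7), 16.0],
    [1, datetime.datetime.fromtimestamp(8), 17.0],
    [1, datetime.datetime.fromtimestamp(9), 18.0],
    [1, datetime.datetime.fromtimestamp(10), 19.0]
])

source = dataframeToOperator(data, schemaStr='id int, ts timestamp, val double', op_type='batch')

# 1. Aggregate each id's (ts, val) rows into one MTable value.
# 2. Forecast the next 4 points for each MTable.
# 3. Flatten the prediction details back into rows.
source.link(
    GroupByBatchOp()
        .setGroupByPredicate("id")
        .setSelectClause("id, mtable_agg(ts, val) as data")
).link(
    ProphetBatchOp()
        .setValueCol("data")
        .setPredictNum(4)
        .setPredictionCol("pred")
        # Assumption: the details column must be named explicitly so that the
        # flatten step below can select it (see the predictionDetailCol parameter).
        .setPredictionDetailCol("pred_detail")
).link(
    FlattenMTableBatchOp()
        .setSelectedCol("pred_detail")
        .setSchemaStr("ds timestamp, yhat double")
).print()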
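
To make the grouping step more concrete, the following sketch packs (id, ts, val) rows into one MTable string per id, in the format shown for valueCol. It uses only the Python standard library and is an illustration of the data layout, not Alink's mtable_agg implementation. The helper names format_ts and rows_to_mtables are hypothetical, and the timestamp rendering assumes whole-second values written with a trailing ".0".

```python
import json
from collections import defaultdict
from datetime import datetime


def format_ts(ts):
    # Assumption: whole-second timestamps, rendered as "YYYY-MM-DD HH:MM:SS.0".
    return ts.strftime("%Y-%m-%d %H:%M:%S") + ".0"


def rows_to_mtables(rows):
    """Group (id, datetime, value) rows into {id: MTable JSON string}."""
    grouped = defaultdict(lambda: {"ds": [], "val": []})
    for key, ts, val in rows:
        grouped[key]["ds"].append(format_ts(ts))
        grouped[key]["val"].append(float(val))
    return {
        key: json.dumps(
            {"data": cols, "schema": "ds TIMESTAMP,val DOUBLE"},
            separators=(",", ":"),  # compact form, as in the documented example
        )
        for key, cols in grouped.items()
    }


rows = [
    (1, datetime(2019, 5, 7), 8588.0),
    (1, datetime(2019, 5, 8), 8521.0),
]
mtables = rows_to_mtables(rows)
print(mtables[1])
```

For these two rows, the printed string matches the MTable example given in the valueCol description.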
