After you create a MaxCompute project, you must go to the project to perform subsequent operations, such as development, analysis, and O&M.
Platforms
You can execute the SETPROJECT statement on the MaxCompute client or ODPS SQL node of DataWorks.
Usage notes
A project is not a workspace. To obtain a project name, log on to the MaxCompute console and view the name in the MaxCompute Project Name column on the Project management tab.
You cannot execute statements to create or delete projects in MaxCompute. For more information about how to create a project, see Create a MaxCompute project.
View the properties of a project
Views the project-level properties of a project. MaxCompute also allows you to view the session-level properties of a project. For more information, see the "SHOW FLAGS" section in SET operations.
Syntax
setproject;
Properties
The following table describes common project properties.
Category
Scope of impact
Parameter
Description
Valid values
Affected SQL statements and functions
Ecosystem
Session
console.sql.result.instancetunnel
Specifies whether to enable InstanceTunnel. For more information about Tunnel commands, see Usage notes.
True: enabled.
False: disabled.
SELECT
Scheduling
Session
odps.stage.mapper.mem
The memory size of each Map worker.
Valid values: 256 MiB to 12288 MiB. Default value: 1024 MiB.
ALL
Scheduling
Session
odps.stage.reducer.mem
The memory size of each Reduce worker.
Valid values: 256 MiB to 12288 MiB. Default value: 1024 MiB.
ALL
Scheduling
Session
odps.stage.joiner.mem
The memory size of each Join worker.
Valid values: 256 MiB to 12288 MiB. Default value: 1024 MiB.
ALL
Scheduling
Session
odps.stage.mem
The total memory size of all workers in a specified MaxCompute job. This parameter has a lower priority than the odps.stage.mapper.mem, odps.stage.reducer.mem, and odps.stage.joiner.mem parameters.
Valid values: 256 MiB to 12288 MiB. No default value.
ALL
Scheduling
Session
odps.stage.mapper.split.size
The input data amount of each Map worker, which is the split size of the input file. You can use this parameter to indirectly control the number of workers at each Map stage.
NoteIf the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this parameter, we recommend that you do not add the LIMIT keyword to SQL statements.
Unit: MiB. Default value: 256 MiB.
ALL
Scheduling
Session
odps.sql.split.size
The size of a single shard of data for table-level parallel processing.
Unit: MiB.
For example,
{"table1": 1024, "table2": 512}
.ALL
Scheduling
Session
odps.sql.split.row.count
The number of rows in a single shard for table-level parallel processing.
NoteCan only be used for internal tables, non-transactional tables, and non-clustered tables.
Valid values: The value of this parameter must be a positive integer.
For example,
{"table1": 100, "table2": 500}
.ALL
Scheduling
Session
odps.sql.split.dop
The parallelism level for table-level processing.
NoteCan only be used for internal tables, non-transactional tables, and non-clustered tables.
Valid values: The value of this parameter must be a positive integer.
For example,
{"table1": 1, "table2": 5}
.ALL
Scheduling
Session
odps.stage.reducer.num
The number of workers at each Reduce stage.
NoteIf the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this parameter, we recommend that you do not add the LIMIT keyword to SQL statements.
-
ALL
Scheduling
Session
odps.stage.joiner.num
The number of workers at each Join stage.
NoteIf the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this parameter, we recommend that you do not add the LIMIT keyword to SQL statements.
-
ALL
Scheduling
Session
odps.stage.num
The total number of concurrent workers in a specified MaxCompute job. This parameter has a lower priority than the odps.stage.mapper.split.size, odps.stage.reducer.num, and odps.stage.joiner.num parameters.
NoteIf the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this parameter, we recommend that you do not add the LIMIT keyword to SQL statements.
-
ALL
Scheduling
Project
odps.instance.priority.enable
Specifies whether to enable the job priority feature for a project that uses subscription computing resources. For more information about the job priority feature, see Job priority.
True: enabled.
False: disabled.
NoteOnly the project owner or a user who is assigned the Super_Administrator role can run this command to enable the priority feature.
ALL
SQL
Session
odps.sql.reshuffle.dynamicpt
Specifies whether to enable dynamic partitioning. Dynamic partitioning prevents the generation of a large number of small files.
True: enabled.
False: disabled.
NoteIf a small number of dynamic partitions are generated, we recommend that you set this parameter to False to prevent data skew.
INSERT INTO
INSERT OVERWRITE
SQL
Session
odps.sql.udf.getjsonobj.new
Specifies whether the GET_JSON_OBJECT function retains the original string when it returns a value.
For MaxCompute projects created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings for the returned results by default. For MaxCompute projects created before January 21, 2021, the GET_JSON_OBJECT function returns JSON reserved characters by using escape characters by default. This prevents the negative impact on existing jobs.
For more information about the GET_JSON_OBJECT function, see String functions.
True: enabled.
False: disabled.
UDF
SQL
Session
odps.sql.udf.jvm.memory
The maximum memory size of the Java Virtual Machine (JVM) heap for a user-defined function (UDF).
If a large amount of data is calculated and sorted in the memory by using a UDF, an out of memory (OOM) error may occur. In this case, you can increase the value of this parameter. However, you must optimize UDF code based on your business requirements to prevent OOM errors.
Valid values: 256 MiB to 12288 MiB. Default value: 1024 MiB.
UDF
SQL
Session
odps.function.timeout
The timeout period of a UDF.
Valid values: 0s to 3600s. Default value: 600s.
UDF
SQL
Session
odps.sql.session.resources
The resources that are referenced by a user-defined type (UDT). You can specify multiple resources and separate them with commas (,). For more information about referenced resources, see Overview.
Uploaded resources.
UDT
SQL
Session
odps.sql.udt.display.tostring
Specifies whether to enable the java.util.Objects.toString(...) mechanism on the Wrap for all the columns whose outputs are UDTs.
True: enabled.
False: disabled.
UDT
SQL
Session
odps.sql.session.java.imports
Java packages that are referenced by UDTs. You can specify multiple Java packages and separate them with commas (,). For more information about Java packages, see Overview.
Uploaded Java packages.
UDT
SQL
Session
odps.sql.skewjoin
Specifies whether to enable the SKEWJOIN feature to prevent long tails.
True: enabled.
False: disabled.
SELECT
JOIN
SQL
Session
odps.sql.skewinfo
The key and value on which SKEWJOIN is executed. For more information, see Data skew tuning.
-
SELECT
JOIN
SQL
Session
odps.sql.udf.ppr.deterministic
Specifies whether to enable partition pruning for UDFs. For more information about partition pruning, see WHERE clause (WHERE_condition).
True: enabled.
False: disabled.
UDF
SQL
Session
odps.sql.udf.ppr.to.subquery
Specifies whether to ignore the error that occurs during result backfilling when partition pruning is being performed. For more information about partition pruning, see the "WHERE clause (where_condition)" section in SELECT syntax.
True: enabled.
False: disabled.
UDF
SQL
Session
odps.optimizer.enable.range.partial.repartitioning
Specifies whether to enable the Shuffle Remove feature for range-clustered tables.
True: enabled.
False: disabled.
INSERT OVERWRITE
CREATE TABLE
SQL
Session
odps.optimizer.merge.partitioned.table
When a query repeatedly uses the same partitioned table, you can set this parameter to enable the system to merge the partitioned table, thereby minimizing IO operations on the partitioned table and improving performance.
True: enabled.
False: disabled.
SELECT
SQL
Session
odps.optimizer.skew.join.topk.num
The number of hot key values obtained by the optimizer when it performs an aggregate operation. For more information, see SKEWJOIN HINT.
-
SKEWJOIN HINT
SQL
Session
odps.optimizer.stat.collect.auto
Specifies whether to enable the Freeride feature. After the Freeride feature is enabled, statistical data of columns in a table is automatically collected. For more information about how to collect optimizer information, see Optimizer.
True: enabled.
False: disabled.
CREATE TABLE
INSERT INTO
INSERT OVERWRITE
SQL
Session
odps.optimizer.stat.collect.plan
A collection plan that is used to collect specific column stats metrics of specific columns. For more information about how to collect optimizer information, see Optimizer.
-
CREATE TABLE
INSERT INTO
INSERT OVERWRITE
SQL
Session
odps.sql.external.net.vpc
Specifies whether to enable virtual private cloud (VPC) for external tables. For more information, see
True: enabled.
False: disabled.
CREATE TABLE
SQL
Session
odps.sql.groupby.position.alias
Specifies whether to use integer constants in the GROUP BY clause as column numbers in a SELECT statement.
True: enabled.
False: disabled.
SELECT
GROUP BY
SQL
Session
odps.sql.groupby.skewindata
Specifies whether to enable the anti-skew feature for the GROUP BY clause.
True: enabled.
False: disabled.
GROUP BY
SQL
Session
odps.sql.orderby.position.alias
Specifies whether to use integer constants in the ORDER BY clause as column numbers in a SELECT statement.
True: enabled.
False: disabled.
ORDER BY
SELECT
SQL
Session
odps.sql.mapjoin.memory.max
The size of small tables that are read to the memory when the MAPJOIN statement is executed. Unit: MiB.
NoteThis parameter is also subject to the total memory global limit at the task level. For multiple small tables within the same task, the total memory global upper limit equals
std::min(8G, odps.sql.mapjoin.memory.max * the number of small tables in the same task)
.Valid values: 0 MiB to 8192 MiB.
JOIN
SQL
Session
odps.sql.distributed.map.join.memory.max
The memory size of each instance of the HashTableBuilder when the MAPJOIN statement is executed. The default value is 2048. Unit: MiB.
Valid values: 0 MiB to 8192 MiB.
JOIN
SQL
Session
odps.sql.python.version
The Python version on which SQL statements are executed.
cp27
cp37
UDTF
UDAF
SQL
Session
odps.sql.select.output.format
Specifies whether to display the table headers in the returned results of the MaxCompute client. For more information about how to use the MaxCompute client, see MaxCompute client (odpscmd).
{""needHeader"":false,""fieldDelim"":""""}
: The table headers are not displayed.{""needHeader"":true,""fieldDelim"":""""}
: The table headers are displayed.
ALL
SQL
Session
odps.sql.unstructured.data.oss.use.https
Specifies whether to enable the mechanism of obtaining data by using HTTPS at the underlying layer when you access external tables. For more information about Object Storage Service (OSS) external tables, see Create an OSS external table.
True: enabled.
False: disabled.
ALL
SQL
Session
odps.sql.decimal.tostring.trimzero
Specifies whether to remove the zero at the end of the decimal point when the data type is DECIMAL. The default value is
True
, which indicates to remove the zero at the end of the decimal point.True: The zero at the end of the decimal point is removed.
False: The zero at the end of the decimal point is remained.
CAST
SQL
Session
odps.sql.unstructured.tablestore.put.row
Specifies whether to call the PutRow operation to write data to a Tablestore external table.
NoteFor more information about PutRow, see PutRow.
True: enabled.
False: disabled.
ALL
SQL
Session
odps.sql.unstructured.external.max.dop
The maximum concurrency of the workers that are accessed by the external table.
The value of this parameter must be a positive integer.
Data queries and writes based on external tables
SQL
Session/Project
odps.sql.unstructured.file.pattern.black.list
In the scenario where MaxCompute SQL reads data from external tables pointing to locations in OSS or Hadoop Distributed File System (HDFS), it can automatically ignore the _SUCCESS file imported by Spark when writing to this directory. This prevents issues where MaxCompute SQL is unable to read the data within the directory due to this file. For example, the following code shows how to specify this behavior at the session level:
CREATE EXTERNAL TABLE IF NOT EXISTS test_oss_pt( t_id INT, t_string STRING) PARTITIONED BY (pt1 INT,pt2 INT) STORED BY 'com.aliyun.odps.CsvStorageHandler' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole' ) LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/'; SET odps.sql.unstructured.file.pattern.black.list=.*_SUCCESS$; INSERT INTO test_oss_pt PARTITION (pt1=8,pt2=8) VALUES (1,'tree'); SELECT t_id, t_string FROM test_oss_pt WHERE pt1=8 AND pt2=8;
Valid values:
.*_SUCCESS$
.Data queries based on the external tables
SQL
Session
odps.meta.exttable.stats.onlinecollect
Since the data for external queries is stored in a data lake, metadata processing cannot be performed locally due to considerations of openness. Additionally, because the data is located externally and statistical information has not been pre-collected, the query optimizer adopts a conservative strategy, leading to low query efficiency. The optimizer can discover small tables by temporarily collecting statistics on the tables during query execution. It then proactively uses Hash Join, optimizes the Join Order, reduces extensive shuffle operations, and shortens the execution pipeline, ultimately optimizing the query plan.
True: enabled.
False (default): disabled.
Data queries based on the external tables
SQL
Project
odps.sql.allow.fullscan
Specifies whether to enable a full table scan on a project. A full table scan occupies a large number of resources, which reduces data processing efficiency. Therefore, we recommend that you do not enable this feature.
True: A full table scan is enabled.
False: A full table scan is disabled.
SELECT
SQL
Project
odps.table.lifecycle
Specifies whether to configure a lifecycle for tables in a project.
Optional: The lifecycle clause is optional in a table creation statement. If you do not configure a lifecycle for a table, the table does not expire.
Mandatory: The lifecycle clause is required in a table creation statement.
Inherit: If you do not configure a lifecycle for a table when you create the table, the value of odps.table.lifecycle.value is used by default.
CREATE TABLE
SQL
Project
odps.table.lifecycle.value
The lifecycle of a table. Unit: days.
Valid values: 1 to 37231. Default value: 37231.
CREATE TABLE
SQL
Project
READ_TABLE_MAX_ROW
The maximum number of data records that can be returned by a SELECT statement.
Valid values: 1 to 10000. Default value: 10000.
SELECT
SQL
Project
odps.output.field.formatter
The dynamic data masking rules for SQL query results. For more information about dynamic data masking in MaxCompute, see Dynamic data masking.
Custom dynamic data masking rules.
SELECT
SQL
Project
odps.sql.acid.table.enable
Specifies whether to enable the atomicity, consistency, isolation, and durability (ACID) mechanism. For more information about ACID, see ACID semantics.
True: enabled.
False: disabled.
ALL
SQL
Session/Project
odps.io.oss.use.vipserver
Access to the Object Storage Service (OSS) can be performed by a VIPServer for external access. If this setting is not enabled, access to OSS will occur through a Server Load Balancer (SLB). The VIPServer method allows for higher traffic limits and is subject to flow control, but it also carries certain stability risks, such as query failures. You should evaluate these risks before deciding to use this option.
True: enabled.
False (default): disabled.
Data queries based on the external tables
SQL
Session/Project
odps.sql.executionengine.enable.string.to.date.full.format
Specifies whether to convert a date string that contains the hour, minute, and second parts. Default value: False.
True: enabled.
False: disabled.
CAST
SQL
Session/Project
odps.sql.executionengine.enable.rand.time.seed
If you set this parameter to False, the RAND function uses the instance ID as the seed for random number initialization. This ensures the idempotence of the function.
If you set this parameter to True, the RAND function uses the system time as the seed for random number initialization. In this case, the RAND function is no longer idempotent and cannot be used as a shuffle key. The return result varies each time you use the RAND function.
Default value: False.
True: enabled.
False: disabled.
RAND
SQL
Session/Project
odps.sql.type.system.odps2
Specifies whether to enable the MaxCompute V2.0 data type edition. For more information about data types supported in the MaxCompute V2.0 data type edition, see MaxCompute V2.0 data type edition.
True: enabled.
False: disabled.
Additional functions of MaxCompute V2.0
SQL
Session/Project
odps.sql.type.json.enable
Specifies whether to enable JSON data types. For more information about JSON data types, see Instructions for using JSON data types in MaxCompute.
True: enabled.
False: disabled.
MaxCompute JSON data types
SQL
Session/Project
odps.sql.hive.compatible
Specifies whether to enable the Hive-compatible data type edition. MaxCompute supports Hive syntaxes, such as inputRecordReader, outputRecordReader, and Serde, only after the Hive-compatible data type edition is enabled. For more information about the data types supported in the Hive-compatible data type edition, see Hive-compatible data type edition.
True: enabled.
False: disabled.
ALL
SQL
Session/Project
odps.sql.metering.value.max
The upper limit on resources that can be consumed by an SQL statement. For more information about consumption control, see Consumption control.
-
ALL
SQL
Session/Project
odps.sql.timezone
The time zone of the MaxCompute project that you accessed. For more information about time zones, see Time zone configuration operations.
-
ALL
SQL
Session/Project
odps.sql.unstructured.oss.commit.mode
Specifies whether to enable the multipart upload feature of OSS to write data to OSS external tables. For more information, see Write data to OSS.
True: enabled.
False: disabled.
INSERT OVERWRITE
SQL
Session/Project
odps.sql.groupby.orderby.position.alias
Specifies whether to use integer constants in the GROUP BY and ORDER BY clauses as column numbers in a SELECT statement.
NoteIf this parameter is set to True for an existing project, data parsing or other operations may fail to be performed. You must make sure that the original logic can be correctly executed for the existing project when you set this parameter to True. Otherwise, specify this parameter for sessions.
True: enabled.
False: disabled.
GROUP BY
ORDER BY
SELECT
SQL
Session/Project
odps.ext.oss.orc.native
Specifies whether to use a native Optimized Row Columnar (ORC) reader to read data from tables.
True: enabled.
False: disabled.
SELECT
SQL
Session
odps.sql.job.max.time.hours
The maximum duration in which a single job runs.
Valid values: 1 to 72. Default value: 24.
SQL jobs
Billing rules
Session
odps.task.quota.preference.tag
The quota group that is allocated to a specified job. You can configure the quota group on MaxCompute Management. For projects that use subscription resources, you can configure this parameter to specify a level-2 quota group that can be used by a job. A job is preferentially scheduled to a quota group whose quota tag is the same as the quota tag of the job. You must set a quota tag for a job when you submit the job. If no quota group has the same quota tag, the job is scheduled to the quota group specified by the project to which the job belongs.
You can execute the following statement to configure a quota tag:
set odps.task.quota.preference.tag = payasyougo
NoteThis parameter is available only for SQL jobs.
tag_name specifies the quota tag of the quota group that you configure on MaxCompute Management. You can configure tag_name only for quota groups in the region where the project of your job belongs. The value of tag_name can contain letters, digits, and underscores (_).
ALL
Billing rules
Session
odps.task.wlm.quota
The name of the quota that is used to run jobs in a session.
-
ALL
Security and permissions
Project
odps.forbid.fetch.result.by.bearertoken
Specifies whether to display the results of jobs on the Result tab of LogView. This parameter is used to protect data security.
True: The execution results of jobs are not displayed on the Result tab of LogView.
False: The execution results of jobs are displayed on the Result tab of LogView.
ALL
Security and permissions
Project
LabelSecurity
Specifies whether to enable LabelSecurity. For more information about LabelSecurity, see Label-based access control.
True: enabled.
False: disabled.
ALL
Security and permissions
Project
CheckPermissionUsingACL
Specifies whether to enable ACL-based access control. For more information about ACL-based access control, see MaxCompute permissions.
True: enabled.
False: disabled.
ALL
Security and permissions
Project
CheckPermissionUsingPolicy
Specifies whether to enable policy-based access control. For more information about policy-based access control, see Policy-based access control.
True: enabled.
False: disabled.
ALL
Security and permissions
Project
ObjectCreatorHasAccessPermission
Specifies whether creators of objects can perform operations on the objects created by themselves.
True: Creators of objects can perform operations on the objects that are created by themselves.
False: Creators of objects cannot perform operations on the objects created by themselves.
ALL
Security and permissions
Project
ObjectCreatorHasGrantPermission
Specifies whether creators of objects can authorize other users to perform operations on the objects.
True: Creators of objects can authorize other users to perform operations on the objects.
False: Creators of objects cannot authorize other users to perform operations on the objects.
ALL
Security and permissions
Project
ProjectProtection
Specifies whether to enable project data protection. For more information about project data protection, see Project data protection.
True: enabled.
False: disabled.
ALL
Security and permissions
Project
odps.output.field.formatter
The dynamic data masking rules for SQL query results. For more information about dynamic data masking, see Dynamic data masking.
Custom dynamic data masking rules.
SELECT
Permission security
Project
odps.security.ip.whitelist
The whitelist of IP addresses that are authorized to access the project over the cloud product interconnection network. For more information about IP address whitelists, see Manage IP address whitelists.
The list of IP addresses. Separate IP addresses with commas (,).
ALL
Permission security
Project
odps.security.vpc.whitelist
The whitelist of IP addresses that are authorized to access the project over a specific virtual private cloud (VPC). For more information about IP address whitelists, see Manage IP address whitelists.
RegionID_VPCID[IP Address]
ALL
Data type
Project
odps.sql.decimal.odps2
Specifies whether to enable DECIMAL(precision,scale) in the MaxCompute V2.0 data type edition. For more information about data types, see MaxCompute V2.0 data type edition.
True: enabled.
False: disabled.
ALL
Data storage
Project
odps.timemachine.retention.days
The retention period for backup data in the unit of days. For more information about data backup, see Backup and restoration.
Valid values: 0 to 30. Default value: 1.
ALL
Metadata
Project
odps.schema.evolution.enable
Specifies whether to enable schema evolution.
True: enabled.
False (default): disabled.
Schema evolution
Set project properties
Sets the project-level properties of a project. This statement takes effect within 5 minutes after it is executed. You can check the result 5 minutes after the statement is successfully executed. MaxCompute also allows you to set session-level properties for a project. For more information, see the "SET" section in SET operations.
Permission description
To set the project-level properties of a project, you must use the account of the project owner or the account that is assigned the project-level role Super_Administrator or Admin. For more information about how to assign roles, see Assign a built-in administrator role to a user.
Syntax
setproject <KEY>=<VALUE>;
Parameters
KEY: the name of the property.
VALUE: the value of the property. For more information about properties, see View project properties.
Examples
Enable a full table scan for a project.
setproject odps.sql.allow.fullscan=true;
Related statements
ADD ACCOUNTPROVIDER: Adds the RAM account system for a project.
USE: Goes to a specific project on which you have access permissions.
LIST ACCOUNTPROVIDERS: Views the information about the account systems that are supported by a project.
REMOVE ACCOUNTPROVIDER: Removes the RAM account system of a project.