Sequence

SequenceExpr represents a column in a two-dimensional dataset. You cannot manually create a SequenceExpr object. You can only retrieve one from a collection object.

Prerequisites

Before you retrieve a column, make sure the following requirements are met:

A sample table named pyodps_iris is prepared. For more information, see DataFrame data processing.
A DataFrame object is created. For more information, see Create a DataFrame object.

Retrieve a column

Use collection.column_name to retrieve a column. Sample command:

```
print(iris.sepallength.head(5))
```

The following result is returned:

```
   sepallength
0          4.9
1          4.7
2          4.6
3          5.0
4          5.4
```

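If the name of a column is stored in a string variable, use collection[column_name] to retrieve the column. The sample command below passes the name as a string literal, and a variable that holds the same string works the same way. Sample command: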
```
print(iris['sepallength'].head(5))
```
The following result is returned:
```
   sepallength
0          4.9
1          4.7
2          4.6
3          5.0
4          5.4
```
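
When the column name is known only at run time, the bracket form can be wrapped in a small helper. The following is a minimal sketch, not part of PyODPS: `preview_column` is a hypothetical name, and the helper assumes only what this section shows, namely that a collection supports bracket access by column name and that the resulting sequence has a `head` method.

```python
def preview_column(df, column_name, n=5):
    """Return the first n rows of the column named by column_name.

    Hypothetical helper: it relies only on df[column_name] and .head(n).
    """
    # Bracket access accepts a variable; attribute access (df.sepallength)
    # needs the column name written literally in the source code.
    return df[column_name].head(n)
```

With the `iris` object from the prerequisites, `print(preview_column(iris, 'sepallength'))` would be expected to print the same rows as the preceding sample.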

Column types

DataFrame has its own type system. When a table is initialized, MaxCompute data types are converted into the corresponding DataFrame data types so that multiple computing backends can be supported. The DataFrame execution backends include MaxCompute SQL, pandas, and databases such as MySQL and PostgreSQL.

The following table describes the mappings between data types of DataFrame and data types of MaxCompute.

| Data type of MaxCompute | Data type of DataFrame |
| --- | --- |
| BIGINT | INT64 |
| DOUBLE | FLOAT64 |
| STRING | STRING |
| DATETIME | DATETIME |
| BOOLEAN | BOOLEAN |
| DECIMAL | DECIMAL |
| `ARRAY<VALUE_TYPE>` | `LIST<VALUE_TYPE>` |
| `MAP<KEY_TYPE, VALUE_TYPE>` | `DICT<KEY_TYPE, VALUE_TYPE>` |

If you specify `options.sql.use_odps2_extension=True`, the mappings between the following data types are also supported.

| Data type of MaxCompute | Data type of DataFrame |
| --- | --- |
| TINYINT | INT8 |
| SMALLINT | INT16 |
| INT | INT32 |
| FLOAT | FLOAT32 |
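
The mappings listed above can be written down as a plain lookup, which is sometimes handy for checking a table schema before loading it. The following is an illustrative sketch only: `to_dataframe_type` and both dictionaries are hypothetical names and not part of PyODPS. The parameterized ARRAY and MAP types are left out for brevity, and raising an error when the extension flag is off is an assumption about how a caller might want to handle that case.

```python
# Mappings copied from the documentation above (MaxCompute type -> DataFrame type).
BASE_TYPES = {
    "BIGINT": "INT64",
    "DOUBLE": "FLOAT64",
    "STRING": "STRING",
    "DATETIME": "DATETIME",
    "BOOLEAN": "BOOLEAN",
    "DECIMAL": "DECIMAL",
}

# Mappings documented as available only with options.sql.use_odps2_extension=True.
ODPS2_TYPES = {
    "TINYINT": "INT8",
    "SMALLINT": "INT16",
    "INT": "INT32",
    "FLOAT": "FLOAT32",
}


def to_dataframe_type(maxcompute_type, use_odps2_extension=False):
    """Hypothetical lookup helper; this is not a PyODPS function."""
    name = maxcompute_type.strip().upper()
    if name in BASE_TYPES:
        return BASE_TYPES[name]
    if name in ODPS2_TYPES:
        if use_odps2_extension:
            return ODPS2_TYPES[name]
        # Assumption: treat a missing extension flag as an error.
        raise ValueError(f"{name} requires use_odps2_extension=True")
    raise ValueError(f"no mapping listed for {name}")


print(to_dataframe_type("double"))
print(to_dataframe_type("INT", use_odps2_extension=True))
```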

Take note of the following points:

If a table contains fields of the LIST or DICT type, you must specify the types of their elements. If you do not specify the element types, an error occurs.
DataFrame does not support the TIMESTAMP and STRUCT types that are introduced in MaxCompute V2.0.

To obtain the data type of a sequence object, use sequence.dtype. Sample code:
```
print(iris.sepallength.dtype)
```
The following result is returned:
```
FLOAT64
```
Use the astype method to change the type of a column. The method takes the target type as input and returns the converted sequence object. Sample code:
```
print(iris.sepallength.astype('int').head(5))
```
The following result is returned:
```
   sepallength
0            4
1            4
2            4
3            5
4            5
```
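
The output above shows 4.9 becoming 4, which is consistent with truncation rather than rounding. Python's built-in `int()` behaves the same way, so the following stdlib-only sketch reproduces the values. It does not call PyODPS, and whether every execution backend truncates in exactly this way is an assumption.

```python
# First five sepallength values from the earlier output.
sepallength = [4.9, 4.7, 4.6, 5.0, 5.4]

# int() drops the fractional part (truncates toward zero); it does not round.
as_int = [int(value) for value in sepallength]

# round() is shown only for contrast: it would turn 4.9 into 5.
rounded = [round(value) for value in sepallength]

print(as_int)   # [4, 4, 4, 5, 5]
print(rounded)  # [5, 5, 5, 5, 5]
```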

Column names

In DataFrame computing, a sequence object must have a column name. In most cases, DataFrame automatically creates a name for each sequence object. Sample code:
```
print(iris.groupby('name').sepalwidth.max().head(5))
```
The following result is returned:
```
   sepalwidth_max
0             4.4
1             3.4
2             3.8
```
Note
In the preceding example, the result of taking the maximum value of sepalwidth is automatically named sepalwidth_max. Similarly, if you add a scalar to a sequence object, the result keeps the name of that sequence object. In other cases, you must manually name the sequence object.

A sequence object provides the rename method to rename a column. Sample code:

```
print(iris.sepalwidth.rename('sepal_width').head(5))
```

The following result is returned:

```
   sepal_width
0          3.0
1          3.2
2          3.1
3          3.6
4          3.9
```

Column calculations

You can perform operations on a sequence to obtain a new sequence, in much the same way as you calculate with ordinary Python variables. Numeric columns support arithmetic operations. String columns support only string concatenation. Sample code:
```
print((iris.sepallength + 5).head(5))
```
The following result is returned:
```
   sepallength
0          9.9
1          9.7
2          9.6
3         10.0
4         10.4
```
If two columns are used in a calculation, PyODPS cannot determine which column name to display, so you must specify the name manually, for example by calling `rename('sum_sepal')` on the sum of sepallength and sepalwidth. The following result is returned:
```
   sum_sepal
0        7.9
1        7.9
2        7.7
3        8.6
4        9.3
```
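
The values in the result above can be checked by hand: each row equals the sum of the sepallength and sepalwidth values shown in the earlier outputs. The following sketch uses plain Python lists as stand-ins for the two columns and does not call PyODPS.

```python
# First five rows of each column, taken from the earlier outputs.
sepallength = [4.9, 4.7, 4.6, 5.0, 5.4]
sepalwidth = [3.0, 3.2, 3.1, 3.6, 3.9]

# Element-wise addition pairs up the rows of the two columns.
# round(..., 1) hides binary floating-point noise such as 7.699999999999999.
sum_sepal = [round(a + b, 1) for a, b in zip(sepallength, sepalwidth)]

print(sum_sepal)  # [7.9, 7.9, 7.7, 8.6, 9.3]
```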
Note
For more information about column calculations, see Column operations.