Data Management: Use Notebook to query and analyze data

Last Updated: Sep 02, 2024

This topic describes how to use Jupyter Notebook provided by Data Management (DMS) to query and analyze data. A notebook is an interactive page that integrates code, text, and charts. You can use a notebook to easily share information with other users.

Notebook UI

The notebook toolbar provides the following controls:

  • Save icon: saves the content that you edit in the notebook.

  • Insert icon: inserts a cell above the current cell.

  • Delete icon: deletes the selected cell.

  • Cut icon: cuts the selected cell.

  • Copy icon: copies the selected cell.

  • Paste icon: pastes the copied content to the selected cell.

  • Run icon: executes the content of the selected cell.

  • Stop icon: stops the kernel.

  • Restart icon: restarts the kernel.

  • Restart and run all icon: restarts the kernel and runs the notebook again.

  • Cell type selector: changes the attribute of the selected cell. The available attributes are Code, SQL, Markdown, and Raw.

  • Copilot icon: opens the Copilot pane for conversational data queries or analysis.

Features supported by DMS Jupyter Notebook

Compatible with open source Jupyter Notebook, DMS Jupyter Notebook provides enhanced capabilities for SQL queries and visualizations. This section describes the features supported by DMS Jupyter Notebook.

IPython kernel

You can use pip to install extension packages. A notebook that uses this kernel can be accessed over the Internet and used in much the same way as open source Jupyter Notebook.
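
For example, you can run pip from a notebook cell to install an extension package. The package name below is only an illustration; replace it with the package that you need.

  # Install an extension package from a notebook cell.
  # pandas is used here only as an example package name.
  !pip install pandas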

  • You can use Spark syntax to query table data in a notebook. Sample statements:

    • Syntax 1:

      df = spark.sql("select * from customer limit 10")
      df.show()
    • Syntax 2:

      %%spark_sql 
      select * from customer limit 10;
    • Syntax 3:

      In a cell, choose SQL > Spark SQL and enter an SQL statement.

      CREATE TABLE IF NOT EXISTS `default`.`select_2` AS SELECT 2;
      Note

      By default, a Spark SQL cell displays up to 3,000 rows on a page. To adjust the number of rows displayed, set the DMS_SPARK_SQL_DEFAULT_LIMIT environment variable by running the following code in a Code cell. Replace 3000 with the number of rows that you want to display.

      import os
      os.environ['DMS_SPARK_SQL_DEFAULT_LIMIT'] = '3000'
  • In the toolbar of the notebook, you can change the cell attribute from Code to SQL.

    SQL cells use the same syntax as logical data warehouses. You can query data across databases and analyze data in real time in SQL cells. The required permissions are the same as the fine-grained permissions in DMS.

  • In SQL cells and Python cells, you can reference variables in the ${Variable name} format, customize variable names, and view variable types. The following items describe how to reference and generate variables in a notebook. A sketch follows this list.

    • Reference a variable

      You can reference an IPython variable in an SQL statement in the ${IPython variable name} format.

    • Generate a variable

      You can directly reference the result set of an SQL cell as a variable in IPython. The variable name is displayed in the lower-left corner of the result set, and the variable type is pandas.core.frame.DataFrame. You can click the variable name to customize the name.

  • You can visualize the result set of an SQL query in a table or a chart by clicking an icon.
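
The following sketch illustrates, in rough form, how variables flow between Code (Python) cells and SQL cells, as referenced in the list above. The table name customer, the variable min_amount, and the result-set name df_result are placeholders rather than names defined by DMS, and the DataFrame is built inline so that the snippet runs on its own.

  import pandas as pd

  # Code cell: define an IPython variable. An SQL cell can reference it as
  # ${min_amount}, for example: SELECT * FROM customer WHERE amount > ${min_amount};
  min_amount = 100

  # In the notebook, the SQL cell's result set is exposed as a pandas DataFrame
  # whose name is shown in the lower-left corner of the result set (df_result is
  # assumed here). It is constructed inline so that this sketch is self-contained.
  df_result = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [120, 80, 150]})

  print(type(df_result))                            # <class 'pandas.core.frame.DataFrame'>
  print(df_result[df_result["amount"] > min_amount])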

PySpark kernel

By default, DMS Jupyter Notebook uses AnalyticDB Spark. You can also use open source Spark in a notebook.

Spark Magic allows you to run the magic command %%help to view the supported commands.

Note

Spark Magic is an extension of Jupyter Notebook.

Use AnalyticDB Spark

After you purchase an AnalyticDB cluster of Data Lakehouse Edition, create a resource group, and create a database account, you can use the following commands:

  • %%info: Views the AnalyticDB Spark configurations.

  • %%sql: Submits an SQL statement to AnalyticDB Spark. You can use it to create, read, and write C-Store tables. For more information, see Use Spark SQL to read and write C-Store tables.

  • %%spark: Submits Python code to AnalyticDB Spark.

Note

AnalyticDB Spark retains a new session for 20 minutes and deletes the session after it expires.
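
For example, a %%spark cell submits the Python code that it contains to AnalyticDB Spark. A minimal sketch follows; the customer table is only a placeholder, so query a table that exists in your cluster.

  %%spark
  # The Python code in this cell is submitted to AnalyticDB Spark.
  # The table name is an example only.
  df = spark.sql("SELECT * FROM customer LIMIT 10")
  df.show()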

Upload and download files

You can use ossutil to upload and download datasets. For more information, see Configure ossutil.
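
For example, assuming that ossutil is already installed and configured with your credentials, you can call it from a notebook cell by using the ! shell escape. The bucket name and paths below are placeholders.

  # Upload a local dataset to OSS (the bucket and paths are placeholders).
  !ossutil cp ./train.csv oss://examplebucket/datasets/train.csv

  # Download the dataset from OSS to the notebook's working directory.
  !ossutil cp oss://examplebucket/datasets/train.csv ./train.csv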

Procedure

  1. Create a notebook.

    On the image tab, click image and select Notebook.

  2. In the text field or code editor, enter content by using the SQL, Code, Markdown, or Raw syntax.

    Example: Use the Markdown syntax to enter a sentence that describes the data that you want to query.

  3. Use Copilot to generate SQL statements.

  4. After you confirm that a generated SQL statement is correct, execute the SQL statement and view the generated result set.

    1. In the upper-right corner of the SQL statement section, click Execute Query.

      The SQL statement is automatically inserted into the document on the left. The generated result set is displayed below the SQL statement.

      • Display the result set in a table.

      • Display the result set in a chart.

  5. After the SQL statement is executed, reference the result set in other SQL cells in the form of a variable. You can customize the name of the variable.

  6. Predict the data change trend.

    1. Enter the command that installs the Python packages for machine learning and click the Run icon to install the packages. A sketch of sample code follows this procedure.

    2. After the packages are installed, run Python code to predict the data change trend and visualize the result.
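
The following sketch shows what the two sub-steps above might look like. The package choice (scikit-learn), the column names, and the sample values are assumptions made for illustration; in practice, the DataFrame would typically come from the result set of an SQL cell.

  # Cell 1: install the machine learning packages (run once).
  !pip install pandas scikit-learn

  # Cell 2: fit a simple linear trend and predict future values.
  import pandas as pd
  from sklearn.linear_model import LinearRegression

  # Sample data built inline so that the sketch runs on its own.
  df = pd.DataFrame({
      "day": range(1, 11),
      "order_amount": [120, 130, 128, 140, 150, 155, 160, 170, 168, 180],
  })

  model = LinearRegression().fit(df[["day"]], df["order_amount"])

  # Predict the next 7 days to show the data change trend.
  future = pd.DataFrame({"day": range(11, 18)})
  print(model.predict(future))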

What to do next

Publish an AI agent

FAQ

Q: Which users can view documents generated in Notebook?

A: Only users of the same tenant who are members of the workspace can view documents generated in Notebook. If a user does not belong to the tenant that owns the workspace, add the user to the tenant and add the user as a member of the workspace.