By Haiqing
The tenant-level Information Schema of MaxCompute provides project metadata and historical data from the perspective of tenants. You can pull a type of metadata from all your projects in the same metadata center at a time for statistical analysis. We are launching a series of articles about metadata usage practices.
This article focuses on statistical analysis through the TUNNELS_HISTORY view of metadata. If you haven't used the tenant-level Information Schema, read the background information, feature introduction, fees, usage limits, and precautions written in Tenant-level Information Schema in advance to avoid unnecessary problems.
Upon reviewing the bill details in the user center, it was discovered that several projects incurred unexpected Internet download fees. In order to address this, it is necessary to examine the specific download requests for governance.
The primary reason for MaxCompute generating Internet download fees is the successful initiation of data download requests through the Tunnel Download channel over the Internet. You can download the consumption details through the user center bill details for analysis. You can find the specific project and consumption date, and then download the daily usage information based on the usage details, filter the data that is classified into DownloadEx and InstanceDownloadEx and upload them to the MaxCompute table, and query the data with the tenant-level Information Schema view SYSTEM_CATALOG.INFORMATION_SCHEMA.TUNNELS_HISTORY. By doing so, you can view more detailed download request information.
Step 1: Download usage details.
• Select MaxCompute (Pay-As-You-Go).
• Select metering specifications based on the default calculation quota of the project. For Pay-As-You-Go, select MaxCompute (Pay-As-You-Go); for Subscription, select ODPSDATAPLUS.
• The usage time should be the date on which the consumption was generated.
Step 2: Filter the metering information downloaded from the Internet and upload it to the MaxCompuet table. (The filtered data is classified into DownloadEx and InstanceDownloadEx). The statements for building the table are as follows:
CREATE TABLE IF NOT EXISTS mc_tunnel_metering
(
projectid STRING COMMENT ' Project ID'
,metering_id STRING COMMENT ' Metering information ID'
,type STRING COMMENT ' Data classification, including DownloadEx、InstanceDownloadEx'
,starttime STRING COMMENT ' Start time'
,endtime STRING COMMENT ' End time'
,download BIGINT COMMENT ' DownloadEx(Byte)'
,Region STRING COMMENT ' Region'
);
Step 3: Associate the SYSTEM_CATALOG.INFORMATION_SCHEMA.TUNNELS_HISTORY view to query more tunnel download request information.
SET odps.namespace.schema = true; -- If you have enabled the tenant-level schema syntax switch for your current tenant, you do not need to execute this flag.
WITH a AS
(
SELECT tunnel_catalog
,request_id
,object_type
,object_name
,partition_spec
,owner_id
,owner_name
,start_time
,end_time
,client_ip
,user_agent
,`columns`
FROM SYSTEM_CATALOG.INFORMATION_SCHEMA.TUNNELS_HISTORY
WHERE ds IN ('xxx','xxx')-- corresponding to consumption date.
)
,b AS
(
SELECT metering_id
,download
,region
FROM mc_tunnel_metering
WHERE download > 0
)
SELECT a.tunnel_catalog
,a.request_id
,b.metering_id
,(
b.download / 1024 / 1024 / 1024
) download_gb
,a.object_type
,a.object_name
,a.partition_spec
,a.owner_id
,a.owner_name
,a.start_time
,a.end_time
,a.client_ip
,a.user_agent
,a.`columns`
,b.region
FROM a
JOIN b
ON a.request_id = b.metering_id
;
You can use the following results:
• owner_name: the account that initiated the request. You can find the owner of the account.
• tunnel catalog/object_name/partition_spec: project/table name or selected instance ID/table partition. You can locate the specific downloaded data.
• client_ip: the IP address of the client. You can check the IP address.
Since you will not be charged when you download data over the VPC, you can use the preceding information to find the service that you want to request over the VPC. For more information about the endpoint information in different regions and different Internet connection methods, see Endpoint.
Calculate the daily inbound and outbound data volume of each project through the tunnel channel.
Use the SYSTEM_CATALOG.INFORMATION_SCHEMA.TUNNELS_HISTORY view of the tenant-level Information Schema to perform the count. The SQL statements are as follows:
SET odps.namespace.schema = true; -- If you have enabled the tenant-level schema syntax switch for your current tenant, you do not need to execute this flag.
SELECT ds
,tunnel_catalog
,operate_type
,SUM(data_size) / 1024 / 1024 / 1024 s_data_size_gb
,SUM(length) s_length
FROM SYSTEM_CATALOG.INFORMATION_SCHEMA.TUNNELS_HISTORY
WHERE ds IN ('xxx','xxx')
GROUP BY ds
,tunnel_catalog
,operate_type
;
Note that the operate_type value is of the DOWNLOADINSTANCELOG type, primarily due to the use of Tunnel Instance Download when executing various SQL, spark, pyodps, and other tasks that necessitate data echo. If further details are required, additional information such as user_agent can be viewed.
The above outlines two common scenarios. In practice, you can use the SYSTEM_CATALOG.INFORMATION_SCHEMA.CATALOGS view to access more information to address your business requirements.
Practical Use of MaxCompute Metadata: Statistical Analysis of Project Information
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - March 14, 2024
Alibaba Cloud MaxCompute - March 7, 2024
Alibaba Cloud MaxCompute - March 2, 2020
Alibaba Cloud New Products - August 20, 2020
Alibaba Cloud MaxCompute - October 31, 2022
Alibaba Cloud Community - March 4, 2022
137 posts | 19 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreTransform your business into a customer-centric brand while keeping marketing campaigns cost effective.
Learn MoreMore Posts by Alibaba Cloud MaxCompute