This topic describes how to query geolocations of IP addresses by using MaxCompute. To query geolocations of IP addresses, you must download an IP address geolocation library, upload the library to MaxCompute, create a user-defined function (UDF), and then execute an SQL statement.
Prerequisites
Create a DataWorks on the workflow. This example uses DataWorks simple mode. For more information, see create a workflow.
Background information
To query the geolocation of an IP address, you can send an HTTP request to call the API provided by the IP address geolocation library of Taobao.
You cannot send HTTP requests in MaxCompute. You can query geolocations of IP addresses in MaxCompute by using one of the following methods:
Execute SQL statements to download data in the IP address geolocation library to your on-premises machine. Then, send HTTP requests to query the geolocation information.
NoteThis method is inefficient. The query frequency must be less than 10 queries per second (QPS). Otherwise, query requests are rejected by the IP address geolocation library of Taobao.
Download the IP address geolocation library to your on-premises machine. Then, query the geolocation information in the library.
NoteThis method is inefficient and is not suitable for scenarios in which data is analyzed by using data warehouses.
Maintain an IP address geolocation library and upload it to MaxCompute on a regular basis. Then, query geolocations of IP addresses in the IP address geolocation library.
NoteThis method is efficient. You must maintain the IP address geolocation library on a regular basis.
Download an IP address geolocation library
Obtain an IP address geolocation library. In this example, the sample IP address geolocation library ipdata.txt.utf8 is used. This IP address geolocation library is a library demo in the UTF-8 format.
Download the ipdata.txt.utf8 file. The following figure shows the data in the file.
The following content describes the data in the sample IP address geolocation library.
The data is in the UTF-8 format.
The first two strings in a data record are the start IP address and the end IP address of an IP address range, in the decimal integer literal format. The third and fourth strings are equivalent to the first two strings, but are expressed in dotted decimal notation. The decimal integer literal format helps you check whether an IP address is within a specific IP address range.
NoteYou can also use your own IP address geolocation library.
Upload the IP address geolocation library
Execute the following statements on the MaxCompute client to create a table named ipresource. This table is used to store geolocation data of IP addresses.
DROP TABLE IF EXISTS ipresource ; CREATE TABLE IF NOT EXISTS ipresource ( start_ip BIGINT ,end_ip BIGINT ,start_ip_arg string ,end_ip_arg string ,country STRING ,area STRING ,city STRING ,county STRING ,isp STRING );
Run the following Tunnel command to upload data in the ipdata.txt.utf8 file to the ipresource table:
odps@ workshop_demo>tunnel upload D:/ipdata.txt.utf8 ipresource;
In the preceding command, D:/ipdata.txt.utf8 is the on-premises path of the ipdata.txt.utf8 file. For more information about the command, see Tunnel commands.
You can execute the following statement to check whether the data in the file is uploaded:
-- Query the number of data records in the ipresource table. select count(*) from ipresource;
Execute the following SQL statement to obtain the first 10 data records in the ipresource table:
select * from ipresource limit 10;
The following result is returned.
Create a UDF
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Create a Python resource.
Right-click the workflow and choose .
In the Create Resource dialog box, configure the Name parameter, select Upload to MaxCompute, and then click Create.
Enter the following code in the Python resource.
from odps.udf import annotate @annotate("string->bigint") class ipint(object): def evaluate(self, ip): try: return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.'))) except: return 0
Click the icon to submit the resource.
Create a function.
Right-click the workflow that you created and choose .
In the Create Function dialog box, configure the Name parameter and click Create.
NoteIf multiple MaxCompute engines are bound to the workspace in DataStudio, set the Engine Instance MaxCompute parameter to the engine that you use for this function.
On the configuration tab of the function, configure the parameters.
Parameter
Description
Function Type
The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions.
Engine Instance MaxCompute
A default value is displayed and cannot be changed.
Function Name
The name of the UDF. You can use this name to reference the UDF in SQL statements. The function name must be globally unique and cannot be changed after the function is created.
Owner
The value of this parameter is automatically displayed.
Class Name
Required. The name of the class that implements the function.
NoteIf the resource type is Python, enter the class name in the Python resource name.Class name format. Do not include the .py extension in the resource name.
Resources
Required. The list of resources. You can search for existing resources in the workspace in fuzzy match mode.
Separate multiple resources with commas (,).
Description
The description of the function.
Expression Syntax
The syntax of the UDF. Example:
test
.Parameter Description
The description of the supported data types of input and output parameters.
Return Value
Optional. The value to return. Example: 1.
Example
Optional. The example of the function.
Click the icon in the top toolbar to save the UDF.
Commit the UDF.
Click the icon in the top toolbar.
In the Submit dialog box, enter information in the Change description field.
Click Confirm.
Use the UDF that you created in an SQL statement to query the geolocation of an IP address
Right-click the workflow and choose .
In the Create Node dialog box, enter a node name in the Name field and click Confirm.
On the configuration tab of the ODPS SQL node, enter the following statement:
select * from ipresource WHERE ipint('192.0.2.0') >= start_ip AND ipint('192.0.2.0') <= end_ip
Click icon to run the code.
You can operation Log view the results.