All Products
Search
Document Center

DataWorks:Use MaxCompute to query geolocations of IP addresses

Last Updated:Dec 13, 2024

This topic describes how to query geolocations of IP addresses by using MaxCompute. To query geolocations of IP addresses, you must download an IP address geolocation library, upload the library to MaxCompute, create a user-defined function (UDF), and then execute an SQL statement.

Prerequisites

Background information

To query the geolocation of an IP address, you can send an HTTP request to call the API provided by the IP address geolocation library of Taobao.查询接口

You cannot send HTTP requests in MaxCompute. You can query geolocations of IP addresses in MaxCompute by using one of the following methods:

  • Execute SQL statements to download data in the IP address geolocation library to your on-premises machine. Then, send HTTP requests to query the geolocation information.

    Note

    This method is inefficient. The query frequency must be less than 10 queries per second (QPS). Otherwise, query requests are rejected by the IP address geolocation library of Taobao.

  • Download the IP address geolocation library to your on-premises machine. Then, query the geolocation information in the library.

    Note

    This method is inefficient and is not suitable for scenarios in which data is analyzed by using data warehouses.

  • Maintain an IP address geolocation library and upload it to MaxCompute on a regular basis. Then, query geolocations of IP addresses in the IP address geolocation library.

    Note

    This method is efficient. You must maintain the IP address geolocation library on a regular basis.

Download an IP address geolocation library

  1. Obtain an IP address geolocation library. In this example, the sample IP address geolocation library ipdata.txt.utf8 is used. This IP address geolocation library is a library demo in the UTF-8 format.

  2. Download the ipdata.txt.utf8 file. The following figure shows the data in the file.

    image

    The following content describes the data in the sample IP address geolocation library.

    • The data is in the UTF-8 format.

    • The first two strings in a data record are the start IP address and the end IP address of an IP address range, in the decimal integer literal format. The third and fourth strings are equivalent to the first two strings, but are expressed in dotted decimal notation. The decimal integer literal format helps you check whether an IP address is within a specific IP address range.

    Note

    You can also use your own IP address geolocation library.

Upload the IP address geolocation library

  1. Execute the following statements on the MaxCompute client to create a table named ipresource. This table is used to store geolocation data of IP addresses.

    DROP TABLE IF EXISTS ipresource ;
    
    CREATE TABLE IF NOT EXISTS ipresource 
    (
        start_ip BIGINT
        ,end_ip BIGINT
        ,start_ip_arg string
        ,end_ip_arg string
        ,country STRING
        ,area STRING
        ,city STRING
        ,county STRING
        ,isp STRING
    );
  2. Run the following Tunnel command to upload data in the ipdata.txt.utf8 file to the ipresource table:

    odps@ workshop_demo>tunnel upload D:/ipdata.txt.utf8 ipresource;

    In the preceding command, D:/ipdata.txt.utf8 is the on-premises path of the ipdata.txt.utf8 file. For more information about the command, see Tunnel commands.

    You can execute the following statement to check whether the data in the file is uploaded:

    -- Query the number of data records in the ipresource table. 
    select count(*) from ipresource;
  3. Execute the following SQL statement to obtain the first 10 data records in the ipresource table:

    select * from ipresource limit 10;

    The following result is returned.

    image

Create a UDF

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > DataService Studio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataService Studio.

  2. Create a Python resource.

    1. Right-click the workflow and choose Create Resource > MaxCompute > Python.

    2. In the Create Resource dialog box, configure the Name parameter, select Upload to MaxCompute, and then click Create.

    3. Enter the following code in the Python resource.

      from odps.udf import annotate
      @annotate("string->bigint")
      class ipint(object):
          def evaluate(self, ip):
              try:
                  return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
              except:
                  return 0
    4. Click the image.png icon to submit the resource.

  3. Create a function.

    1. Right-click the workflow that you created and choose Create Function > MaxCompute > Function.

    2. In the Create Function dialog box, configure the Name parameter and click Create.

      Note

      If multiple MaxCompute engines are bound to the workspace in DataStudio, set the Engine Instance MaxCompute parameter to the engine that you use for this function.

    3. On the configuration tab of the function, configure the parameters.

      **

      Parameter

      Description

      Function Type

      The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions.

      Engine Instance MaxCompute

      A default value is displayed and cannot be changed.

      Function Name

      The name of the UDF. You can use this name to reference the UDF in SQL statements. The function name must be globally unique and cannot be changed after the function is created.

      Owner

      The value of this parameter is automatically displayed.

      Class Name

      Required. The name of the class that implements the function.

      Note

      If the resource type is Python, enter the class name in the Python resource name.Class name format. Do not include the .py extension in the resource name.

      Resources

      Required. The list of resources. You can search for existing resources in the workspace in fuzzy match mode.

      Separate multiple resources with commas (,).

      Description

      The description of the function.

      Expression Syntax

      The syntax of the UDF. Example: test.

      Parameter Description

      The description of the supported data types of input and output parameters.

      Return Value

      Optional. The value to return. Example: 1.

      Example

      Optional. The example of the function.

  4. Click the 保存 icon in the top toolbar to save the UDF.

  5. Commit the UDF.

    1. Click the 提交 icon in the top toolbar.

    2. In the Submit dialog box, enter information in the Change description field.

    3. Click Confirm.

Use the UDF that you created in an SQL statement to query the geolocation of an IP address

  1. Right-click the workflow and choose new > MaxCompute > ODPS SQL.

  2. In the Create Node dialog box, enter a node name in the Name field and click Confirm.

  3. On the configuration tab of the ODPS SQL node, enter the following statement:

    select * from ipresource
    WHERE ipint('192.0.2.0') >= start_ip
    AND ipint('192.0.2.0') <= end_ip
  4. Click **icon to run the code.

  5. You can operation Log view the results.