Vector data and raster data are the primary spatial data types used extensively in GIS applications across various industries. GanosBase offers robust processing capabilities for both data types. Based on the elastic parallel processing capability of PolarDB for PostgreSQL, GanosBase provides computing efficiency that is significantly higher than that of traditional solutions. This article provides an overview of how to efficiently import vector and raster data to GanosBase. With an understanding of the solutions for importing spatial data to cloud-native databases, you can make better use of the vector and raster data processing capabilities of GanosBase.
PostGIS for PostgreSQL Community Edition provides the shp2pgsql and raster2pgsql command-line tools for the fast import of vector and raster data. When you use these tools, you must specify the SRID, file encoding, table name, and import method of the data to be imported. To import raster data, you must also specify parameters such as the bands of the raster and the tile size. However, these tools are not suitable for importing data to PolarDB databases for the following reasons:
· The import tools provided by PostGIS require access to the server that hosts the database, because data is typically uploaded to the local disk of that server. PolarDB, however, uses a shared storage architecture that does not provide a local disk. In addition, PolarDB databases leverage the elasticity of the underlying server resources and do not expose host information.
· PostGIS Raster and GanosBase Raster use different storage structures.
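For reference, typical invocations of the community tools mentioned above look like the following. The file, table, and database names are placeholders; both tools emit SQL that is piped into psql:

```shell
# Import a shapefile: -s sets the SRID, -W the file encoding,
# -c creates a new table (the import method), -I builds a spatial index.
shp2pgsql -s 4326 -W UTF-8 -c -I roads.shp public.roads | psql -d gisdb

# Import a raster: -t sets the tile size, -b selects the bands to import.
raster2pgsql -s 4326 -t 256x256 -b 1,2,3 -I dem.tif public.dem | psql -d gisdb
```

These invocations require shell access to a machine that can reach the database, which is exactly the assumption that does not hold for PolarDB.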
GanosBase uses Object Storage Service (OSS), which is a cloud-native distributed storage service, to offer easy-to-use function-level tools for vector and raster data import.
PolarDB for PostgreSQL is fully compatible with PostgreSQL Community Edition. It allows you to use GIS desktop tools like QGIS to import vector data. By creating a PostgreSQL data source connection, you can connect to the PolarDB database, then use the data import capability of the desktop tool to import data.
Foreign Data Wrapper (FDW) is a PostgreSQL extension that is used for accessing external data, such as data in other databases of the same cluster or in other instances. It is recommended that you use FDW to import vector data. GanosBase FDW supports unified access to various spatial data and can automatically map geometric spatial data types to the Geometry type. You can use FDW to access and query data from foreign tables and from the tables of your cluster in a unified manner.
For example, you can perform the following steps to import the poly.shp file stored in OSS into GanosBase as a table that contains data of the Geometry type:
1. Install the extensions.
CREATE EXTENSION ganos_spatialref;
CREATE EXTENSION ganos_geometry;
CREATE EXTENSION ganos_fdw;
2. Register the spatial data file poly.shp as a foreign table by using the ST_RegForeignTables function.
SELECT ST_RegForeignTables('OSS://<ak_id>:<ak_secret>@<endpoint>/path/poly.shp');
3. Query the registered FDW table named poly by using the information_schema.foreign_tables view.
SELECT foreign_table_name FROM information_schema.foreign_tables
ORDER BY foreign_table_name ASC;
Note
In the preceding steps, external data in the foreign table is queried via data mapping. No data is written into the database. You can execute the following statement to create a table by using the data of the foreign table:
CREATE TABLE poly_db AS SELECT * FROM poly;
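The OSS location string passed to ST_RegForeignTables follows an `OSS://<ak_id>:<ak_secret>@<endpoint>/path` pattern. As a minimal sketch, the following hypothetical helpers (illustrative only, not part of GanosBase) assemble that string and the registration statement:

```python
def build_oss_uri(ak_id: str, ak_secret: str, endpoint: str, object_path: str) -> str:
    """Assemble the OSS location string expected by ST_RegForeignTables."""
    return f"OSS://{ak_id}:{ak_secret}@{endpoint}/{object_path.lstrip('/')}"

def reg_foreign_table_sql(oss_uri: str) -> str:
    """Wrap the location string in the registration statement as a SQL literal."""
    return f"SELECT ST_RegForeignTables('{oss_uri}')"

uri = build_oss_uri("my_ak", "my_secret",
                    "oss-cn-beijing-internal.aliyuncs.com",
                    "mybucket/path/poly.shp")
print(reg_foreign_table_sql(uri))
# → SELECT ST_RegForeignTables('OSS://my_ak:my_secret@oss-cn-beijing-internal.aliyuncs.com/mybucket/path/poly.shp')
```

The resulting statement can then be executed over any PostgreSQL client connection, for example with psycopg2.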
GanosBase provides the ST_ImportFrom and ST_CreateRast functions for importing external raster data into databases. Both functions create objects of the Raster type based on external raster data to store image metadata. In addition to the Raster object, the ST_ImportFrom function divides the image data into regular-sized chunks, each of which contains 256 x 256 pixels by default. The chunks are stored in the database based on the value of the chunkTableName parameter. The ST_CreateRast function, however, creates only the Raster object without importing pixel data to the database.
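To get a feel for the chunking that ST_ImportFrom performs, the following plain-Python sketch (the helper is illustrative; it assumes the default 256 × 256 chunk size described above) computes how many chunk rows a given image produces:

```python
import math

def chunk_count(width_px: int, height_px: int, chunk_size: int = 256) -> int:
    """Number of fixed-size chunks needed to cover a width x height image.

    Edge chunks are padded, so partial rows and columns count as full chunks.
    """
    cols = math.ceil(width_px / chunk_size)
    rows = math.ceil(height_px / chunk_size)
    return cols * rows

# A 10000 x 8000 pixel image yields 40 x 32 = 1280 chunks.
print(chunk_count(10000, 8000))  # → 1280
```

This is why ST_ImportFrom needs a chunk table name, while ST_CreateRast, which copies no pixel data, does not.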
TIFF is a common raster data format. The following example shows how to import TIFF data from OSS to GanosBase.
1. Prepare TIFF data in OSS.
2. Create a table with a Raster field to store TIFF data.
CREATE TABLE raster_table
(
id integer,
format text,
rast raster
);
3. Import TIFF data by using one of the following methods:
INSERT INTO raster_table
SELECT 1, 'TIFF', ST_ImportFrom('chunk_table','OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/G50E009001DOM.tif');
INSERT INTO raster_table
SELECT 2, 'TIFF', ST_CreateRast('OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/G50E009001DOM.tif');
4. Query the imported data. Each method creates a Raster object, and you can operate on these objects by using the raster-related UDFs provided by GanosBase.
SELECT id, ST_Georeference(rast), ST_Extent(rast), ST_NumBands(rast), ST_SRID(rast), ST_Value(rast, 0, 100, 100), ST_Value(rast, 1, 200, 200) FROM raster_table;
The ST_ImportFrom function creates a table (chunk_table) to store pixel data, which ensures data availability even if the original image is deleted. The ST_CreateRast function creates only a Raster object, which contains the logical mapping information to the external image. The image data is stored as a file on OSS. If the original image is deleted, you cannot view the image in the database.
HDF5 and NetCDF are common raster data formats. They are widely used in fields such as earth observation, scientific computing, and geoscience (for example, meteorology and oceanography). You can use the ST_ImportFrom and ST_CreateRast functions to import HDF5 and NetCDF data. The SQL statements are similar to those for importing TIFF data.
INSERT INTO raster_table
SELECT 3, 'NC', ST_ImportFrom('chunk_table','OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/Z_NAFP_C_BABJ_20210430000556_P_HRCLDAS_RT_CHN_0P01_HOR-WIV-2021043000.nc');
INSERT INTO raster_table
SELECT 4, 'NC', ST_CreateRast('OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/Z_NAFP_C_BABJ_20210430000556_P_HRCLDAS_RT_CHN_0P01_HOR-WIV-2021043000.nc');
For HDF5 and NetCDF files that contain subdatasets, specify the subdataset name after the file path in the import statement. See the following sample statements:
· For NetCDF files:
INSERT INTO raster_table
SELECT 5, 'NC', ST_ImportFrom('chunk_table','OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/image.nc:sub_name');
· For HDF5 files:
INSERT INTO raster_table
SELECT 5, 'HDF5', ST_ImportFrom('chunk_table','OSS://<ak>:<ak_secret>@oss-cn-beijing-internal.aliyuncs.com/mybucket/data/image.hdf5://path/sub_name');
For HDF5 and NetCDF files that contain more than three dimensions, you must also specify the chunkdim parameter. Raster data in the database is loaded and stored according to (w, h, b) dimensions, where w is the chunk width, h is the chunk height, and b is the number of bands. For files that contain more than three dimensions, the product of the sizes of all dimensions other than the x and y coordinates is stored as the number of bands of each chunk. Take the following NetCDF file as an example:
Besides lon and lat, the dimensions are time and isobaric. When you use ST_ImportFrom to import the data, specify the product of the sizes of all dimensions except lat and lon as the number of bands in chunkdim. In this example, the number of bands is 120. See the following SQL statement:
INSERT INTO nc_table VALUES(1, ST_ImportFrom('nc_rbt', '/Users/xiaofei/Data/raster/nc_demo.nc','{"chunkdim":"(256,256,120)"}'));
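The band count in chunkdim is simply the product of the non-spatial dimension sizes. As an illustration (the dimension sizes below are assumptions for this sketch; the source only states that the product is 120), the value can be computed like this:

```python
from functools import reduce
from operator import mul

def chunkdim_bands(dim_sizes: dict, spatial=("lon", "lat")) -> int:
    """Product of the sizes of all dimensions except the spatial ones."""
    sizes = [n for name, n in dim_sizes.items() if name not in spatial]
    return reduce(mul, sizes, 1)

# Hypothetical sizes: 24 time steps x 5 isobaric levels = 120 bands.
dims = {"lon": 7000, "lat": 4400, "time": 24, "isobaric": 5}
print(chunkdim_bands(dims))  # → 120
```

The result is the third component of the chunkdim value, for example "(256,256,120)" in the statement above.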
To perform batch data import from OSS, you need to use scripts. The following sample code in Python shows how to perform batch raster data import from OSS to the database:
Note:
Make sure that your Python environment is set up, with the oss2 and psycopg2 libraries installed, before you run the script.
## Import the required libraries
import oss2
import psycopg2
import logging
## Connect to OSS
auth = oss2.Auth('AK_ID', 'AK_Secret')
bucket = oss2.Bucket(auth, '***endpoint***', '***bucket_name***')
## Connect to the database and get a cursor
con = psycopg2.connect(database="***dataq***", user="***dde***",
                       options="-c search_path=raster,public",  # schema
                       password="******", host="*******", port="******")
cur = con.cursor()
## SQL statement template for data import
insert_sql = "INSERT INTO {raster_table_name}(filename, band_index, raster_obj) VALUES ..."
## Get all HDF5 file names from OSS and save them to a local list file
with open("upload.txt", mode='a', encoding='utf-8') as ff:
    for obj in oss2.ObjectIterator(bucket, prefix="..."):
        filename = obj.key.split('/')[-1]
        if filename.endswith(".hdf5"):
            ff.write(filename + '\n')
## Read the file list back
with open('upload.txt', 'r') as f:
    file_list = f.read().splitlines()
## Traverse the file list, generate the SQL statement, and import the data
for filename in file_list:
    # Fill in insert_sql with the parameters for this file...
    # Import the data:
    try:
        cur.execute(insert_sql)
        con.commit()
        logging.info(filename + " finished")
    except (Exception, psycopg2.Error):
        con.rollback()
        logging.error(filename + " error!")
        print(filename + " upload failed\n")
PolarDB requires data import methods that differ from those of traditional deployments. The function-level import tools provided by GanosBase integrate SQL-level data import, querying, analysis, and service capabilities, which helps you establish standardized workflows and improves usability by enabling consistent data operations across different business stages.