×
Community Blog Geohash vs. PostGIS

Geohash vs. PostGIS

This article discusses Geohash and PostGIS in detail and compares them to explain their limitations and advantages in handling spatial data.

By Digoal

Background

Multiple geographic representation methods are available in the industry. Generally, we use latitudes and longitudes to represent locations on Earth. PostgreSQL PostGIS describes this type of information, including the altitude. It is the most widely used technology with the highest precision and the best functionality.

However, not all databases use this technology. Many databases chose Geohash technology in the early stages of development because PostGIS is more challenging to implement.

What is Geohash?

Working Mechanism of Geohash

As defined by Oracle, Geohash is used for encoding latitude and longitude coordinates into a text string. It is a hierarchical spatial data structure that divides space into buckets of a grid shape. The size of the grid depends on the Geohash's precision.

Geohash typically uses 32 characters for encoding. Base36 geo encodings, which uses 36 characters for encoding, are also used in the industry.

After encoding, points close to each other have closer values. A prefix always contains a range represented by longer code characters. For example, wb12x comprises five characters, and it contains the grid wb12xabcd. In this case, wb12x represents a larger grid, and wb12xabcd represents a smaller grid. Do not ignore boundary issues. For example, -180 and 180 indicate the same place, but they have completely different Geohash codes. This is one of the issues with Geohash.

Steps for Converting Geohash Codes into Latitudes and Longitudes

First, create a Base32 mapping table (assume that 32 characters are used for encoding).

1

For example, if the Geohash code is ezs42, it can be translated into 13, 31, 24, 4, and 2 according to the following table.

Use 5 bits to represent each of these five numbers. Then we have 01101 11111 11000 00100 00010.

Next, starting from 0 in this bit string, connect the odd-position bits into a longitude code (0111110000000) and the even-position bits into a latitude code (101111001001).

What should we do after obtaining the latitude and longitude bit strings?

Longitude range: -180 to 180

Latitude range: -90 to 90

Divide the range in half each time. For example, divide [-180, 180] to [-180, 0] and (0, 180]. When the bit is 0, the point falls within the range on the left. When the bit is 1, the point falls within the range on the right. Then divide the next range in half. For example, divide [-180, 0] to [-180, -90] and (-90, 0]. Select a range again based on the bit value and repeat the preceding steps.

Get a final range and use the median of the range as the corresponding latitude or longitude.

2

The above figure shows the calculation process of the bit string 101111001001, with 1 falling in the ranges on the right and 0 falling within the ranges on the left.

In this table, val indicates the median of the range, and err indicates the error.

A longer Geohash string has a smaller error. Let's look at the error table below.

3

Drawbacks of Using Geohash

Geohash has the following critical drawbacks:

1) The grid-based algorithm cannot meet high-precision requirements.

2) The earth is an irregular ellipse, so the Geohash deviation changes sharply as the latitude increases. If a point is closer to the Arctic or Antarctic region, the distance calculation result is more inaccurate.

  • At the Equator (0 Degrees), the length of a longitude degree is 111.320 km, while a degree of latitude measures 110.574 km, an error of 0.67%.
  • At 30 Degrees (Mid Latitudes) the error is 110.852/96.486 = 14.89%.
  • At 60 Degrees (High Arctic) the error is 111.412/55.800 = 99.67%, reaching infinity at the poles.

3) The hash values of the longitude boundaries (-180, 180) are entirely different, but they are adjacent, which affects the adjacency calculation.

PostGIS Geometry, Geography, and Raster

PostgreSQL PostGIS is a robust data management system that has been developed over decades. It has been applied to military, scientific research, and commercial scenarios.

PostGIS adds extra types, such as geometry, geography, and raster, to the PostgreSQL database. It also adds functions, operators, and index enhancements that apply to these spatial types.

These additional functions, operators, index bindings, and types augment the power of the core PostgreSQL DBMS, making it a fast, feature-plenty, and robust spatial database management system.

Feature List

The PostGIS 2+ series provides:

  • Processing and analytic functions for both vector and raster data for splicing, dicing, morphing, reclassifying, and collecting/unioning with SQL's power
  • Raster map algebra for fine-grained raster processing
  • Spatial reprojection SQL callable functions for both vector and raster data
  • Support for importing/exporting ESRI shapefile vector data via both command-line and GUI packaged tools and support for more formats via other third-party open-source tools
  • Packaged command-line for importing raster data from many standard formats: GeoTiff, NetCDF, PNG, and JPG to name a few
  • Rendering and importing vector data support functions for standard textual formats such as KML, GML, GeoJSON, GeoHash, and WKT using SQL
  • Rendering raster data in various standard formats GeoTIFF, PNG, JPG, NetCDF, to name a few using SQL
  • Seamless raster/vector SQL callable functions for extrusion of pixel values by geometric region, running stats by region, clipping rasters by geometry, and vectorizing rasters
  • 3D object support, spatial index, and functions
  • Network Topology support
  • Packaged Tiger Loader/Geocoder/Reverse Geocoder/utilizing US Census Tiger data

Geohash vs. PostGIS

The following paragraphs are translated and cited from http://www.cnblogs.com/LBSer/p/3629149.html

Comparison in a simple and interesting scenario:

Feature Mysql spatial extension PostGIS
Spatial indexes Only MyISAM supports the R-Tree index. InnoDB does not. GiST index (a variant of the R-Tree index)
Spatial data types 2D data only 2D, 3D, and curves
Spatial operation functions Limited spatial functions Spatial operation functions defined by the OGC standard
Spatial projection Not supported Multiple common projected coordinate systems are supported
Transactions Not supported PostGIS supports a series of long-running transactions, which can effectively support complex spatial analysis.
Loading Speed MySQL > PostGIS (transactions) (Maybe the author did not optimize PostGIS.) -
Speed of creating a spatial index MySQL < PostGIS (diff split algo) -
Query efficiency MySQL and PostGIS perform well in different types of queries. -
Use of GIS Seldom used Commonly used. For example, PostgreSQL and PostGIS are used by OpenStreetMap.

Assume that you have received a task to find all points in the blue polygon. You can find only the points in the minimum bounding rectangle (the red box) by using MySQL Spatial. However, you may find points in any polygon if you use PostGIS.

4

If you need to calculate the distance between two points, calculate only the Euclidean spatial distance using MySQL Spatial, while PostGIS enables you to calculate the real spatial distance in different projected coordinate systems.

PostGIS Supports Geohash

In addition to geometry and geography, PostGIS also supports Geohash. Although Geohash is not recommended, PostGIS has a built-in conversion function for converting geometry into Geohash.

Name

ST_GeoHash — Return a GeoHash representation of the geometry.  

Synopsis

text ST_GeoHash(geometry geom, integer maxchars=full_precision_of_point); 

Description

Return a Geohash representation of the geometry. A Geohash encodes a point into a text form that is sortable and searchable based on prefix. A shorter Geohash is a less precise representation of a point. It can also be thought of as a box, which contains the actual point.

If no maxchars is specified ST_GeoHash returns a Geohash based on full precision of the input geometry type. Points return a Geohash with 20 characters of precision (about enough to hold the full double-precision of the input). Other types return a Geohash with a variable amount of precision, based on the feature's size. Larger features are represented with less precision and smaller features with more precision. The idea is that the box implied by the Geohash will always contain the input feature.

If maxchars is specified, ST_GeoHash returns a Geohash with at most that many characters, so a possibly lower precision representation of the input geometry. For non-points, the starting point of the calculation is the center of the geometry's bounding box.

Availability: 1.4.0

Note

ST_GeoHash does not work with geometries that are not in geographic (lon/lat) coordinates.

This method supports Circular Strings and Curves.

Examples

SELECT ST_GeoHash(ST_SetSRID(ST_MakePoint(-126,48),4326));  
  
     st_geohash  
----------------------  
 c0w3hf1s70w3hf1s70w3  
  
SELECT ST_GeoHash(ST_SetSRID(ST_MakePoint(-126,48),4326),5);  
  
 st_geohash  
------------  
 c0w3h  

References

0 0 0
Share on

digoal

282 posts | 25 followers

You may also like

Comments