By Alex Mungai Muchiri, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
In the first part of this tutorial, we covered the basics of using CrateDB with Python on an Alibaba Cloud Elastic Compute Service instance. In this second part, we will explore Python and CrateDB further, looking at more sophisticated operations including data manipulation, blobs, and the SQLAlchemy dialect.
An Internet of Things (IoT) ecosystem is a rich network of physical devices that incorporate sensors to record data. For the system to function optimally, data from the sensors must be handled quickly. As we saw previously, CrateDB is a suitable solution because of its ability to tackle data complexity and large volumes using SQL.
You can perform SELECT operations on CrateDB using the execute() method, as illustrated below:
>>> cursor.execute("SELECT name FROM locations WHERE name = ?", ("Ann",))
The second argument (a tuple) is interpolated into the query string at the ? placeholder. Always use this form of parameter substitution rather than formatting values into the query string yourself, and fetch the results using any of the methods described below.
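As a side note, the same placeholder substitution works for write operations. The DB API also provides an executemany() method that accepts a sequence of parameter tuples, which is convenient for bulk inserts. A minimal sketch, assuming the remaining columns of the locations table accept null values:
>>> cursor.executemany("INSERT INTO locations (name) VALUES (?)",
...                    [("Ann",), ("Tong",), ("New World",)])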
fetchone()
The fetchone() call, executed after a query, returns the next row of the result set, like so:
>>> result = cursor.fetchone()
>>> pprint(result)
['Ann']
To fetch multiple rows, call the method repeatedly.
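For example, you could loop over fetchone() until it returns None, which signals that the result set is exhausted. A minimal sketch:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> while True:
...     row = cursor.fetchone()
...     if row is None:  # no more rows in the result set
...         break
...     print(row[0])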
fetchmany()
A fetchmany() call takes a numeric argument that specifies the number of result rows CrateDB should return, like so:
>>> cursor.execute("SELECT name FROM locations order by name")
>>> result = cursor.fetchmany(2)
>>> pprint(result)
[['Ann'], ['Tong']]
If the numeric argument is omitted, the method returns a single row, as below:
>>> cursor.fetchmany()
[['Alissa Carl']]
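This makes fetchmany() convenient for processing large result sets in batches without holding all rows in memory at once. A minimal sketch:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> while True:
...     batch = cursor.fetchmany(100)
...     if not batch:  # an empty list means no rows are left
...         break
...     for row in batch:
...         print(row[0])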
fetchall()
A fetchall() call, executed after a query, returns all remaining rows of the result set, as shown below:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> cursor.fetchall()
[['Alissa Carl'],
['Ann'],
['James Ben'],
...
['New World'],
['Tong']]
In all the examples above, results were returned as lists rather than dictionaries, so the column names are not included as keys. The cursor.description attribute lets you access the column names. Under DB API 2.0, each column is described by seven attributes, but only the first one (the name) is supported by the library. For instance, we could have a query as indicated below:
>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> cursor.fetchone()
[1373932800000,
None,
'I Love Alibaba Cloud',
...
None,
1]
We can then get the cursor description as indicated below:
>>> cursor.description
(('date', None, None, None, None, None, None),
('date.time', None, None, None, None, None, None),
('definition', None, None, None, None, None, None),
...
('event_time', None, None, None, None, None, None),
('location', None, None, None, None, None, None))
However, with a list comprehension, we can get something more manageable, as shown:
>>> [column[0] for column in cursor.description]
['date', 'date.time', 'definition', ..., 'event_time', 'location']
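Combining these column names with each row then gives dictionary-style access to the results. A small sketch building on the query above:
>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> columns = [column[0] for column in cursor.description]
>>> rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
>>> rows[0]['definition']
'I Love Alibaba Cloud'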
CrateDB integrates various technologies, including SQL, NoSQL, and containerization. The company behind the database describes its architecture as distributed, shared-nothing, and container-native. The examples above have demonstrated the extensive use of SQL in executing queries, and all writes and queries are automatically distributed across the available nodes. With that in mind, we can now move on to the next phase of the tutorial.
CrateDB also offers powerful blob storage, which is fully supported by the CrateDB Python client library. There are numerous operations you can perform with it.
To get a blob container, you first have to connect to CrateDB. The instructions are readily available in the connection documentation, but we shall look at them briefly while working through this example:
>>> from crate import client
>>> connection = client.connect("http://localhost:4200/")
This establishes a connection to a CrateDB node running on the local host and listening for HTTP on port 4200. Blob tables must exist before you can work with blobs; you then use a blob container to interact with them. Blob containers are created as follows:
>>> blob_container = connection.get_blob_container('UR_blobs')
>>> blob_container
<BlobContainer 'UR_blobs'>
The example uses the connection object to create a BlobContainer for the UR_blobs table. With that in place, we can now try out a few blob operations.
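Note that get_blob_container() assumes the UR_blobs blob table already exists. If it does not, it can be created up front with a CREATE BLOB TABLE statement issued through a regular cursor; a minimal sketch, with shard and replica settings omitted:
>>> cursor = connection.cursor()
>>> cursor.execute("CREATE BLOB TABLE UR_blobs")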
Uploading Blobs
Blob containers are compatible with both files and file-like objects, provided they produce bytes when read. To clarify, any object that supports the read() method is classified as a file-like object. The most common such objects are the stream objects from the io and tempfile modules of the Python standard library. StringIO classes are unsuitable because they produce Unicode strings, but it is possible to encode a Unicode string and feed it to a BytesIO object. See the example below:
>>> import io
>>> bytestream = "This is an example.".encode("utf8")
>>> file = io.BytesIO(bytestream)
After the conversion, use the put method to upload the file to the blob table, as shown:
>>> blob_container.put(file)
'6f10281ad07d4a35c6ec2f993e6376032b77131d'
The output of the method above is a SHA-1 digest, which identifies the blob in CrateDB.
If you already know the SHA-1 digest, you can pass it to put yourself; this improves performance because the client does not have to compute it again:
>>> file.seek(0) # seek to the beginning before attempting to re-upload
>>> digest = "6f10281ad07d4a35c6ec2f993e6376032b77131d"
>>> blob_container.put(file, digest=digest)
False
The call returned False in this case because, when a digest is specified, put returns a Boolean that indicates whether the object was newly written. The write did not take place because a blob with the same digest already exists in the table.
So, we shall create a new object; note that it needs different content, since identical content would produce the same digest:
>>> import hashlib
>>> bytestream = "This is another example.".encode("utf8")
>>> digest = hashlib.sha1(bytestream).hexdigest()
>>> another = io.BytesIO(bytestream)
Now, let us upload our object:
>>> blob_container.put(another, digest)
True
This time, the put method returns True, indicating that the object was successfully written to the blob container.
Retrieving Blobs
Retrieving a blob requires that you know its digest. We shall use the digest variable to verify the existence of the object using the exists method:
>>> blob_container.exists(digest)
True
We expect a Boolean value from the method above, which happens to be True in this case, indicating the presence of the blob in the container. We then proceed to use the get method:
>>> blob_generator = blob_container.get(digest)
Blobs are read in chunks of 128 kB by default, but you can specify the chunk size you wish to obtain, as in the following:
>>> res = blob_container.get(digest, 1024 * 128)
Since the returned blob object is a Python generator, you can request subsequent chunks by calling next(blob) until the StopIteration exception is raised. Alternatively, use the idiomatic way of iterating over a generator instead of making direct calls, as indicated:
>>> blob_content = b''
>>> for chunk in blob_container.get(digest):
...     blob_content += chunk
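In practice, you might stream the chunks straight to a local file instead of accumulating them in memory. A short sketch, using a hypothetical destination path:
>>> with open('/tmp/blob_copy.bin', 'wb') as f:  # hypothetical local path
...     for chunk in blob_container.get(digest):
...         f.write(chunk)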
Deleting Blobs
The delete method allows you to delete blobs, as shown:
>>> blob_container.delete(digest)
True
The method returns a Boolean value; in this case, True indicates that the blob was deleted successfully. You can verify the deletion using the method below:
>>> blob_container.exists(digest)
False
That is all about blobs for this tutorial. Let us now look at the SQLAlchemy dialect in the next phase.
SQLAlchemy is a Python toolkit for Object-Relational Mapping (ORM). The CrateDB Python client supports SQLAlchemy, and the corresponding CrateDB dialect is registered during installation. In this section, we shall look at how to connect to the database, establish a session, and define tables with SQLAlchemy.
SQLAlchemy uses a Uniform Resource Locator, or database URL, to identify a database. Below is its simplest form:
crate://<HOST>
<HOST> in this case represents a host string of the form:
<HOST_ADDR>:<PORT>
Creating an Engine
The create_engine function allows you to connect to CrateDB using a database URL. To begin with, import the sqlalchemy module as sa, as shown:
>>> import sqlalchemy as sa
Next, create an engine, specifying the host of your database, like so:
>>> engine = sa.create_engine('crate://crate-1.vm.url.com:4200')
If you have multiple CrateDB nodes, remember to configure all of them using the connect_args argument, as in the example below:
>>> engine = sa.create_engine('crate://crate-1.vm.url.com:4200', connect_args={
... 'servers': ['198.50.100.10:4200', '198.50.100.12:4200']
... })
By default, the client does not validate SSL certificates; you can enforce validation as shown below:
>>> engine = sa.create_engine(
... 'crate://crate-1.vm.url.com:4200',
... connect_args={
... 'servers': ['198.50.100.10:4200', '198.50.100.12:4200'],
... 'verify_ssl_cert': True,
... 'ca_cert': '<PATH_TO_CA_CERT>',
... }
... )
<PATH_TO_CA_CERT> should be a valid path to a CA certificate on your server.
Creating an SQLAlchemy Session
Use the method below to create a session after you have created your CrateDB engine:
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
We use the declarative system to define an SQLAlchemy table; the column definitions in the class body below are illustrative, so adapt them to your own schema:
>>> from sqlalchemy.ext import declarative
>>> from crate.client.sqlalchemy import types
>>> from uuid import uuid4
>>> def gen_key():
... return str(uuid4())
>>> Base = declarative.declarative_base(bind=engine)
>>> class Character(Base):
...     __tablename__ = 'characters'
...     # Illustrative columns; adjust them to match your own data model.
...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
...     name = sa.Column(sa.String)
...     details = sa.Column(types.Object)
In this example, we have used the gen_key function to issue UUIDs as default primary-key values for the Character class, which maps to the characters table. This has covered the basics for this part 2 of the tutorial.
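As a final illustration, here is a hedged sketch of how the session and the illustrative Character class defined above might be used together; adjust the column names to your own schema:
>>> Base.metadata.create_all(bind=engine)  # creates the characters table if it is missing
>>> arthur = Character(name='Arthur', details={'species': 'Human'})
>>> session.add(arthur)
>>> session.commit()
>>> # CrateDB refreshes tables periodically; run "REFRESH TABLE characters"
>>> # first if the new row must be visible to queries immediately.
>>> results = session.query(Character).filter_by(name='Arthur').all()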
CrateDB aims to provide an IoT database that offers powerful analytics and is simple to apply in IoT projects. It uses various open source technologies such as Lucene, Elasticsearch, and Netty to maintain optimal system performance. The integration of SQL and NoSQL allows the storage of nearly all types of data structures.