By Alex Mungai Muchiri, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
In the first part of this tutorial, we covered the basics of using CrateDB with Python on an Alibaba Cloud Elastic Compute Service instance. In this second part, we will explore Python and CrateDB further, looking at more sophisticated operations including data manipulation, blobs, and the SQLAlchemy dialect.
An Internet of Things (IoT) ecosystem is a rich network of physical devices that incorporate sensors to record data. For the system to function optimally, data from the sensors must be handled quickly. As we saw previously, CrateDB is a suitable solution because of its ability to tackle data complexity and large volumes using SQL.
You can perform SELECT operations on CrateDB using the execute() method, as illustrated below:
>>> cursor.execute("SELECT name FROM locations WHERE name = ?", ("Ann",))
The second argument (a tuple) is interpolated into the query string at the ? placeholder. Always use this form of parameter substitution rather than formatting values into the query string yourself, and fetch the results using any of the methods described below.
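As a side note, the same placeholder substitution works for write operations. The DB API also provides an executemany() method that accepts a sequence of parameter tuples, which is convenient for bulk inserts. A minimal sketch, assuming the remaining columns of the locations table accept null values:
>>> cursor.executemany("INSERT INTO locations (name) VALUES (?)",
...                    [("Ann",), ("Tong",), ("New World",)])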
fetchone()
The fetchone() call, executed after a query, returns the next row of the result set, like so:
>>> result = cursor.fetchone()
>>> pprint(result)
['Ann']
To fetch multiple rows, call the method repeatedly.
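For example, you could loop over fetchone() until it returns None, which signals that the result set is exhausted. A minimal sketch:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> while True:
...     row = cursor.fetchone()
...     if row is None:  # no more rows in the result set
...         break
...     print(row[0])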
fetchmany()
A fetchmany() call takes a numeric argument that specifies the number of result rows CrateDB should return, like so:
>>> cursor.execute("SELECT name FROM locations order by name")
>>> result = cursor.fetchmany(2)
>>> pprint(result)
[['Ann'], ['Tong']]
If the numeric argument is omitted, the method returns a single row, as below:
>>> cursor.fetchmany()
[['Alissa Carl']]
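This makes fetchmany() convenient for processing large result sets in batches without holding all rows in memory at once. A minimal sketch:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> while True:
...     batch = cursor.fetchmany(100)
...     if not batch:  # an empty list means no rows are left
...         break
...     for row in batch:
...         print(row[0])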
fetchall()
A fetchall() call, executed after a query, returns all remaining rows of the result set, as shown below:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> cursor.fetchall()
[['Alissa Carl'],
['Ann'],
['James Ben'],
...
['New World'],
['Tong']]
In all the examples above, results were returned as lists rather than dictionaries, so the column names are not included as keys. The cursor.description attribute lets you access the column names. Under DB API 2.0, each column is described by seven attributes, but only the first one (the name) is supported by the library. For instance, we could have a query as indicated below:
>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> cursor.fetchone()
[1373932800000,
None,
'I Love Alibaba Cloud',
...
None,
1]
We can then get the cursor description as indicated below:
>>> cursor.description
(('date', None, None, None, None, None, None),
('date.time', None, None, None, None, None, None),
('definition', None, None, None, None, None, None),
...
('event_time', None, None, None, None, None, None),
('location', None, None, None, None, None, None))
However, with a list comprehension, we can get something more manageable, as shown:
>>> [column[0] for column in cursor.description]
['date', 'date.time', 'definition', ..., 'event_time', 'location']
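Combining these column names with each row then gives dictionary-style access to the results. A small sketch building on the query above:
>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> columns = [column[0] for column in cursor.description]
>>> rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
>>> rows[0]['definition']
'I Love Alibaba Cloud'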
CrateDB integrates various technologies, including SQL, NoSQL, and containerization. The company behind the database describes its architecture as distributed, shared-nothing, and container-native. The examples above have demonstrated the extensive use of SQL in executing queries, and all writes and queries are automatically distributed across the available nodes. With that in mind, we can now move on to the next phase of the tutorial.
CrateDB also offers powerful blob storage, which is fully supported by the CrateDB Python client library. There are numerous operations you can perform with it.
To get a blob container, you first have to connect to CrateDB. The instructions are readily available in the connection documentation, but we shall look at them briefly while working through this example:
>>> from crate import client
>>> connection = client.connect("http://localhost:4200/")
This establishes a connection to a CrateDB node running on the local host and listening for HTTP on port 4200. Blob tables must exist before you can work with blobs; you then use a blob container to interact with them. Blob containers are created as follows:
>>> blob_container = connection.get_blob_container('UR_blobs')
>>> blob_container
<BlobContainer 'UR_blobs'>
The example uses the connection object to create a BlobContainer for the UR_blobs table. With that in place, we can now try out a few blob operations.
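Note that get_blob_container() assumes the UR_blobs blob table already exists. If it does not, it can be created up front with a CREATE BLOB TABLE statement issued through a regular cursor; a minimal sketch, with shard and replica settings omitted:
>>> cursor = connection.cursor()
>>> cursor.execute("CREATE BLOB TABLE UR_blobs")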
Uploading Blobs
Blob containers are compatible with both files and file-like objects, provided they produce bytes when read. To clarify, any object that supports the read() method is classified as a file-like object. The most common such objects are the stream objects from the io and tempfile modules of the Python standard library. StringIO classes are unsuitable because they produce Unicode strings, but it is possible to encode a Unicode string and feed it to a BytesIO object. See the example below:
>>> import io
>>> bytestream = "This is an example.".encode("utf8")
>>> file = io.BytesIO(bytestream)
After the conversion, use the put method to upload the file to the blob table, as shown:
>>> blob_container.put(file)
'6f10281ad07d4a35c6ec2f993e6376032b77131d'
The output of the method above is a SHA-1 digest, which identifies the blob in CrateDB.
If you already know the SHA-1 digest, you can pass it to put yourself; this improves performance because the client does not have to compute it again:
>>> file.seek(0) # seek to the beginning before attempting to re-upload
>>> digest = "6f10281ad07d4a35c6ec2f993e6376032b77131d"
>>> blob_container.put(file, digest=digest)
False
The call returned False in this case because, when a digest is specified, put returns a Boolean that indicates whether the object was newly written. The write did not take place because a blob with the same digest already exists in the table.
So, we shall create a new object; note that it needs different content, since identical content would produce the same digest:
>>> import hashlib
>>> bytestream = "This is another example.".encode("utf8")
>>> digest = hashlib.sha1(bytestream).hexdigest()
>>> another = io.BytesIO(bytestream)
Now, let us upload our object:
>>> blob_container.put(another, digest)
True
This time, the put method returns True, indicating that the object was successfully written to the blob container.
Retrieving Blobs
Retrieving a blob requires that you know its digest. We shall use the digest variable to verify the existence of the object using the exists method:
>>> blob_container.exists(digest)
True
We expect a Boolean value from the method above, which happens to be True in this case, indicating the presence of the blob in the container. We then proceed to use the get method:
>>> blob_generator = blob_container.get(digest)
Blobs are read in chunks of 128 kB by default, but you can specify the chunk size you wish to obtain, as in the following:
>>> res = blob_container.get(digest, 1024 * 128)
Since the returned blob object is a Python generator, you can request subsequent chunks by calling next(blob) until the StopIteration exception is raised. Alternatively, use the idiomatic way of iterating over a generator instead of making direct calls, as indicated:
>>> blob_content = b''
>>> for chunk in blob_container.get(digest):
...     blob_content += chunk
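In practice, you might stream the chunks straight to a local file instead of accumulating them in memory. A short sketch, using a hypothetical destination path:
>>> with open('/tmp/blob_copy.bin', 'wb') as f:  # hypothetical local path
...     for chunk in blob_container.get(digest):
...         f.write(chunk)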
Deleting Blobs
The delete method allows you to delete blobs, as shown:
>>> blob_container.delete(digest)
True
The method returns a Boolean value; in this case, True indicates that the blob was deleted successfully. You can verify the deletion using the method below:
>>> blob_container.exists(digest)
False
That is all about blobs for this tutorial. Let us now look at the SQLAlchemy dialect in the next phase.
SQLAlchemy is a Python toolkit for Object-Relational Mapping (ORM). The CrateDB Python client supports SQLAlchemy, and the corresponding CrateDB dialect is registered during installation. In this section, we shall look at how to connect to the database, establish a session, and define tables with SQLAlchemy.
SQLAlchemy uses a Uniform Resource Locator, or database URL, to identify a database. Below is its simplest form:
crate://<HOST>
<HOST> in this case represents a host string of the form:
<HOST_ADDR>:<PORT>
Creating an Engine
The create_engine function allows you to connect to CrateDB using a database URL. To begin with, import the sqlalchemy module as sa, as shown:
>>> import sqlalchemy as sa
Next, create an engine, specifying the host of your database, like so:
>>> engine = sa.create_engine('crate://crate-1.vm.url.com:4200')
If you have multiple CrateDB nodes, remember to configure all of them using the connect_args argument, as in the example below:
>>> engine = sa.create_engine('crate://crate-1.vm.url.com:4200', connect_args={
... 'servers': ['198.50.100.10:4200', '198.50.100.12:4200']
... })
By default, the client does not validate SSL certificates; you can enforce validation as shown below:
>>> engine = sa.create_engine(
... 'crate://crate-1.vm.url.com:4200',
... connect_args={
... 'servers': ['198.50.100.10:4200', '198.50.100.12:4200'],
... 'verify_ssl_cert': True,
... 'ca_cert': '<PATH_TO_CA_CERT>',
... }
... )
<PATH_TO_CA_CERT> should be a valid path to a CA certificate on your server.
Creating an SQLAlchemy Session
Use the method below to create a session after you have created your CrateDB engine:
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
We use the declarative system to define an SQLAlchemy table; the column definitions in the class body below are illustrative, so adapt them to your own schema:
>>> from sqlalchemy.ext import declarative
>>> from crate.client.sqlalchemy import types
>>> from uuid import uuid4
>>> def gen_key():
... return str(uuid4())
>>> Base = declarative.declarative_base(bind=engine)
>>> class Character(Base):
...     __tablename__ = 'characters'
...     # Illustrative columns; adjust them to match your own data model.
...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
...     name = sa.Column(sa.String)
...     details = sa.Column(types.Object)
In this example, we have used the gen_key function to issue UUIDs as default primary-key values for the Character class, which maps to the characters table. This has covered the basics for this part 2 of the tutorial.
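As a final illustration, here is a hedged sketch of how the session and the illustrative Character class defined above might be used together; adjust the column names to your own schema:
>>> Base.metadata.create_all(bind=engine)  # creates the characters table if it is missing
>>> arthur = Character(name='Arthur', details={'species': 'Human'})
>>> session.add(arthur)
>>> session.commit()
>>> # CrateDB refreshes tables periodically; run "REFRESH TABLE characters"
>>> # first if the new row must be visible to queries immediately.
>>> results = session.query(Character).filter_by(name='Arthur').all()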
CrateDB aims to provide an IoT database that offers powerful analytics and is simple to apply in IoT projects. It uses various open source technologies such as Lucene, Elasticsearch, and Netty to maintain optimal system performance. The integration of SQL and NoSQL allows the storage of nearly all types of data structures.