By Don Omondi, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
The year 2000 signaled the start of a new millennium whose first ten years were marked by an increased use of technology to connect people. This is commonly referred to as the social media era. We are now in the second decade, which has coined a term of its own: the Internet of Things (IoT) era. It has seen a burst in technology for connecting devices. In both decades, one thing is crystal clear: relationships matter!
For us developers, this means that how we store, query and make sense of these relationships matters too. For this purpose, traditional relational databases frequently fall short, especially at scale. Because relationships are best described using graphs, it is natural that graph databases have emerged as the more capable choice.
A graph database is a database that uses graph structures to store, represent and query data. A graph structure consists of a set of vertices (also called nodes or points), together with a set of pairs of these vertices known as edges. Graph databases are grounded in graph theory.
Within the graph database space, JanusGraph is emerging as one of the favorite tools. JanusGraph was developed specifically to meet the need for a large graph that can be traversed quickly and at scale. It has a proven ability to scale to billions of edges and vertices across a multi-machine cluster and can return queries in sub-second times. JanusGraph is an open source project under The Linux Foundation, which includes participants from Expero, Google, GRAKN.AI, Hortonworks, and IBM.
Convinced? Without further ado let's dive into how to deploy JanusGraph on an Alibaba Cloud Elastic Compute Service (ECS) instance.
First, we'll require an up and running ECS instance. This article assumes one configured with CentOS version 7.4. If you haven't already, sign up on Alibaba Cloud. You can use this link to get $300 worth of free trial products.
As you set up your ECS instance, it is important to note that JanusGraph performs best with some specific configurations. We'll have a brief look at each one and recommend the best settings for your instance.
The first key component is the deployment region. For best performance, select the region closest to your application server. Note the slight difference from the norm, which advises choosing the region closest to your clients. This is because JanusGraph is not usually deployed as a client-facing database but sits behind your application server.
Another thing to consider is the ECS instance type. As with many databases, the more RAM the better. As such, try to settle for a memory optimized instance. The first choice of 2 vCPUs and 16GiB RAM is enough for a decent size deployment.
Another critical component is storage. The default is an Ultra Cloud Disk drive, which is sufficient for most needs. But many databases, especially those that frequently persist data to disk on every transaction, will greatly benefit from faster storage so I advise you change it to SSD Cloud Disk.
Lastly, the system configuration. Alibaba Cloud's security will restrict ingress and outgoing traffic to a few ports. The default port that JanusGraph communicates on, port 8182, is not among them. So add it. You can do it by clicking on your (already deployed) instance and opening the security tab as shown below.
Click on Add Rules towards the bottom right. This will open a view showing the current rule set applied to your instance. Click on Quick Rule Creation and you should see a popup similar to the one below.
Take special note of the rule direction at the top. We need to add 2 rules, one for network ingress and one for outgoing traffic.
With all the above taken into consideration, our new ECS instance is ready to go. Time to connect to it and prepare it for a JanusGraph deployment. If you are on Windows, you can use PuTTY.
When I'm the only tenant on a CentOS server, one of the first things I do is disable SELinux permanently. With it on, you may sometimes run into odd bugs that are a pain to debug. I normally use this command.
sudo sed -i 's/enforcing/disabled/g' /etc/selinux/config
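If you'd rather see what a substitution like this does before letting it loose on the real file, a dry run on a scratch copy is a cheap safety net. The sketch below is only an illustration: the sample file contents are an assumption modelled on a stock CentOS 7 config, and the pattern is a slightly stricter variant anchored to the SELINUX= line so that comment lines are left alone.

```shell
# Scratch file standing in for /etc/selinux/config; the contents are illustrative.
sample_config="$(mktemp)"
cat > "$sample_config" <<'EOF'
# enforcing - SELinux security policy is enforced.
SELINUX=enforcing
SELINUXTYPE=targeted
EOF

# Without -i, sed prints the result instead of editing the file,
# so this is a dry run. The pattern is anchored to the SELINUX= line.
preview="$(sed 's/^SELINUX=enforcing/SELINUX=disabled/' "$sample_config")"
printf '%s\n' "$preview"
```

Once the preview looks right, the same expression can be applied in place with sed -i against the real file.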
The next thing is to install and enable the EPEL repository. This lets us download and install the latest software for our CentOS version. It's good practice to always use the latest versions of software for many reasons, chiefly security, performance and feature enhancements. Let's type these commands to do so.
sudo yum -y install wget
sudo wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -Uvh epel-release-latest-7.noarch.rpm
The last steps we'll take in setting up the server are to install the git, zip and unzip packages and thereafter update all software on our server. We'll use these commands.
sudo yum -y install git zip unzip
sudo yum -y update
From here on in, our server is ready to have JanusGraph deployed. If need be, you can opt to restart the server to ensure that all settings are picked up and that they persist across reboots, like the potentially annoying SELinux. You can use this command.
sudo shutdown -r now
JanusGraph relies on other database technologies to achieve its graph structure and massive scalability. These are typically referred to as storage backends and (optional) index backends. For the storage backend you can choose one of Apache Cassandra (or the compatible ScyllaDB), Apache HBase, Google Cloud Bigtable or Oracle BerkeleyDB. For the index backend, which powers JanusGraph's support for geo, numeric range and full-text search, you can choose one of Elasticsearch, Apache Solr or Apache Lucene.
The default JanusGraph distribution comes out of the box with Cassandra configured as its storage backend and Elasticsearch as its index backend. Therefore, let's keep things simple and use those. Both require us to configure their respective yum repo files.
To install Elasticsearch, let's first import its signing key, then use any editor to create and configure its yum repo file. We need these commands.
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo vi /etc/yum.repos.d/elasticsearch.repo
Within the repo file, we can type or paste these lines.
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
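If you prefer not to open an editor, the same repo definition can be written with a here-document. This is just a sketch: REPO_DIR is a placeholder scratch directory I introduce here so the file can be inspected first, since the real target, /etc/yum.repos.d, needs root.

```shell
# Write the repo definition to a scratch directory first.
# REPO_DIR is a placeholder; the real target would be /etc/yum.repos.d.
REPO_DIR="$(mktemp -d)"

# Quoted 'EOF' keeps the contents literal (no variable expansion).
cat > "$REPO_DIR/elasticsearch.repo" <<'EOF'
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

# Review the file before installing it.
cat "$REPO_DIR/elasticsearch.repo"

# Once it looks right:
#   sudo cp "$REPO_DIR/elasticsearch.repo" /etc/yum.repos.d/
```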
We can then install Elasticsearch and its Java dependency using this yum command.
sudo yum install -y java elasticsearch
The default Elasticsearch configuration is fine for our use case. We can then proceed to configure Cassandra's repo and install it.
sudo vi /etc/yum.repos.d/cassandra.repo
And type or paste these lines.
[cassandra-3.x]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
We can then install Cassandra using this command.
sudo yum install -y cassandra
Cassandra's configuration needs a little bit of tweaking to be compatible with JanusGraph. This is because from version 3 onwards, Cassandra does not enable the RPC server by default, and we need it. So, let's enable it.
Open its default configuration file with our editor.
sudo vi /etc/cassandra/default.conf/cassandra.yaml
Find this line.
start_rpc: false
Change it to
start_rpc: true
Save with changes and exit.
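The same edit can also be scripted, which is handy if you rebuild servers often. The sketch below is an illustration rather than part of the original setup: enable_rpc is a helper name I made up, and the demo runs against a scratch file with sample contents instead of the real cassandra.yaml.

```shell
# enable_rpc FILE: flip "start_rpc: false" to "start_rpc: true" in place,
# keeping the original next to it as FILE.bak.
enable_rpc() {
    sed -i.bak 's/^start_rpc:[[:space:]]*false/start_rpc: true/' "$1"
}

# Demo on a scratch file; the contents are illustrative, not a full cassandra.yaml.
demo_yaml="$(mktemp)"
printf '%s\n' "cluster_name: 'Test Cluster'" 'start_rpc: false' 'rpc_port: 9160' > "$demo_yaml"

enable_rpc "$demo_yaml"
grep '^start_rpc:' "$demo_yaml"
```

On the server itself, the call would be run with root privileges against /etc/cassandra/default.conf/cassandra.yaml.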
JanusGraph dependencies are now installed and configured. Let's tell our system to enable them on startup. We need these commands to do so.
sudo systemctl daemon-reload
sudo chkconfig elasticsearch on
sudo chkconfig cassandra on
We can then start them with these commands.
sudo service cassandra start
sudo service elasticsearch start
And check that both are running with these commands.
sudo service elasticsearch status
sudo service cassandra status
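Both services can take a little while to come up after they are started, so a status check that runs too early may be misleading. One way to handle that is a small retry helper, sketched below. The name wait_until is hypothetical, and the usage line assumes Elasticsearch's default HTTP port 9200 and that curl is installed.

```shell
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_until() {
    tries="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        "$@" >/dev/null 2>&1 && return 0
        [ "$attempt" -ge "$tries" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example usage (not run here):
#   wait_until 30 2 curl -sf http://localhost:9200 && echo "Elasticsearch is up"
```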
If all is green (literally, the terminal should print status OK written in green), then we can proceed with installing JanusGraph.
Installing and configuring JanusGraph takes very few steps. First, download the release zip file from the official GitHub repository. We use version 0.3.0 here.
wget https://github.com/JanusGraph/janusgraph/releases/download/v0.3.0/janusgraph-0.3.0-hadoop2.zip
Next, unzip the archive and cd into it.
unzip janusgraph-0.3.0-hadoop2.zip
cd janusgraph-0.3.0-hadoop2
Open a second terminal and cd into the janusgraph directory as shown above. We need one terminal to run the JanusGraph server and another to run the Gremlin console, which we'll use to configure the running server. Run the first command below in the first terminal and the second command in the second terminal.
bin/gremlin-server.sh
bin/gremlin.sh
Give the first command a few seconds until you see the line 'running on port 8182' or something to this effect. Then, in the second terminal, we shall enter commands to lock down our JanusGraph instance with username and password authentication. We need to input these lines.
:plugin use tinkerpop.credentials
graph = TinkerGraph.open()
graph.createIndex("username",Vertex.class)
credentials = credentials(graph)
credentials.createUser("myusername",'mYpa$$word!')
credentials.findUser("myusername").properties()
credentials.countUsers()
graph.io(IoCore.gryo()).writeGraph("data/credentials.kryo")
:quit
These commands basically create an in-memory graph using TinkerGraph, create credentials in it and write the data to a file called credentials.kryo in the data folder, which itself sits within the JanusGraph folder. Still with me? Great.
Now we need to instruct JanusGraph to use the credentials in the above file. We'll do so by creating a credentials properties file. Using any editor (we use vi in this article), create a new file called tinkergraph-credentials.properties in the conf directory.
vi conf/tinkergraph-credentials.properties
Within this file, let's type (or paste) these properties.
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.vertexIdManager=LONG
gremlin.tinkergraph.graphLocation=data/credentials.kryo
gremlin.tinkergraph.graphFormat=gryo
Save with changes and exit the editor. Now all that is left is to instruct JanusGraph to use the properties file we created. JanusGraph comes out of the box with a ready server configuration file called gremlin-server.yaml, found in the conf directory under the folder called gremlin-server. So instead of retyping everything ourselves, let's just copy this file and tweak its settings to meet our needs.
cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/socket-gremlin-server.yaml
vi conf/gremlin-server/socket-gremlin-server.yaml
We just need to add the following lines at the very end.
authentication: {
  className: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
  config: {
    credentialsDb: conf/tinkergraph-credentials.properties}}
ssl: {
  enabled: true}
And we are done! Before saving the file, though, we can optionally change three important settings at the top of the file. The first is the host parameter, which you can set to your static IP address, or to 0.0.0.0 to accept connections on all network interfaces. The second is the port parameter. The default is 8182, and if you change it, please remember to adjust your ECS instance security rules accordingly. The last is the scriptEvaluationTimeout parameter, which determines how long your server will attempt to evaluate your Gremlin commands before timing out. I like to put a 0 here and control my scripts from my application server.
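Top-level settings like these can also be changed without opening an editor. The helper below is a rough sketch, not something JanusGraph ships: set_yaml_key is a hypothetical name, it only handles simple single-line "key: value" entries at the start of a line, and the demo values stand in for whatever your copy of the file actually contains.

```shell
# set_yaml_key FILE KEY VALUE: replace the value of a top-level "KEY: ..." line,
# keeping the previous version of the file as FILE.bak.
set_yaml_key() {
    sed -i.bak "s|^$2:.*|$2: $3|" "$1"
}

# Demo on a scratch file; the starting values are illustrative.
demo_conf="$(mktemp)"
printf '%s\n' 'host: localhost' 'port: 8182' 'scriptEvaluationTimeout: 30000' > "$demo_conf"

set_yaml_key "$demo_conf" host 0.0.0.0
set_yaml_key "$demo_conf" scriptEvaluationTimeout 0
cat "$demo_conf"
```

Against the real file, the first argument would be conf/gremlin-server/socket-gremlin-server.yaml.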
Save our new socket-gremlin-server.yaml file with changes. And believe it or not, that's it. We can now launch a new gremlin server instance adding a path telling it which configuration file to use. The exact command is as follows.
bin/gremlin-server.sh ./conf/gremlin-server/socket-gremlin-server.yaml
However, we should create a systemd configuration to do this automatically, since it gives us better features. For example, it allows us to wait for the JanusGraph dependencies, Cassandra and Elasticsearch, to start first.
Let's create a file called janusgraph.service in the systemd system folder.
sudo vi /etc/systemd/system/janusgraph.service
In it, paste this content. The user is the one who we are signed in as. You can find this out by typing the 'whoami' command. ECS allows us to be root, so that's what we'll use here.
[Unit]
Description = JanusGraph Server
After = cassandra.service elasticsearch.service
[Service]
User=root
ExecStart = /root/janusgraph-0.3.0-hadoop2/bin/gremlin-server.sh /root/janusgraph-0.3.0-hadoop2/conf/gremlin-server/socket-gremlin-server.yaml
TimeoutStartSec=60
[Install]
WantedBy = multi-user.target
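Since the install path appears twice in ExecStart, it is easy to get one of them wrong. One option is to generate the unit file from a single variable, as in the sketch below. JANUSGRAPH_HOME and its default value are assumptions (adjust it to wherever you unzipped the archive), and the file is written to a scratch location first so it can be reviewed.

```shell
# Assumed install location; adjust to wherever the archive was unzipped.
JANUSGRAPH_HOME="${JANUSGRAPH_HOME:-/root/janusgraph-0.3.0-hadoop2}"
UNIT_FILE="$(mktemp)"

# Unquoted EOF so that $JANUSGRAPH_HOME is expanded into the unit file.
cat > "$UNIT_FILE" <<EOF
[Unit]
Description=JanusGraph Server
After=cassandra.service elasticsearch.service

[Service]
User=root
ExecStart=$JANUSGRAPH_HOME/bin/gremlin-server.sh $JANUSGRAPH_HOME/conf/gremlin-server/socket-gremlin-server.yaml
TimeoutStartSec=60

[Install]
WantedBy=multi-user.target
EOF

# Review the generated unit before installing it.
cat "$UNIT_FILE"

# Once it looks right:
#   sudo cp "$UNIT_FILE" /etc/systemd/system/janusgraph.service
```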
Save with changes and exit. Our service is called janusgraph. Let's reload the system daemon, configure janusgraph to start on system restart, start it and see its status.
sudo systemctl daemon-reload
sudo chkconfig janusgraph on
sudo service janusgraph start
sudo service janusgraph status
If all is green, as it should be, then we have successfully created a memory optimized ECS instance and deployed a popular graph database with all the intricacies in between.
If you need to get started quickly, you can use this gremlin-ogm PHP library that I wrote. It comes preconfigured with a Twitter API app that can map all your Twitter followers, following, tweets and retweets onto a JanusGraph graph database. Beyond that, you can also take a look at the proposed graphql to graphql standard that I came up with. Enjoy!