A Deep Dive into the Core Concepts of ApsaraDB for MongoDB

Related Concepts of MongoDB

In this blog, we will discuss in detail the features of ApsaraDB for MongoDB (hereinafter referred to as MongoDB).

In terms of positioning, MongoDB is between Memcached and the relational database management system (RDBMS). In terms of scalability and performance, MongoDB is closer to Memcached. In terms of functionality, MongoDB is similar to RDBMS.

MongoDB Deployment Model

In the production environment, MongoDB is often deployed as a three-node replica set or a sharded cluster.

The left side of the figure above shows that when MongoDB is deployed as a replica set, the application sends requests through the driver directly to the primary (master) node of the replica set to complete read-write operations.

The other two secondary (slave) nodes automatically synchronize with the primary node to keep their data up to date.

If the primary node fails during cluster operation, the two secondary nodes elect a new primary node within seconds to continue supporting application read-write operations.
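
To make the election rule concrete: a replica set can elect a new primary (master) node only while a majority of its voting members can still reach each other. The sketch below is plain JavaScript arithmetic, not MongoDB code, and the function names majority and faultTolerance are made up for illustration.

```javascript
// Illustrative arithmetic only; these helper names are hypothetical, not a MongoDB API.

// Votes needed to elect a primary in a set with `members` voting nodes.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// How many members can be unreachable while an election can still succeed.
function faultTolerance(members) {
  return members - majority(members);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} nodes: majority=${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

A fourth node raises the majority to three without tolerating any additional failure, which is one reason an odd number of members is usually preferred.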

The right side of the figure shows that when MongoDB is deployed as a sharded cluster, applications access the routing nodes (mongos) through the driver. Based on the shard key values in the read-write operations, a mongos node distributes the operations to specific shards for execution. It then merges the execution results and returns them to the application.

How is the data in the cluster distributed? The metadata is recorded in the configuration server, which is also a highly available replica set. Each shard manages a portion of the overall data in the cluster and is also a high-availability replica set. In addition, multiple routing nodes are deployed in the production environment. By doing so, the entire sharded cluster has no single point of failure.
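
The routing idea can be sketched in a few lines of plain JavaScript. This is a toy model, not how mongos is implemented: the chunk table, the shard names, and the routeByShardKey and shardsForRange helpers are all invented for illustration, and the shard key is assumed to be a single numeric field split into ranges.

```javascript
// Toy model of shard-key routing. All names and ranges here are hypothetical.
// Each chunk covers the half-open key range [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard1" },
  { min: 100, max: 200, shard: "shard2" },
  { min: 200, max: Infinity, shard: "shard3" },
];

// Return the shard that owns the given shard key value.
function routeByShardKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Return every shard that a range query [from, to] has to visit.
function shardsForRange(from, to) {
  const hit = chunks.filter((c) => c.min <= to && c.max > from);
  return [...new Set(hit.map((c) => c.shard))];
}

console.log(routeByShardKey(42));      // shard1
console.log(routeByShardKey(100));     // shard2
console.log(shardsForRange(150, 250)); // [ 'shard2', 'shard3' ]
```

An operation that carries the shard key touches one shard, while a range that spans several chunks has to be sent to each owning shard and merged afterwards.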

MongoDB Basic Concepts and Its Mappings with Relational Database Management System

As shown in the figure above, the database and table in an RDBMS correspond to the database and collection in MongoDB. Parent-child tables in an RDBMS correspond to nested sub-documents or arrays in MongoDB. The index is common to both. A record in an RDBMS is called a row, while in MongoDB it is called a document, and a column in the former is called a field in the latter. A join in an RDBMS is usually replaced by embedding in MongoDB. If linking is used instead, $lookup can be applied to support a left join. Moreover, the view in an RDBMS corresponds to the read-only view and on-demand materialized view in MongoDB, and the multi-record ACID transaction maps to the multi-document ACID transaction in MongoDB.
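
As a rough illustration of embedding, the plain JavaScript below folds rows from two related tables into one nested document. The orders and orderItems data and the embedItems helper are invented for this sketch.

```javascript
// Hypothetical relational rows: a parent table and a child table linked by orderId.
const orders = [{ orderId: 1, customer: "Thomas" }];
const orderItems = [
  { orderId: 1, item: "card", qty: 15 },
  { orderId: 1, item: "small box", qty: 55 },
];

// Fold the child rows into an array inside each parent, the way a
// MongoDB document would embed them instead of joining at read time.
function embedItems(parents, children) {
  return parents.map((p) => ({
    ...p,
    items: children
      .filter((c) => c.orderId === p.orderId)
      .map(({ item, qty }) => ({ item, qty })),
  }));
}

console.log(JSON.stringify(embedItems(orders, orderItems), null, 2));
```

Reading the order and its items then takes a single document lookup rather than a join across two tables.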

Data Hierarchy of MongoDB

MongoDB data is organized in three layers: documents, collections, and databases. Multiple documents are stored in one collection, and multiple collections are stored in one database. A cluster may contain multiple databases.

Example:

Database: Products
Collections: Books, Movies, Music

The combination of databases and collections forms the MongoDB namespace:

Products.Books
Products.Movies
Products.Music

The database name cannot exceed 64 bytes in length, and the namespace cannot exceed 120 bytes.
When the feature compatibility version (FCV) is 4.4 or higher, the namespace length limit is raised to 255 bytes.
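
A small check of these limits might look like the following. It is plain Node.js, the checkNamespace helper is hypothetical, and the limits are simply the figures quoted above.

```javascript
// Hypothetical helper; the limits are the ones stated in the text above.
const MAX_DB_NAME_BYTES = 64;

function checkNamespace(database, collection, fcvAtLeast44 = true) {
  const namespace = `${database}.${collection}`;
  const limit = fcvAtLeast44 ? 255 : 120;
  // The limits are expressed in bytes, so measure the UTF-8 encoding, not the character count.
  const bytes = Buffer.byteLength(namespace, "utf8");
  const ok = Buffer.byteLength(database, "utf8") <= MAX_DB_NAME_BYTES && bytes <= limit;
  return { namespace, bytes, ok };
}

console.log(checkNamespace("Products", "Books"));
console.log(checkNamespace("Products", "B".repeat(150), false).ok);
console.log(checkNamespace("Products", "B".repeat(150), true).ok);
```

The same 159-byte namespace fails the older 120-byte limit but passes once the 255-byte limit applies.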

Data Structure of MongoDB

MongoDB uses the JSON document structure:

JSON stands for JavaScript Object Notation.
JSON supports the following data types:
string: Such as "Thomas"
number: Such as 29 or 3.7
boolean: true or false
null value: null
array: Such as [88.5, 91.3, 67.1]
object: Such as the following

{
  "firstName": "Thomas",
  "lastName": "Smith",
  "age": 29
}
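
To see how these types surface in code, the sketch below parses a JSON document in plain JavaScript and reports the type of each field. The extra fields (scores, active, nickname, address) and the jsonType helper are made up for the example.

```javascript
// Sample document; fields beyond firstName/lastName/age are invented for illustration.
const text = `{
  "firstName": "Thomas",
  "lastName": "Smith",
  "age": 29,
  "scores": [88.5, 91.3, 67.1],
  "active": true,
  "nickname": null,
  "address": { "city": "Hangzhou" }
}`;

// Name the JSON type of a parsed value (typeof alone cannot tell null, array, and object apart).
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", or "object"
}

const doc = JSON.parse(text);
for (const [field, value] of Object.entries(doc)) {
  console.log(`${field}: ${jsonType(value)}`);
}
```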

Data Storage in BSON Format

MongoDB data types

The preceding figure shows a list of MongoDB data types, and almost all of the common types are supported by MongoDB.

Cluster Deployment

Install the First MongoDB System

First command: Download

curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-4.4.2.tgz

Second command: Extract

tar xzvf mongodb-linux-x86_64-rhel70-4.4.2.tgz

Third command: Change the directory name

mv mongodb-linux-x86_64-rhel70-4.4.2 mongodb

Fourth command: Nothing! No further installation step is required.

Run the MongoDB

./bin/mongod --dbpath /data/db

[Code comment]

[./bin/mongod]: The mongod executable in the bin directory of the MongoDB installation
[/data/db]: Location of the MongoDB data files

Access the MongoDB

$ ./bin/mongo
// The mongo shell in the bin directory of the MongoDB installation
MongoDB shell version: 4.4.2
...
Server has startup warnings:
2020-12-15T04:23:25.268+0000 I CONTROL [initandlisten]
2020-12-15T04:23:25.268+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
...

Create replica sets

1. Create the data directories:

mkdir rs1 rs2 rs3

2. Start three MongoDB services

mongod --replSet rs --dbpath ./rs1 --port 27017 --fork --logpath ./rs1/mongod.log
mongod --replSet rs --dbpath ./rs2 --port 27018 --fork --logpath ./rs2/mongod.log
mongod --replSet rs --dbpath ./rs3 --port 27019 --fork --logpath ./rs3/mongod.log

3. Connect to the MongoDB service:

mongo //connect to the default port 27017

4. Specify replica set configuration

rs.initiate() // Initialize the replica set
rs.add('<HOSTNAME>:27018') // Add a member
rs.add('<HOSTNAME>:27019') // Add a member
rs.status() // Check the replica set status
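
As an alternative sketch, the three members can be described in one configuration document and passed to rs.initiate() in a single call. The plain JavaScript below only builds that document; the buildReplicaSetConfig helper is hypothetical, and localhost is an assumed host name standing in for <HOSTNAME>.

```javascript
// Hypothetical helper that builds a replica set configuration document.
// "localhost" is an assumption; substitute the real host name.
function buildReplicaSetConfig(setName, host, ports) {
  return {
    _id: setName, // must match the --replSet value used when starting mongod
    members: ports.map((port, i) => ({ _id: i, host: `${host}:${port}` })),
  };
}

const config = buildReplicaSetConfig("rs", "localhost", [27017, 27018, 27019]);
console.log(JSON.stringify(config, null, 2));
// In the mongo shell, the same object could then be passed as: rs.initiate(config)
```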

Create sharded cluster instances

There are five steps:

1. Create the configuration server
2. Create one or more shards; each shard is a replica set
3. Start one or more mongos
4. Access mongos and add the shards to the cluster
5. Select the shard key and enable sharding

The entire sharded cluster has now been deployed.
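
The five steps can be sketched as a small script that prints one possible command sequence. Everything concrete in it is an assumption made for illustration: the replica set names cfg and shard1, the directories, the ports, the Products.Books namespace, and the hashed _id shard key. <HOSTNAME> is a placeholder, as in the replica set example, and a production deployment would use several members per replica set.

```javascript
// One possible command sequence per step. All names, ports, and paths are assumptions.
// "shell" commands run in a terminal; "mongo shell" commands run inside a mongo session.
const steps = [
  { step: "Create the configuration server", where: "shell", commands: [
    "mkdir cfg shard1",
    "mongod --configsvr --replSet cfg --dbpath ./cfg --port 27019 --fork --logpath ./cfg/mongod.log",
    "mongo --port 27019 --eval 'rs.initiate()'",
  ] },
  { step: "Create a shard (a replica set)", where: "shell", commands: [
    "mongod --shardsvr --replSet shard1 --dbpath ./shard1 --port 27018 --fork --logpath ./shard1/mongod.log",
    "mongo --port 27018 --eval 'rs.initiate()'",
  ] },
  { step: "Start mongos", where: "shell", commands: [
    "mongos --configdb cfg/<HOSTNAME>:27019 --port 27017 --fork --logpath ./mongos.log",
  ] },
  { step: "Access mongos and add the shard", where: "mongo shell", commands: [
    'sh.addShard("shard1/<HOSTNAME>:27018")',
  ] },
  { step: "Select the shard key and enable sharding", where: "mongo shell", commands: [
    'sh.enableSharding("Products")',
    'sh.shardCollection("Products.Books", { _id: "hashed" })',
  ] },
];

for (const [i, { step, where, commands }] of steps.entries()) {
  console.log(`${i + 1}. ${step} (${where})`);
  commands.forEach((command) => console.log(`   ${command}`));
}
```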

Production environment deployment suggestions

In the production environment, some deployment best practices should be followed. For example:

Capacity planning: Computing resources, storage capacity, IOPS, oplog, and network bandwidth
High availability: Deploy replica sets or sharded clusters
Node number: Deploy an odd number of nodes in a replica set to avoid split brain
Apply the best practices for the production environment, such as:

Use host names instead of IP addresses
File system: XFS is recommended for Linux
Disable NUMA
Disable THP
Raise resource limits
Tune swappiness
Tune readahead
Tune tcp_keepalive_time
Synchronize clocks
Apply security settings

Basic Operations

Insert New Document

insertOne
db.products.insertOne( { item: "card", qty: 15 } );
insertMany
db.products.insertMany( [ { _id: 10, item: "large box", qty: 20 }, { _id: 11, item: "small box", qty: 55 }, { _id: 12, item: "medium box", qty: 30 } ] );
insert
db.collection.insert( <document or array of documents>, { writeConcern: <document>, ordered: <boolean> } )

Delete the Document

deleteOne 
db.orders.deleteOne( { "_id" : ObjectId("563237a41a4d68582c2509da") } ); 
db.orders.deleteOne( { "expirationTime" : { $lt: ISODate("2015-11-01T12:40:15Z") } } ); 
deleteMany 
db.orders.deleteMany( { "client" : "Crude Traders Inc." } ); 
remove 
db.collection.remove( <query>, <justOne> )

Delete collections through drop

Use db.<collection>.drop() to delete a collection.
All documents in the collection are deleted.
The indexes on the collection are also deleted.

db.colToBeDropped.drop()

Delete databases through the dropDatabase command

To delete a database, run the db.dropDatabase() command.
The corresponding data files of the database are also deleted, and the disk space is released.

use tempDB 
db.dropDatabase() 
show collections // No collections 
show dbs // The db is gone

Query Data Documents by Find Command

find is the basic query command of MongoDB.

find returns a cursor over the matching documents.

db.movies.find( { "year" : 1975 } ) // Single-condition query 
db.movies.find( { "year" : 1989, "title" : "Batman" } ) // Multi-condition and query
db.movies.find( { $or: [{"year" : 1989}, {"title" : "Batman"}] } ) // Multi-condition or query 
db.movies.find( { $and : [ {"title" : "Batman"}, { "category" : "action" }] } ) // and query 
db.movies.find( { "title" : /^B/} ) // Search by regular expression

SQL query conditions comparison

a = 1 -> {a: 1} 
a <> 1 -> {a: {$ne: 1}} 
a > 1 -> {a: {$gt: 1}} 
a >= 1 -> {a: {$gte: 1}} 
a < 1 -> {a: {$lt: 1}} 
a <= 1 -> {a: {$lte: 1}} 
a = 1 AND b = 1 -> {a: 1, b: 1} or {$and: [{a: 1}, {b: 1}]} 
a = 1 OR b = 1 -> {$or: [{a: 1}, {b: 1}]} 
a IS NULL -> {a: {$exists: false}} 
a IN (1, 2, 3) -> {a: {$in: [1, 2, 3]}}

Query operators

$lt: Exists and is less than
$lte: Exists and is less than or equal to
$gt: Exists and is greater than
$gte: Exists and is greater than or equal to
$ne: Does not exist or exists but is not equal to
$in: Exists and in the specified array 
$nin: Does not exist or is not in the specified array 
$or: Matches one of two or more conditions 
$and: Matches all conditions
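
The "exists" wording in this list matters: a comparison such as $lt only matches documents that contain the field, while $ne and $nin also match documents where the field is missing. The toy matcher below mimics that behavior in plain JavaScript for a few operators on top-level fields. It is a simplified illustration with invented helpers (matches, titles) and an invented "Untitled" document, not MongoDB's query engine.

```javascript
// Toy single-field matcher; `matches` is a hypothetical helper, not a MongoDB API.
const operators = {
  $lt: (v, arg) => v !== undefined && v < arg,
  $lte: (v, arg) => v !== undefined && v <= arg,
  $gt: (v, arg) => v !== undefined && v > arg,
  $gte: (v, arg) => v !== undefined && v >= arg,
  $ne: (v, arg) => v === undefined || v !== arg,
  $in: (v, arg) => v !== undefined && arg.includes(v),
  $nin: (v, arg) => v === undefined || !arg.includes(v),
};

// A condition looks like { field: { $op: argument } }.
function matches(doc, condition) {
  return Object.entries(condition).every(([field, ops]) =>
    Object.entries(ops).every(([op, arg]) => operators[op](doc[field], arg))
  );
}

const movies = [
  { title: "Batman", year: 1989 },
  { title: "Jaws", year: 1975 },
  { title: "Untitled" }, // no year field
];

const titles = (cond) => movies.filter((m) => matches(m, cond)).map((m) => m.title);
console.log(titles({ year: { $lt: 1980 } }));    // [ 'Jaws' ]
console.log(titles({ year: { $ne: 1975 } }));    // [ 'Batman', 'Untitled' ]
console.log(titles({ year: { $nin: [1989] } })); // [ 'Jaws', 'Untitled' ]
```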

Update Operation

Parameters required for the update operation

The update operation takes two parameters:

query: Selects the documents to update
update: Specifies the changes to apply

// insert data
db.movies.insert( [
  {
    "title" : "Batman",
    "category" : [ "action", "adventure" ],
    "imdb_rating" : 7.6,
    "budget" : 35
  },
  {
    "title" : "Godzilla",
    "category" : [ "action", "adventure", "sci-fi" ],
    "imdb_rating" : 6.6
  },
  {
    "title" : "Home Alone",
    "category" : [ "family", "comedy" ],
    "imdb_rating" : 7.4
  }
] )

db.movies.update( { "title" : "Batman" }, { $set : { "imdb_rating" : 7.7 } } )
// { "title" : "Batman" }: Query that selects the Batman document
// { $set : { "imdb_rating" : 7.7 } }: Update that sets the imdb_rating field

Update Arrays

$push: Adds an object to the end of the array.
$pushAll: Adds multiple objects to the end of the array (removed in MongoDB 3.6; use $push with $each instead).
$pop: Removes the last object of the array (or the first, when given -1).
$pull: Removes the objects that match the specified value or condition from the array.
$pullAll: Removes all objects that match any of the listed values from the array.
$addToSet: Adds a value to the array if it does not already exist.
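
The effect of these operators can be modeled on an in-memory document. The sketch below is plain JavaScript with an invented applyUpdate helper. It is not MongoDB's implementation, it handles only top-level fields, and its $pull matches plain values rather than conditions.

```javascript
// Toy model of a few update operators; `applyUpdate` is a hypothetical helper.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    result[field] = [...(result[field] ?? []), value];
  }
  for (const [field, value] of Object.entries(update.$addToSet ?? {})) {
    const current = result[field] ?? [];
    result[field] = current.includes(value) ? current : [...current, value];
  }
  for (const [field, value] of Object.entries(update.$pull ?? {})) {
    if (Array.isArray(result[field])) {
      result[field] = result[field].filter((item) => item !== value);
    }
  }
  for (const [field, end] of Object.entries(update.$pop ?? {})) {
    if (Array.isArray(result[field])) {
      result[field] = end === -1 ? result[field].slice(1) : result[field].slice(0, -1);
    }
  }
  return result;
}

let movie = { title: "Batman", category: ["action", "adventure"], imdb_rating: 7.6 };
movie = applyUpdate(movie, { $set: { imdb_rating: 7.7 } });
movie = applyUpdate(movie, { $push: { category: "crime" } });      // action, adventure, crime
movie = applyUpdate(movie, { $addToSet: { category: "action" } }); // already present, unchanged
movie = applyUpdate(movie, { $pull: { category: "adventure" } });  // action, crime
movie = applyUpdate(movie, { $pop: { category: 1 } });             // action
console.log(movie);
```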

Use { upsert: true } to update or insert

If there is no matching object, no update is performed by default.

Set the upsert parameter to true to insert a new document when no document matches the query.

db.movies.update( { "title" : "Jaws" }, { $inc: { "budget" : 5 } }, { upsert: true } )
// upsert: true: If "Jaws" is not found, a new "Jaws" document is inserted

{
  "_id" : ObjectId("5847f65f83432667e51e5ea8"),
  "title" : "Jaws",
  "budget" : 5
}
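
The default and upsert behaviors can be contrasted with a toy in-memory collection. The sketch is plain JavaScript with an invented updateOne helper that supports only equality queries and $inc, and it omits the generated _id. It mirrors the idea that an upsert seeds the new document from the equality fields of the query before applying the update.

```javascript
// Toy in-memory collection; `updateOne` is a hypothetical helper, not the driver API.
// Only equality queries and the $inc operator are modeled.
function updateOne(collection, query, update, options = {}) {
  const isMatch = (doc) => Object.entries(query).every(([k, v]) => doc[k] === v);
  let target = collection.find(isMatch);
  let status = "updated";
  if (!target) {
    if (!options.upsert) return "skipped"; // default: no match means no change
    target = { ...query }; // seed the new document from the query fields
    collection.push(target);
    status = "inserted";
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    target[field] = (target[field] ?? 0) + amount;
  }
  return status;
}

const movies = [{ title: "Batman", budget: 35 }];
console.log(updateOne(movies, { title: "Jaws" }, { $inc: { budget: 5 } }));                   // skipped
console.log(updateOne(movies, { title: "Jaws" }, { $inc: { budget: 5 } }, { upsert: true })); // inserted
console.log(movies);
```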
