Category | Parameter | Description | Example |
N/A | conf.version | The version of the configuration file. Do not change the value. | conf.version = 4 |
Global configuration options | id | The ID of the synchronization task. This value is customizable and is used in the global configuration, such as the log file name, the name of the database that stores checkpoint information, and the name of the destination database. | id = mongoshake |
master_quorum | Specifies whether this MongoShake node is the active node in high availability scenarios. If an active MongoShake node and a standby MongoShake node synchronize data from the same database, set this parameter to true on the active node. | master_quorum = false |
full_sync.http_port | The HTTP port used to view the status of full data synchronization in MongoShake over the Internet. | full_sync.http_port = 9101 |
incr_sync.http_port | The HTTP port used to view the status of incremental data synchronization in MongoShake over the Internet. | incr_sync.http_port = 9100 |
system_profile_port | The profiling port used to view internal stack information. | system_profile_port = 9200 |
log.level | The level of the logs to be generated. Valid values: error: generates logs that contain error messages. warning: generates logs that contain warnings. info: generates logs that indicate system status. debug: generates logs that contain debugging information. Default value: info. | log.level = info |
log.dir | The directory where the log file and PID file are stored. If you do not configure this parameter, the log file and PID file are stored in the logs directory in the working directory. Note This parameter must be set to an absolute path. | log.dir = ./logs/ |
log.file | The name of the log file. This value is customizable. Note Default value: collector.log. | log.file = collector.log |
log.flush | Specifies whether to display every log entry on the screen. Valid values: true: Every log entry is displayed on the screen. No log entry is missing, but performance is compromised. false: Some log entries may be missing on the screen, but performance is better. | log.flush = false |
sync_mode | The data synchronization method. Valid values: all: performs both full data synchronization and incremental data synchronization. full: performs only full data synchronization. incr: performs only incremental data synchronization. | sync_mode = all |
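The global options above form the head of MongoShake's collector.conf file. A minimal sketch that combines the table's example values; the values are illustrative, not recommendations:

```ini
# Global section of collector.conf (illustrative values)
conf.version = 4
id = mongoshake
master_quorum = false
full_sync.http_port = 9101
incr_sync.http_port = 9100
system_profile_port = 9200
log.level = info
log.dir = ./logs/
log.file = collector.log
log.flush = false
# Perform full synchronization first, then switch to incremental synchronization.
sync_mode = all
```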
mongo_urls | The connection string URI of the source ApsaraDB for MongoDB instance. The database account is test and the database is admin. | mongo_urls = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717 |
mongo_cs_url | The endpoint of a ConfigServer node. If the source ApsaraDB for MongoDB instance is a sharded cluster instance, you must configure this parameter. For more information about how to apply for an endpoint for a ConfigServer node, see Apply for an endpoint for a shard or ConfigServer node in a sharded cluster instance. The database account is test and the database is admin. | mongo_cs_url = mongodb://test:****@dds-bp19f409d7512****-csxxx.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****-csxxx.mongodb.rds.aliyuncs.com:3717/admin |
mongo_s_url | The endpoint of a mongos node. If the source ApsaraDB for MongoDB instance is a sharded cluster instance, you must configure this parameter. You must specify the endpoint of at least one mongos node. Separate the endpoints of multiple mongos nodes with commas (,). For more information about how to apply for an endpoint for a mongos node, see Apply for an endpoint for a shard or ConfigServer node in a sharded cluster instance. The database account is test and the database is admin. | mongo_s_url = mongodb://test:****@s-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,s-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717/admin |
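The three connection parameters above are used together depending on the source architecture. A hedged sketch, reusing the masked endpoints from the table:

```ini
# Replica set source: mongo_urls alone is sufficient.
mongo_urls = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

# Sharded cluster source: also specify the ConfigServer node and at least
# one mongos node (separate multiple mongos endpoints with commas).
mongo_cs_url = mongodb://test:****@dds-bp19f409d7512****-csxxx.mongodb.rds.aliyuncs.com:3717/admin
mongo_s_url = mongodb://test:****@s-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717/admin
```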
tunnel | The type of the tunnel used for synchronization. Valid values: direct: directly synchronizes data to the destination ApsaraDB for MongoDB instance. rpc: synchronizes data by using NET/RPC. tcp: synchronizes data by using TCP. file: synchronizes data by transferring files. kafka: synchronizes data by using Kafka. mock: only used for testing without writing data to the tunnel. | tunnel = direct |
tunnel.address | The address used to connect to the destination through the tunnel. If the tunnel parameter is set to direct, set this parameter to the connection string URI of the destination ApsaraDB for MongoDB instance. If the tunnel parameter is set to rpc, set this parameter to the receiver socket address used in the RPC connection. If the tunnel parameter is set to tcp, set this parameter to the receiver socket address used in the TCP connection. If the tunnel parameter is set to file, set this parameter to the path of the file used to store the data. If the tunnel parameter is set to kafka, set this parameter to the broker server addresses of Kafka in the format topic@brokers1,brokers2. If the tunnel parameter is set to mock, you do not need to configure this parameter. The database account is test and the database is admin. | tunnel.address = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717 |
tunnel.message | The type of the data to be written to the tunnel. This parameter is valid only when the tunnel parameter is set to kafka or file. Valid values: raw: writes data in the original format. The data is aggregated in batches to be written or read at a time. json: writes data to Kafka in the JSON format so that the data can be directly read. bson: writes data to Kafka in the Binary JSON (BSON) format. | tunnel.message = raw |
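The tunnel, tunnel.address, and tunnel.message parameters are set as a group. Two illustrative combinations; the Kafka topic and broker addresses in the second variant are hypothetical placeholders:

```ini
# Variant 1: write directly to the destination ApsaraDB for MongoDB instance.
tunnel = direct
tunnel.address = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717

# Variant 2: publish oplogs to Kafka as JSON so that consumers can read them directly.
# tunnel = kafka
# tunnel.address = my_topic@broker1:9092,broker2:9092
# tunnel.message = json
```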
mongo_connect_mode | The type of the node from which MongoShake pulls data. This parameter is valid only when the tunnel parameter is set to direct. Valid values: primary: pulls data from the primary node. secondaryPreferred: pulls data from a secondary node. standalone: pulls data from the single node that is specified. Note Default value: secondaryPreferred. | mongo_connect_mode = secondaryPreferred |
filter.namespace.black | The namespace blacklist for data synchronization. The specified namespaces are not synchronized to the destination database. Separate multiple namespaces with semicolons (;). Note A namespace is the standard name of a collection or index in ApsaraDB for MongoDB. It consists of a database name and a collection or index name. Example: mongodbtest.customer. | filter.namespace.black = mongodbtest.customer;testdata.test123 |
filter.namespace.white | The namespace whitelist for data synchronization. Only the specified namespaces are synchronized to the destination database. Separate multiple namespaces with semicolons (;). | filter.namespace.white = mongodbtest.customer;test123 |
filter.pass.special.db | The special databases from which you want to synchronize data to the destination database. By default, data in special databases such as admin, local, mongoshake, and config and in the system.views collection is not synchronized. You can configure this parameter to synchronize data from special databases. Separate multiple database names with semicolons (;). | filter.pass.special.db = admin;mongoshake |
filter.ddl_enable | Specifies whether to synchronize DDL operations. Valid values: true and false. Note If the source ApsaraDB for MongoDB instance is a sharded cluster instance, you cannot set this parameter to true. | filter.ddl_enable = false |
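The filter parameters above compose as follows. A sketch using the table's example namespaces; the blacklist and whitelist are alternatives, so only one of them is active here:

```ini
# Exclude these namespaces from synchronization ...
filter.namespace.black = mongodbtest.customer;testdata.test123
# ... or synchronize only the listed namespaces (leave the blacklist empty if used).
# filter.namespace.white = mongodbtest.customer;test123

# Also synchronize data from databases that are skipped by default.
filter.pass.special.db = admin;mongoshake
# Do not replay DDL operations (must stay false for sharded cluster sources).
filter.ddl_enable = false
```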
checkpoint.storage.url | The storage location of checkpoints, which are used for resumable transmission. If you do not configure this parameter, MongoShake writes checkpoints to a default database that is determined by the type of the source ApsaraDB for MongoDB instance. The database account is test and the database is admin. | checkpoint.storage.url = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717,dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717 |
checkpoint.storage.db | The name of the database that stores checkpoints. Note Default value: mongoshake. | checkpoint.storage.db = mongoshake |
checkpoint.storage.collection | The name of the collection that stores checkpoints. If you use an active MongoShake node and a standby MongoShake node to synchronize data from the same database, you can change this collection name to avoid conflicts caused by duplicate collection names. Note Default value: ckpt_default. | checkpoint.storage.collection = ckpt_default |
checkpoint.start_position | The start position for resumable transmission. If a checkpoint exists, this parameter is ignored. Specify a value for this parameter in the following format: YYYY-MM-DDTHH:MM:SSZ. Note Default value: 1970-01-01T00:00:00Z. | checkpoint.start_position = 1970-01-01T00:00:00Z |
transform.namespace | The rule for renaming the source database or collection in the destination database. For example, the value fromA.fromB:toC.toD renames collection B in database A on the source to collection D in database C on the destination. | transform.namespace = fromA.fromB:toC.toD |
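A sketch of a checkpoint and renaming setup, assuming checkpoints are stored in the destination instance (the endpoint is the table's masked placeholder):

```ini
# Store checkpoints in mongoshake.ckpt_default on this instance for resumable transmission.
checkpoint.storage.url = mongodb://test:****@dds-bp19f409d7512****.mongodb.rds.aliyuncs.com:3717
checkpoint.storage.db = mongoshake
checkpoint.storage.collection = ckpt_default
# Used only when no checkpoint exists yet.
checkpoint.start_position = 1970-01-01T00:00:00Z

# Rename the source namespace fromA.fromB to toC.toD on the destination.
transform.namespace = fromA.fromB:toC.toD
```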
Full data synchronization options | full_sync.reader.collection_parallel | The maximum number of collections that can be concurrently pulled by MongoShake at a time. | full_sync.reader.collection_parallel = 6 |
full_sync.reader.write_document_parallel | The number of concurrent threads used by MongoShake to write a single collection. | full_sync.reader.write_document_parallel = 8 |
full_sync.reader.document_batch_size | The number of documents to be written to the destination ApsaraDB for MongoDB instance at a time. For example, the value 128 indicates that 128 documents are written to the destination instance at a time. | full_sync.reader.document_batch_size = 128 |
full_sync.collection_exist_drop | Specifies whether to delete the collections in the destination database that have the same names as the source collections before synchronization. Valid values: true: deletes the collections in the destination database that have the same names as the source collections before synchronization. Warning This option deletes data in the destination database. Back up the destination collections in advance. false: returns an error message and exits if a collection in the destination database has the same name as a collection in the source database. | full_sync.collection_exist_drop = true |
full_sync.create_index | Specifies whether to create indexes after the synchronization is complete. Valid values: foreground: Indexes are created in the foreground. background: Indexes are created in the background. none: No indexes are created. | full_sync.create_index = none |
full_sync.executor.insert_on_dup_update | Specifies whether to change the INSERT statement to the UPDATE statement if a document in the destination database has the same _id value as a document in the source database. Valid values: true and false. | full_sync.executor.insert_on_dup_update = false |
full_sync.executor.filter.orphan_document | Specifies whether to filter out orphaned documents if the source ApsaraDB for MongoDB instance is a sharded cluster instance. Valid values: true and false. | full_sync.executor.filter.orphan_document = false |
full_sync.executor.majority_enable | Specifies whether to enable the majority write feature in the destination ApsaraDB for MongoDB instance. Valid values: true and false. | full_sync.executor.majority_enable = false |
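Taken together, the full data synchronization options might be configured as below; the values are the table's examples, not tuning advice:

```ini
# Pull up to 6 collections concurrently, with 8 writer threads per collection
# and 128 documents per write batch.
full_sync.reader.collection_parallel = 6
full_sync.reader.write_document_parallel = 8
full_sync.reader.document_batch_size = 128
# Drop same-named destination collections first; back them up beforehand.
full_sync.collection_exist_drop = true
# Skip index creation after the copy (foreground and background are also valid).
full_sync.create_index = none
full_sync.executor.insert_on_dup_update = false
full_sync.executor.filter.orphan_document = false
full_sync.executor.majority_enable = false
```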
Incremental data synchronization options | incr_sync.mongo_fetch_method | The method used to pull incremental data. Valid values: oplog and change_stream. Default value: oplog. | incr_sync.mongo_fetch_method = oplog |
incr_sync.oplog.gids | The global ID used to implement two-way replication for ApsaraDB for MongoDB instances. | incr_sync.oplog.gids = xxxxxxxxxxxx |
incr_sync.shard_key | The method used to distribute concurrent requests to internal worker threads. Do not modify this parameter value. | incr_sync.shard_key = collection |
incr_sync.worker | The number of concurrent threads used to transmit oplogs. If your instance provides sufficient performance, you can increase the number of concurrent threads. Note If the source ApsaraDB for MongoDB instance is a sharded cluster instance, the number of concurrent threads must be equal to the number of shards. | incr_sync.worker = 8 |
incr_sync.worker.oplog_compressor | Specifies whether to compress data to reduce network bandwidth usage. Valid values: none: No data is compressed. gzip: Data is compressed in the GZIP format. zlib: Data is compressed in the ZLIB format. deflate: Data is compressed in the DEFLATE format. Note This parameter is valid only when the tunnel parameter is not set to direct. If the tunnel parameter is set to direct, set this parameter to none. | incr_sync.worker.oplog_compressor = none |
incr_sync.target_delay | The delay for synchronizing data from the source to the destination ApsaraDB for MongoDB instance. Unit: seconds. By default, changes in the source database are synchronized to the destination database in real time. You can set this parameter to delay the synchronization, which leaves time to recover from accidental operations on the source. For example, the value 1800 delays the synchronization by 30 minutes. Note The value 0 indicates that data is synchronized in real time. | incr_sync.target_delay = 1800 |
incr_sync.worker.batch_queue_size | The size of an internal queue in MongoShake. Do not modify this parameter unless otherwise required. | incr_sync.worker.batch_queue_size = 64 |
incr_sync.adaptive.batching_max_size | An internal queue parameter in MongoShake. Do not modify this parameter unless otherwise required. | incr_sync.adaptive.batching_max_size = 1024 |
incr_sync.fetcher.buffer_capacity | An internal queue parameter in MongoShake. Do not modify this parameter unless otherwise required. | incr_sync.fetcher.buffer_capacity = 256 |
Direct synchronization options (valid only when the tunnel parameter is set to direct) | incr_sync.executor.upsert | Specifies whether to change the UPDATE statement to the INSERT statement if no document in the destination database has the same _id value or unique index as the document in the source database. Valid values: true and false. | incr_sync.executor.upsert = false |
incr_sync.executor.insert_on_dup_update | Specifies whether to change the INSERT statement to the UPDATE statement if a document in the destination database has the same _id value or unique index as a document in the source database. Valid values: true and false. | incr_sync.executor.insert_on_dup_update = false |
incr_sync.conflict_write_to | Specifies whether to record conflicting documents if write conflicts occur during the synchronization. Valid values: none: Conflicting documents are not recorded. db: Conflicting documents are written to the mongoshake_conflict database. sdk: Conflicting documents are output to the SDK. | incr_sync.conflict_write_to = none |
incr_sync.executor.majority_enable | Specifies whether to enable the majority write feature in the destination ApsaraDB for MongoDB instance. Valid values: true and false. Note The majority write feature may compromise performance. | incr_sync.executor.majority_enable = false |
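For a direct tunnel, the incremental synchronization options above might look as follows; the values mirror the table's examples:

```ini
# Fetch incremental data from the oplog and distribute it to 8 worker threads
# (for a sharded cluster source, set incr_sync.worker to the number of shards).
incr_sync.mongo_fetch_method = oplog
incr_sync.worker = 8
# Compression must be none when tunnel = direct.
incr_sync.worker.oplog_compressor = none
# Apply changes on the destination 30 minutes behind the source (0 = real time).
incr_sync.target_delay = 1800
# Do not record conflicting documents (db and sdk are also valid).
incr_sync.conflict_write_to = none
incr_sync.executor.upsert = false
incr_sync.executor.majority_enable = false
```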