
AnalyticDB: Access a MongoDB data source

Last Updated: Nov 24, 2025

This topic describes how to use AnalyticDB for MySQL Spark to access data in ApsaraDB for MongoDB.

Prerequisites

  • An AnalyticDB for MySQL cluster and an ApsaraDB for MongoDB instance are created.

  • An Object Storage Service (OSS) bucket is created in the same region as the AnalyticDB for MySQL cluster to store the JAR packages that are used in this topic.

Procedure

  1. Download the JAR packages that AnalyticDB for MySQL Spark requires to access ApsaraDB for MongoDB: mongo-spark-connector_2.12-10.1.1.jar, mongodb-driver-sync-4.8.2.jar, bson-4.8.2.jar, bson-record-codec-4.8.2.jar, and mongodb-driver-core-4.8.2.jar.

  2. Add the following dependency to the pom.xml file.

      <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.12</artifactId>
        <version>10.1.1</version>
      </dependency>
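The connector's own POM declares the driver artifacts downloaded in step 1 (mongodb-driver-sync, mongodb-driver-core, bson, and bson-record-codec) as dependencies, so Maven normally resolves them transitively and the single dependency above is sufficient. If your build needs to pin them explicitly to the versions listed in step 1, a fragment like the following sketch can be added; the groupId org.mongodb applies to all of the driver artifacts, and only two are shown here:

```xml
<!-- Optional sketch: pin driver artifacts to the versions from step 1. -->
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>mongodb-driver-sync</artifactId>
  <version>4.8.2</version>
</dependency>
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>bson</artifactId>
  <version>4.8.2</version>
</dependency>
```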
  3. Write, compile, and package a program to access ApsaraDB for MongoDB. In this topic, the generated JAR package is named spark-mongodb.jar.

    package com.aliyun.spark
    
    import org.apache.spark.sql.SparkSession
    
    object SparkOnMongoDB {
      def main(args: Array[String]): Unit = {
        // The VPC endpoint of the ApsaraDB for MongoDB instance. You can view the endpoint on the Database Connection page in the ApsaraDB for MongoDB console.
        val connectionUri = args(0)
        // The name of the database.
        val database = args(1)
        // The name of the collection.
        val collection = args(2)
        
        val spark = SparkSession.builder()
          .appName("MongoSparkConnectorIntro")
          .config("spark.mongodb.read.connection.uri", connectionUri)
          .config("spark.mongodb.write.connection.uri", connectionUri)
          .getOrCreate()
    
        val df = spark.read
          .format("mongodb")
          .option("database", database)
          .option("collection", collection)
          .load()
        
        df.show()
        
        spark.stop()
      }
    }
    Note

    For more information about Spark configurations for MongoDB, see Configuration Options. For more code examples, see Write to MongoDB and Read from MongoDB.
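The sample program above only reads from the collection. As a hedged sketch of the write path that the Note references, writing a DataFrame back through the same connector mirrors the read options; the helper name below is our own, and it assumes the SparkSession was created with spark.mongodb.write.connection.uri as in the program above:

```scala
import org.apache.spark.sql.DataFrame

// Sketch (not part of the sample program): write a DataFrame back to
// ApsaraDB for MongoDB through the same connector. The "database" and
// "collection" option names mirror the read path shown above.
object MongoWriteSketch {
  def writeToMongo(df: DataFrame, database: String, collection: String): Unit = {
    df.write
      .format("mongodb")
      .option("database", database)
      .option("collection", collection)
      .mode("append") // append documents to an existing collection
      .save()
  }
}
```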

  4. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  5. In the left-side navigation pane, choose Job Development > Spark JAR Development.

  6. In the editor, enter the following job content.

    Important
    • You can use AnalyticDB for MySQL Spark to access ApsaraDB for MongoDB over a VPC or the internet.

    • We recommend that you use a VPC for access.

    {
      "args": [
        -- The VPC endpoint of the ApsaraDB for MongoDB instance. You can view the endpoint on the Database Connection page in the ApsaraDB for MongoDB console.
        "mongodb://<username>:<password>@<host1>:<port1>,<host2>:<port2>,...,<hostN>:<portN>/<database_name>",
        -- The name of the database.
        "<database_name>",
        -- The name of the collection.
        "<collection_name>"
      ],
      "file": "oss://<bucket_name>/spark-mongodb.jar",
      "jars": [
        "oss://<bucket_name>/mongo-spark-connector_2.12-10.1.1.jar",
        "oss://<bucket_name>/mongodb-driver-sync-4.8.2.jar",
        "oss://<bucket_name>/bson-4.8.2.jar",
        "oss://<bucket_name>/bson-record-codec-4.8.2.jar",
        "oss://<bucket_name>/mongodb-driver-core-4.8.2.jar"
      ],
      "name": "MongoSparkConnectorIntro",
      "className": "com.aliyun.spark.SparkOnMongoDB",
      "conf": {
        "spark.driver.resourceSpec": "medium",
        "spark.executor.instances": 2,
        "spark.executor.resourceSpec": "medium",
        "spark.adb.eni.enabled": "true",
        "spark.adb.eni.vswitchId": "vsw-bp14pj8h0****",
        "spark.adb.eni.securityGroupId": "sg-bp11m93k021tp****"
      }
    }

    The parameters are described as follows.

    args
      The arguments that are required by the program in the JAR package. Specify the arguments based on your business requirements. Separate multiple arguments with commas (,).

    file
      The OSS path of the sample program spark-mongodb.jar.

    jars
      The OSS paths of the JAR packages on which the Spark job depends to access ApsaraDB for MongoDB.

    name
      The name of the Spark job.

    className
      The entry class of the Java or Scala program. The entry class is not required for a Python application.

    spark.adb.eni.enabled
      Specifies whether to enable Elastic Network Interface (ENI) access. You must enable ENI access when you use Data Lakehouse Edition Spark to access the ApsaraDB for MongoDB data source.

    spark.adb.eni.vswitchId
      The ID of the vSwitch to which the ApsaraDB for MongoDB instance belongs. You can obtain the vSwitch ID on the Basic Information page of the instance in the ApsaraDB for MongoDB console.

    spark.adb.eni.securityGroupId
      The ID of the security group that is added to the ApsaraDB for MongoDB instance. If no security group is added, see Add a security group.

    Other conf parameters
      The configuration parameters that are required for the Spark job, which are similar to those of Apache Spark. Specify the parameters in the "key": value format. Separate multiple parameters with commas (,). For more information, see Spark application configuration parameters.
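The first element of args is the connection URI. As a rough illustration of its shape (all names below are placeholders, and 3717 is the default port of an ApsaraDB for MongoDB instance), the template can be assembled as follows; note that reserved characters in the username or password must be percent-encoded, which this sketch omits:

```scala
// Illustrative helper (not part of the sample program): assemble the
// connection URI that is passed as the first element of "args".
object MongoUriSketch {
  def buildMongoUri(user: String, password: String,
                    hosts: Seq[(String, Int)], database: String): String = {
    // hosts: replica set members as (host, port) pairs, joined with commas
    val hostPart = hosts.map { case (h, p) => s"$h:$p" }.mkString(",")
    s"mongodb://$user:$password@$hostPart/$database"
  }
}
```

For example, buildMongoUri("user", "pass", Seq(("dds-****.mongodb.rds.aliyuncs.com", 3717)), "testdb") produces a URI in the same form as the template above.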

  7. Click Execute Now.

  8. After the status of the application in the Application List changes to Completed, click Log in the Actions column to view the data of the ApsaraDB for MongoDB collection.