This topic describes how to use MaxCompute Studio to develop a MapReduce program. The development process includes writing, debugging, packaging, uploading, and running a MapReduce program.
Prerequisites
The following prerequisites are met:
A MaxCompute project is connected.
For more information about how to connect to a MaxCompute project, see Manage project connections.
A Java module is created.
For more information about how to create a Java module, see Create a MaxCompute Java module.
Write a MapReduce program
In the left-side navigation pane of the Project tab, choose , right-click java, and then choose .
Specify Name, select a class type such as Driver, and then press Enter.
Name: the name of the MaxCompute Java class. If you have not created a package, specify this parameter in the packagename.classname format. The system automatically generates a package.
Select the Driver, Mapper, or Reducer class.
Note: You can select the Driver, Mapper, or Reducer class based on your business requirements.
Driver: the driver class of a MapReduce job. This class builds the MapReduce job that is run. In the Driver class, you specify the Mapper and Reducer classes to run and the task configurations. The Driver class can be considered the entry class of a MapReduce job.
Mapper: the first stage of MapReduce data processing. In this stage, each data record is processed and the related key-value pair is generated.
Reducer: processes the intermediate output that is generated by the Mapper class, generates the final output, and then saves the final output in a MaxCompute table.
After you create a MaxCompute Java class, develop a Java program in the editor.
The Java template is automatically filled with the framework code. You only need to configure the input table, the output table, and the Mapper and Reducer classes.
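The roles of the Driver, Mapper, and Reducer classes described above can be illustrated with a minimal word-count sketch. It is written in plain Java and deliberately does not use the MaxCompute MapReduce SDK or the code that the Studio template generates, so every name in it (WordCountSketch, map, reduce, run) is hypothetical. It assumes that input records are lines of text and that the framework groups intermediate pairs by key between the two stages.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: plain Java, no MaxCompute SDK classes.
// All class and method names here are hypothetical.
class WordCountSketch {

    // Mapper stage: process one input record and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Long>> map(String record) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Reducer stage: combine all intermediate values that share one key.
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver role: wire the stages together and run them over the input records.
    static Map<String, Long> run(List<String> inputRecords) {
        // Group intermediate pairs by key, as a framework would do between stages.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String record : inputRecords) {
            for (Map.Entry<String, Long> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Apply the reducer once per key to produce the final output.
        Map<String, Long> output = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            output.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Prints {hello=2, mapreduce=1, odps=1}
        System.out.println(run(Arrays.asList("hello odps", "hello mapreduce")));
    }
}
```

In a real MaxCompute job, the input records come from the input table and the reducer output is written to the output table. The in-memory lists and maps here only stand in for those tables.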
Run a MapReduce program on your on-premises machine to debug the program
Run the MapReduce program that you wrote on your on-premises machine to debug the program, and check whether the debugging results are as expected.
Right-click the Java class that you wrote and select Run.
In the Run/Debug Configurations dialog box, select the name of the MaxCompute project in which the MapReduce program runs.
Click OK to run the MapReduce program.
Note: During the local run, the system reads data from the specified table in the warehouse directory as the input. You can view the log output in the console.
If you want to use table data in a MaxCompute project, you must modify the endpoint and project name in the value of the MaxCompute project parameter. If the table data in the specified MaxCompute project has not been downloaded to the warehouse directory, the data is downloaded first. If the data has already been downloaded, the download is skipped.
Perform unit testing to debug a MapReduce program
You can write a test case based on the test case for WordCount unit testing in the examples folder.
Package and upload a MapReduce program
After you debug the MapReduce program that you wrote, package the MapReduce program into a JAR file and upload the file to your MaxCompute project as a resource. For more information, see Package, upload, and register a Java program.
Run a MapReduce program
Run the MapReduce program that you developed on the MaxCompute client.
In the left-side navigation pane, click Project Explorer.
Right-click the name of your MaxCompute project and select Open in Console.
In the Console tool window, run the following command to start the MapReduce program.
For more information about the command, see Submit a MapReduce job.
jar -resources wordcount.jar -classpath D:\odps\clt\wordcount.jar com.aliyun.odps.examples.mr.WordCount wc_in wc_out;
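To make the structure of the command above easier to see, the following sketch assembles the same string from its parts. It assumes the usual reading of the command: -resources names the resource that was uploaded to the project, -classpath is the local path of the JAR file that contains the main class, the next token is the main class, and the remaining tokens are arguments passed to the program (for this example, presumably the input and output tables). JarCommandSketch and build are hypothetical names and are not part of MaxCompute Studio or the MaxCompute client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only concatenates strings; it does not submit a job.
class JarCommandSketch {

    // Assemble "jar -resources <r> -classpath <cp> <mainClass> [args...];".
    static String build(String resources, String classpath,
                        String mainClass, String... programArgs) {
        List<String> parts = new ArrayList<>();
        parts.add("jar");
        parts.add("-resources");
        parts.add(resources);      // resource name in the MaxCompute project
        parts.add("-classpath");
        parts.add(classpath);      // local path of the JAR that holds the main class
        parts.add(mainClass);      // fully qualified name of the Driver class
        parts.addAll(Arrays.asList(programArgs)); // arguments passed to the program
        return String.join(" ", parts) + ";";
    }

    public static void main(String[] args) {
        // Reproduces the command shown in this topic.
        System.out.println(build(
                "wordcount.jar",
                "D:\\odps\\clt\\wordcount.jar",
                "com.aliyun.odps.examples.mr.WordCount",
                "wc_in", "wc_out"));
    }
}
```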