This topic describes how to deploy ossimport in distributed mode. Distributed mode is supported only on Linux.
Prerequisites
A cluster that consists of at least two machines is deployed. One machine is used as the master and others are used as workers.
A connection over SSH is established between the master and workers.
All workers use the same username and password.
Note: Make sure that a connection over SSH is established between the master and workers, or that the logon credentials of the workers are configured in the sys.properties file.
Download and install ossimport
Download the installation package of ossimport.
Download ossimport-2.3.7.tar.gz to your machine.
Install ossimport.
Note: All subsequent operations are performed on the master.
Log on to the master and run the following command to create the ossimport directory:
mkdir -p $HOME/ossimport
Go to the directory of the package and run the following command to decompress the package to the specified directory:
tar -zxvf ossimport-2.3.7.tar.gz -C $HOME/ossimport
The following code shows the directory structure after you decompress the package:
ossimport
├── bin
│   ├── console.jar     # The JAR package for the Console module.
│   ├── master.jar      # The JAR package for the Master module.
│   ├── tracker.jar     # The JAR package for the Tracker module.
│   └── worker.jar      # The JAR package for the Worker module.
├── conf
│   ├── job.cfg         # The job configuration file template.
│   ├── sys.properties  # The configuration file that contains system parameters.
│   └── workers         # The configuration file of workers.
├── console.sh          # The command-line tool. Only Linux is supported.
├── logs                # The directory that contains logs.
└── README.md           # The file that provides a description of ossimport. We recommend that you read this file before you use ossimport.
OSS_IMPORT_HOME: the root directory of ossimport. By default, the root directory is $HOME/ossimport, the directory used in the decompression command. You can specify a different root directory by running the export OSS_IMPORT_HOME=<dir> command or by adding that command to the $HOME/.bashrc file. We recommend that you use the default root directory.
OSS_IMPORT_WORK_DIR: the working directory of ossimport. You can specify a working directory by configuring the workingDir parameter in the conf/sys.properties file. We recommend that you use $HOME/ossimport/workdir as the working directory.
Specify an absolute path for OSS_IMPORT_HOME and OSS_IMPORT_WORK_DIR, such as /home/<user>/ossimport or /home/<user>/ossimport/workdir.
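The path rules above can be checked with a few lines of shell. This is a minimal sketch, not part of ossimport: is_absolute_path is a hypothetical helper, and OSS_IMPORT_WORK_DIR is used here only as a local shell variable for the check (the actual working directory is whatever workingDir is set to in conf/sys.properties).

```shell
# Hypothetical helper: succeeds only if the given path starts with "/".
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Keep an existing OSS_IMPORT_HOME, otherwise fall back to the default root directory.
OSS_IMPORT_HOME="${OSS_IMPORT_HOME:-$HOME/ossimport}"
export OSS_IMPORT_HOME

# Assumption: mirrors the recommended value of workingDir in conf/sys.properties.
OSS_IMPORT_WORK_DIR="$OSS_IMPORT_HOME/workdir"

# Warn about any path that is not absolute.
for dir in "$OSS_IMPORT_HOME" "$OSS_IMPORT_WORK_DIR"; do
    if ! is_absolute_path "$dir"; then
        echo "not an absolute path: $dir" >&2
    fi
done
```

To make the setting persistent, the export line can be added to $HOME/.bashrc, as described above.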
Modify configuration files
Before you use ossimport deployed in distributed mode, modify the following configuration files based on your business requirements: conf/sys.properties, conf/job.cfg, and conf/workers.
conf/job.cfg: the configuration file template used to configure jobs in distributed mode. Modify the parameters based on your business requirements.
conf/sys.properties: the configuration file that contains system parameters, such as the working directory and worker-related parameters.
conf/workers: the configuration file of workers.
Before you submit a migration job, check the parameters in the sys.properties and job.cfg files. The parameters of a migration job cannot be modified after the job is submitted.
Configure and check the workers file before you start the service. You cannot add or remove workers after the service is started.
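Because workers cannot be added or removed after the service starts, a quick sanity check of conf/workers before startup may be useful. The sketch below is not part of ossimport. It assumes that the file lists one worker address per line, which this topic does not state, so confirm the format in README.md first. check_workers is a hypothetical helper name.

```shell
# Hypothetical pre-start check for conf/workers.
# Assumption: one worker address per line; blank lines are ignored.
check_workers() {
    workers_file="$1"
    if [ ! -r "$workers_file" ]; then
        echo "cannot read $workers_file" >&2
        return 1
    fi
    # Drop blank lines; "|| true" keeps an empty result from counting as an error.
    entries=$(grep -v '^[[:space:]]*$' "$workers_file" || true)
    if [ -z "$entries" ]; then
        echo "no workers listed in $workers_file" >&2
        return 1
    fi
    # uniq -d prints only the entries that occur more than once.
    duplicates=$(printf '%s\n' "$entries" | sort | uniq -d)
    if [ -n "$duplicates" ]; then
        echo "duplicate workers: $duplicates" >&2
        return 1
    fi
    # Print the number of workers found.
    printf '%s\n' "$entries" | wc -l | tr -d ' '
}
```

For example, check_workers "$OSS_IMPORT_HOME/conf/workers" prints the worker count, or returns a non-zero status if the file is empty, unreadable, or contains a repeated entry.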
Run and manage migration jobs
Run migration jobs
If you use ossimport deployed in distributed mode to run migration jobs, you need to perform the following steps in most cases:
Deploy the service. Run the bash console.sh deploy command in Linux. This command deploys ossimport to all machines specified in the conf/workers configuration file.
Note: Make sure that the conf/job.cfg and conf/workers configuration files are properly configured before you deploy the service.
Clear existing jobs that have the same name. If you want to run a migration job that has the same name as an existing job, clear the existing job first. If you want to run a new migration job or retry the failed tasks of a migration job, do not run the clean command. To clear an existing job, run the bash console.sh clean job_name command in Linux.
Submit a migration job. Make sure that the name of your migration job is unique. A configuration file is required to submit a migration job. You can create a job configuration file based on the conf/job.cfg template file. To submit a migration job, run the bash console.sh submit [job_cfg_file] command in Linux. In the command, the job_cfg_file parameter is optional and is set to $OSS_IMPORT_HOME/conf/job.cfg by default, where $OSS_IMPORT_HOME specifies the directory that contains the console.sh file.
Start the service. Run the bash console.sh start command in Linux.
View the job status. Run the bash console.sh stat command in Linux.
Retry failed tasks. Tasks may fail due to reasons such as network issues. When you run the retry command, only failed tasks are retried. To retry failed tasks, run the bash console.sh retry [job_name] command in Linux. In the command, the job_name parameter is optional and specifies the job whose failed tasks you want to retry. If you do not specify this parameter, the failed tasks of all jobs are retried.
Stop the service. Run the bash console.sh stop command in Linux.
Note:
If the specified parameters are invalid when you run a bash console.sh command, the valid command format is displayed.
We recommend that you specify absolute paths in configuration files and submitted migration jobs.
The job.cfg file contains job configuration items.
Important: You cannot modify the configuration items of a migration job in the file after the job is submitted.
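The usual sequence for a new job (deploy, submit, start, check status) can be wrapped in a small script. The following is a sketch only. The console.sh subcommands are the ones listed in the steps above, while run_step, run_migration, CONSOLE, and DRY_RUN are hypothetical names introduced for this example. The clean, retry, and stop commands are left out because they apply only in specific situations.

```shell
# How to invoke the command-line tool; run this from the directory that contains console.sh.
CONSOLE="${CONSOLE:-bash console.sh}"

# Run one console.sh subcommand, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$CONSOLE $*"
    else
        # $CONSOLE is intentionally unquoted so that "bash console.sh" splits into two words.
        $CONSOLE "$@"
    fi
}

# Deploy the service, submit one job, start the service, then show the job status.
run_migration() {
    job_cfg="$1"    # absolute path to the job configuration file
    run_step deploy || return 1
    run_step submit "$job_cfg" || return 1
    run_step start || return 1
    run_step stat
}
```

Running DRY_RUN=1 run_migration /home/<user>/ossimport/conf/job.cfg prints the four commands without executing them, which is a cheap way to review the sequence before the job parameters become unmodifiable.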
Common causes of job failures
If the source data is modified during migration, a SIZE_NOT_MATCH error is recorded in the log/audit.log file. This error indicates that the original data has been uploaded, but modifications are not uploaded to Object Storage Service (OSS).
If the source data is deleted during migration, the migration job fails.
If the source files do not comply with the naming rules of OSS objects, the migration to OSS fails. The name of an object in OSS cannot start with a forward slash (/) and cannot be empty.
The source files fail to be downloaded.
If ossimport exits unexpectedly when you run a migration job, the state of the job is Abort. In this case, contact technical support.
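To see whether source data changed during a migration, you can search an audit.log file for the SIZE_NOT_MATCH error named above. This sketch makes no assumption about the log line format beyond the presence of that string. count_size_not_match is a hypothetical helper name.

```shell
# Hypothetical helper: print how many lines of an audit log mention SIZE_NOT_MATCH.
count_size_not_match() {
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # "|| true" turns that case into a normal result.
    grep -c 'SIZE_NOT_MATCH' "$1" || true
}
```

A non-zero count means that some objects were uploaded in their original form and that later modifications to them did not reach OSS.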
View the job status and logs
After a migration job is submitted, the master splits the job into tasks, the workers run the tasks, and the tracker collects the task status. The following code shows the structure of the workdir directory after the job is complete:
workdir
├── bin
│   ├── console.jar     # The JAR package for the Console module.
│   ├── master.jar      # The JAR package for the Master module.
│   ├── tracker.jar     # The JAR package for the Tracker module.
│   └── worker.jar      # The JAR package for the Worker module.
├── conf
│   ├── job.cfg         # The job configuration file template.
│   ├── sys.properties  # The configuration file that contains system parameters.
│   └── workers         # The configuration file of workers.
├── logs
│   ├── import.log      # The migration logs.
│   ├── master.log      # The logs of the Master module.
│   ├── tracker.log     # The logs of the Tracker module.
│   └── worker.log      # The logs of the Worker module.
├── master
│   ├── jobqueue        # The jobs that are not split.
│   └── jobs            # The information about jobs.
│       └── xxtooss     # The name of the job.
│           ├── checkpoints    # The checkpoints generated when the master splits the job into tasks.
│           │   └── 0
│           │       └── ED09636A6EA24A292460866AFDD7A89A.cpt
│           ├── dispatched     # The tasks that are dispatched to workers but not complete.
│           │   └── 192.168.1.6
│           ├── failed_tasks   # The failed tasks.
│           │   └── A41506C07BF1DF2A3EDB4CE31756B93F_1499348973217@192.168.1.6
│           │       ├── audit.log   # The run logs of the task. You can view the logs to identify error causes.
│           │       ├── DONE        # The mark file that indicates the completion of the task. If the task fails, the content is empty.
│           │       ├── error.list  # The errors of the task. You can view the errors in the file.
│           │       ├── STATUS      # The mark file that indicates the task status. The content of this file is Failed or Completed.
│           │       └── TASK        # The description of the task.
│           ├── pending_tasks  # The tasks that are not dispatched.
│           └── succeed_tasks  # The tasks that are successfully run.
│               └── A41506C07BF1DF2A3EDB4CE31756B93F_1499668462358@192.168.1.6
│                   ├── audit.log   # The run logs of the task. You can view the logs to identify error causes.
│                   ├── DONE        # The mark file that indicates the completion of the task.
│                   ├── error.list  # The errors of the task. If the task is successful, the content is empty.
│                   ├── STATUS      # The mark file that indicates the task status. The content of this file is Failed or Completed.
│                   └── TASK        # The description of the task.
└── worker              # The tasks that are being run by the worker. After tasks are run, they are managed by the master.
    └── jobs
        ├── local_test2
        │   └── tasks
        └── local_test_4
            └── tasks
Important:
To view the information about jobs, check the logs/import.log file.
To troubleshoot a failed task, check the master/jobs/${JobName}/failed_tasks/${TaskName}/audit.log file.
To view the errors of a failed task, check the master/jobs/${JobName}/failed_tasks/${TaskName}/error.list file.
The preceding logs are for reference only. Do not deploy your service and application based on these logs.
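When a job has many failed tasks, opening each error.list by hand gets tedious. The sketch below walks the failed_tasks directory layout shown above and prints each task name followed by its errors. show_failed_tasks is a hypothetical helper, not an ossimport command, and it assumes the layout described in this topic.

```shell
# Print every failed task of a job, followed by the contents of its error.list.
# $1: the ossimport working directory, $2: the job name.
show_failed_tasks() {
    work_dir="$1"
    job_name="$2"
    failed_dir="$work_dir/master/jobs/$job_name/failed_tasks"
    for task_dir in "$failed_dir"/*/; do
        # If there are no failed tasks, the pattern stays unexpanded and is skipped here.
        [ -d "$task_dir" ] || continue
        echo "== $(basename "$task_dir")"
        # Print error.list only if it exists and is not empty.
        if [ -s "$task_dir/error.list" ]; then
            cat "$task_dir/error.list"
        fi
    done
}
```

For example, show_failed_tasks "$HOME/ossimport/workdir" xxtooss lists the failed tasks of the xxtooss job from the directory tree above. For the full context of a failure, read the audit.log file in the same task directory.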
Verify migration results
ossimport does not verify data after migration and therefore does not ensure data consistency or integrity. After a migration job is complete, remember to verify data consistency between the migration source and destination.
If you delete the source data without verifying data consistency between the migration source and destination, you are responsible for any losses and consequences that arise.
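One simple way to verify consistency is to compare an object listing of the source with a listing of the destination. The sketch below assumes that you have already produced two manifest files with one "<name><TAB><size>" entry per line. How you generate them depends on your source and is outside the scope of ossimport. compare_manifests and the manifest format are hypothetical.

```shell
# Hypothetical check: print the entries that appear in only one of two manifests.
# Prints nothing when the manifests contain the same entries.
compare_manifests() {
    src_sorted=$(mktemp)
    dst_sorted=$(mktemp)
    # comm requires both inputs to be sorted in the same collation order.
    LC_ALL=C sort "$1" > "$src_sorted"
    LC_ALL=C sort "$2" > "$dst_sorted"
    # -3 hides the lines common to both files; destination-only lines are indented by a tab.
    LC_ALL=C comm -3 "$src_sorted" "$dst_sorted"
    rm -f "$src_sorted" "$dst_sorted"
}
```

Empty output means that every name and size in the source manifest also appears in the destination manifest. A size-only comparison does not detect content corruption, so compare checksums as well if your source provides them.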
Common errors and troubleshooting
For more information about common errors and troubleshooting, see FAQ.