Traditional batch and stream orchestration tools often cannot meet the complex requirements of batch data processing, machine learning pipelines, infrastructure automation, and CI/CD, and they provide limited support for automating them. To reduce the complexity of orchestrating batch tasks, Alibaba Cloud provides the cloud-native Argo Workflows component.
Open-source Argo Workflows
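Argo Workflows is a cloud-native workflow engine for defining, managing, and scheduling complex workflows in Kubernetes. A workflow can contain multiple tasks with dependencies between them, and Argo Workflows starts each task only after the tasks it depends on have completed. You declare the dependencies once instead of scheduling each task by hand, which simplifies task configuration.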
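To make "tasks with dependencies" concrete, the following Python sketch builds a minimal Argo `Workflow` manifest as a dictionary, using a `dag` template whose `dependencies` lists mirror a hypothetical task graph, and prints it as JSON. The task names, the `alpine:3.19` image, and both helper functions are illustrative assumptions. The `execution_order` helper uses Kahn's algorithm to show one valid start order and to reject cycles; Argo itself may start independent tasks, such as `features` and `stats` here, at the same time.

```python
import json
from collections import deque

# Hypothetical task graph: each task maps to the tasks it depends on.
TASKS = {
    "extract": [],
    "clean": ["extract"],
    "features": ["clean"],
    "stats": ["clean"],
    "report": ["features", "stats"],
}


def build_dag_workflow(tasks: dict) -> dict:
    """Return an Argo Workflow manifest whose DAG mirrors `tasks`."""
    dag_tasks = []
    for name, deps in tasks.items():
        task = {
            "name": name,
            "template": "echo-step",
            "arguments": {"parameters": [{"name": "msg", "value": name}]},
        }
        if deps:
            # The task starts only after every listed task completes successfully.
            task["dependencies"] = list(deps)
        dag_tasks.append(task)
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "dag-demo-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {"name": "main", "dag": {"tasks": dag_tasks}},
                {
                    # Each DAG task runs this container template in its own pod.
                    "name": "echo-step",
                    "inputs": {"parameters": [{"name": "msg"}]},
                    "container": {
                        "image": "alpine:3.19",
                        "command": ["echo", "{{inputs.parameters.msg}}"],
                    },
                },
            ],
        },
    }


def execution_order(tasks: dict) -> list:
    """Kahn's algorithm: one valid order in which the DAG tasks can start."""
    indegree = {t: len(deps) for t, deps in tasks.items()}
    children = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            children[d].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: an Argo DAG must be acyclic")
    return order


if __name__ == "__main__":
    print(json.dumps(build_dag_workflow(TASKS), indent=2))
    print(execution_order(TASKS))
```

For the graph above, `execution_order` returns `["extract", "clean", "features", "stats", "report"]`.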
Scenarios
Argo Workflows supports a wide range of scenarios and is widely used in industries such as autonomous driving, scientific computing, financial quantitative analysis, and digital media.
Batch data processing: large-scale high-precision map processing, financial quantitative backtesting simulations, parallel audio and video processing, animation rendering.
Scientific computing: complex scientific computing simulations, pharmaceutical research and training, gene sequencing, mutation alignment detection, energy exploration.
Simulation and modeling: autonomous driving algorithm simulations, molecular dynamics simulations, astronomical data simulations, financial modeling.
Machine learning pipelines: machine learning data preprocessing, distributed training, large model parameter tuning, model evaluation and deployment.
Infrastructure automation: automated management of cloud resources, resource backup and recovery, node pool migration, cluster migration and upgrades.
CI/CD: parallel CI pipelines, multi-stage build and testing, cross-cloud application deployment, integration of approval workflows.
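Many of the batch data processing scenarios above follow a fan-out pattern: the same step runs once per input shard, such as a range of animation frames or a slice of map tiles. The following minimal sketch assumes hypothetical shard names and a placeholder `echo` command. It uses a `steps` template with `withItems` so that Argo creates one pod per item, and sets `spec.parallelism` to cap how many of those pods run at once.

```python
import json


def build_fanout_workflow(shards: list, max_parallel: int = 10) -> dict:
    """Return a Workflow that runs one pod per shard, at most `max_parallel` at a time."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "batch-fanout-"},
        "spec": {
            "entrypoint": "main",
            # Upper bound on pods of this workflow running concurrently.
            "parallelism": max_parallel,
            "templates": [
                {
                    "name": "main",
                    # Outer list: sequential step groups. Inner list: parallel steps.
                    "steps": [[
                        {
                            "name": "process",
                            "template": "process-shard",
                            "arguments": {
                                "parameters": [{"name": "shard", "value": "{{item}}"}]
                            },
                            # Argo expands this step once per item in the list.
                            "withItems": list(shards),
                        }
                    ]],
                },
                {
                    "name": "process-shard",
                    "inputs": {"parameters": [{"name": "shard"}]},
                    "container": {
                        "image": "alpine:3.19",
                        # Placeholder: replace with the real processing command.
                        "command": ["sh", "-c", "echo processing {{inputs.parameters.shard}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical shard names: frames-0000, frames-0100, frames-0200, frames-0300.
    shards = [f"frames-{i:04d}" for i in range(0, 400, 100)]
    print(json.dumps(build_fanout_workflow(shards, max_parallel=2), indent=2))
```

When the shard list is only known at run time, Argo's `withParam` can take the list from an earlier step's output instead of a static `withItems` list.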
Advantages
Cloud native: Argo Workflows is designed specifically for Kubernetes. Each task runs as a pod, which takes full advantage of the lightweight and flexible nature of containers.
Lightweight and scalable: Compared with traditional VMs, containers are lightweight and add little overhead. Backed by the robust scheduling capabilities of Kubernetes, thousands of tasks can run in parallel, which improves processing efficiency.
Flexible orchestration: Directed acyclic graphs (DAGs) and steps can be combined to build workflows of widely varying complexity. Built-in retry and caching mechanisms improve the success rate of workflow runs.
Rich ecosystem: Argo Workflows can orchestrate many types of jobs, such as Spark, Ray, and TensorFlow jobs. Combined with event-driven capabilities, it can be used to build fully automated task processing platforms.
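The retry and caching mechanisms mentioned above are configured per template. In the following sketch, the template name, image, command, and the `step-cache` ConfigMap are assumptions. It adds a `retryStrategy` with exponential backoff and a `memoize` block, which lets Argo reuse cached outputs for the same key instead of rerunning the pod.

```python
import json


def resilient_template(name: str, image: str, command: list) -> dict:
    """Return a container template with retries and memoization enabled."""
    return {
        "name": name,
        "inputs": {"parameters": [{"name": "input"}]},
        "retryStrategy": {
            "limit": "3",  # retry a failed step at most 3 times
            "retryPolicy": "OnFailure",  # retry when the step's container fails
            # Wait 10s before the first retry and double the wait each time;
            # maxDuration caps the total time spent in the backoff strategy.
            "backoff": {"duration": "10s", "factor": "2", "maxDuration": "5m"},
        },
        "memoize": {
            # Same input value: reuse the cached outputs instead of rerunning.
            "key": "{{inputs.parameters.input}}",
            "maxAge": "24h",  # ignore cache entries older than this
            "cache": {"configMap": {"name": "step-cache"}},  # hypothetical ConfigMap
        },
        "container": {"image": image, "command": command},
    }


if __name__ == "__main__":
    # Hypothetical image and command.
    tpl = resilient_template("train", "python:3.12", ["python", "train.py"])
    print(json.dumps(tpl, indent=2))
```

Retries handle transient failures such as preempted nodes, while memoization avoids paying again for steps that already succeeded with the same inputs.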
Use Argo Workflows
ACS Argo Workflows is compatible with open-source Argo Workflows and adds further enhancements, so you can seamlessly migrate existing workflows to ACS Argo Workflows. ACS Argo Workflows provides the following benefits:
High elasticity, auto scaling, and compute cost optimization.
Workflow clusters support high scheduling reliability and multi-zone load balancing.
Workflow clusters use control planes whose performance, efficiency, stability, and observability are optimized.
Workflow clusters support enhanced OSS management capabilities, such as large object uploading, artifact garbage collection (GC), and data streaming.
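For reference, artifacts and artifact GC are expressed in open-source Argo Workflows roughly as follows. This sketch declares an output artifact stored in OSS and enables artifact garbage collection through `spec.artifactGC`. It assumes that the cluster has a default artifact repository pointing at an OSS bucket, so only the object `key` is set, and that your Argo version supports artifact GC for OSS. The key and file path are hypothetical, and the ACS-specific enhancements listed above, such as large object uploading and data streaming, are not shown.

```python
import json


def build_artifact_workflow(oss_key: str) -> dict:
    """Return a Workflow that uploads /tmp/result.txt to OSS as an output artifact."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "oss-artifact-"},
        "spec": {
            "entrypoint": "produce",
            # Delete this workflow's artifacts from storage when the Workflow is deleted.
            "artifactGC": {"strategy": "OnWorkflowDeletion"},
            "templates": [
                {
                    "name": "produce",
                    "container": {
                        "image": "alpine:3.19",
                        "command": ["sh", "-c", "echo hello > /tmp/result.txt"],
                    },
                    "outputs": {
                        "artifacts": [
                            {
                                "name": "result",
                                "path": "/tmp/result.txt",
                                # Key only: bucket, endpoint, and credentials are assumed
                                # to come from the default artifact repository.
                                "oss": {"key": oss_key},
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_artifact_workflow("demo/result.txt"), indent=2))
```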
With help from container service experts, you can optimize your workflows to improve efficiency and reduce costs.
Alibaba Cloud provides Argo Workflows in the following forms to meet different requirements:
Serverless Argo Workflows: If you want O&M-free, large-scale, and high-performance workflows, create a separate workflow cluster. For more information, see Serverless Argo Workflows.
Argo Workflows component on ACS: If you use ACS clusters and want to run workflows on existing cluster resources, install the Argo Workflows component to orchestrate workflows. This topic describes how to use the Argo Workflows component in ACS clusters.
After the component is installed, you can orchestrate batch tasks and use the Alibaba Cloud Argo CLI or the Argo console to submit and manage workflows.
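If you drive the CLI from scripts, a thin wrapper keeps commands consistent. The following sketch assumes that the Alibaba Cloud Argo CLI accepts the same `submit`, `list`, `get`, `logs`, and `delete` subcommands and the same `-n`, `--watch`, and `-o json` flags as the open-source `argo` CLI. The helper names, the `workflow.json` file name, and the workflow name `dag-demo-abc12` are hypothetical. The manifest is written as JSON, which YAML parsers generally also accept.

```python
import json
import subprocess


def argo_cmd(action: str, *args: str, namespace: str = "default") -> list:
    """Build an argo CLI command, for example ["argo", "list", "-n", "default"]."""
    return ["argo", action, *args, "-n", namespace]


def submit(manifest: dict, namespace: str = "default", path: str = "workflow.json") -> str:
    """Write the manifest to a file, submit it, and watch it until it finishes."""
    with open(path, "w") as f:
        json.dump(manifest, f)  # JSON is (almost always) valid YAML
    cmd = argo_cmd("submit", path, "--watch", namespace=namespace)
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Other management commands, built the same way (workflow name is hypothetical).
    print(argo_cmd("list"))
    print(argo_cmd("get", "dag-demo-abc12", "-o", "json"))
    print(argo_cmd("logs", "dag-demo-abc12"))
    print(argo_cmd("delete", "dag-demo-abc12"))
```

Building commands as argument lists, rather than shell strings, avoids quoting problems with workflow names and file paths.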
The following figure shows the responsibilities of different roles.
Step | Description |
1. Prepare | |
2. Set up the environment | For more information, see Enable batch task orchestration. |
3. Manage workflows | (Data engineers) After the concurrent tasks are orchestrated, use the Argo CLI or Argo console to submit and manage the tasks. |
 | (Cluster administrators) |
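When you manage submitted workflows, `argo get <name> -o json` returns the Workflow object, including `status.phase` and a `status.nodes` map. The following sketch, built around a hypothetical `summarize` helper, counts pod nodes by phase to give a quick view of progress. It reads only the `phase` and `type` fields of the status and falls back to `Unknown` when a field is missing. The sample status in the usage block is made up for illustration.

```python
import json
from collections import Counter


def summarize(workflow_json: str) -> tuple:
    """Return (workflow phase, {pod phase: count}) from `argo get -o json` output."""
    wf = json.loads(workflow_json)
    status = wf.get("status", {})
    # status.nodes maps node IDs to node statuses; only Pod nodes run containers.
    pods = [n for n in status.get("nodes", {}).values() if n.get("type") == "Pod"]
    counts = Counter(n.get("phase", "Unknown") for n in pods)
    return status.get("phase", "Unknown"), dict(counts)


if __name__ == "__main__":
    # Made-up status for illustration; in practice pass the CLI's JSON output.
    sample = {
        "status": {
            "phase": "Running",
            "nodes": {
                "n1": {"type": "DAG", "phase": "Running"},
                "n2": {"type": "Pod", "phase": "Succeeded"},
                "n3": {"type": "Pod", "phase": "Running"},
            },
        }
    }
    print(summarize(json.dumps(sample)))
```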
Billing
The batch task orchestration feature itself is free of charge. However, you are charged for the ACS computing power and other cloud services that you use, and Classic Load Balancer (CLB) charges a pay-as-you-go fee when you use batch task orchestration. For more information, see CLB billing.
Contact us
If you have any questions or suggestions, join DingTalk group 35688562.