This topic provides examples of how to use Alibaba Cloud Genomics Service (AGS).
Prerequisites
A Container Service for Kubernetes (ACK) cluster is created. For more information, see Create an ACK managed cluster.
You are connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Log
Run the
ags config sls
command to install and configure Log Service in AGS. Native Argo can retrieve the log data of a pod only from the node where the pod is deployed. If the pod or the node is deleted, the log data is lost, which makes troubleshooting difficult. After log data is persisted to Log Service, AGS can retrieve the log data from Log Service even if the node is deleted.
Run the
ags logs
command to query the log data of a workflow. You can run the
ags logs POD/WORKFLOW
command to query the log data of a pod or a workflow.
# ags logs -h
view logs of a workflow

Usage:
  ags logs POD/WORKFLOW [flags]

Flags:
  -c, --container string    Print the logs of this container (default "main")
  -f, --follow              Specify if the logs should be streamed.
  -h, --help                help for logs
  -l, --recent-line int     how many lines to show in one call (default 100)
      --since string        Only return logs newer than a relative duration like 5s, 2m, or 3h. Defaults to all logs. Only one of since-time / since may be used.
      --since-time string   Only return logs after a specific date (RFC3339). Defaults to all logs. Only one of since-time / since may be used.
      --tail int            Lines of recent log file to display. Defaults to -1 with no selector, showing all log lines otherwise 10, if a selector is provided. (default -1)
      --timestamps          Include timestamps on each line in the log output
  -w, --workflow            Specify that whole workflow logs should be printed
Note: If the node where the pod is deployed still exists, AGS retrieves the log data of the pod from the node. All flags are compatible with the native Argo command.
If the pod is deleted, AGS retrieves the log data from Log Service. By default, the latest 100 log entries are returned. You can use the -l flag to specify the number of log entries to return.
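The flags listed above can be combined. The following hypothetical session illustrates a few common combinations (the workflow name my-wgs-run is a placeholder):

```shell
# Stream the logs of the whole workflow and follow new output:
ags logs my-wgs-run -w -f

# Return only log entries from the last 10 minutes:
ags logs my-wgs-run --since 10m

# Return the latest 200 entries instead of the default 100:
ags logs my-wgs-run -l 200
```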
List
You can set the --limit parameter to specify the number of workflow entries that you want to query.
# ags remote list --limit 8
+-----------------------+-------------------------------+------------+
| JOB NAME | CREATE TIME | JOB STATUS |
+-----------------------+-------------------------------+------------+
| merge-6qk46 | 2020-09-02 16:52:34 +0000 UTC | Pending |
| rna-mapping-gpu-ck4cl | 2020-09-02 14:47:57 +0000 UTC | Succeeded |
| wgs-gpu-n5f5s | 2020-09-02 13:14:14 +0000 UTC | Running |
| merge-5zjhv | 2020-09-02 12:03:11 +0000 UTC | Succeeded |
| merge-jjcw4 | 2020-09-02 10:44:51 +0000 UTC | Succeeded |
| wgs-gpu-nvxr2 | 2020-09-01 22:18:44 +0000 UTC | Succeeded |
| merge-4vg42 | 2020-09-01 20:52:13 +0000 UTC | Succeeded |
| rna-mapping-gpu-2ss6n | 2020-09-01 20:34:45 +0000 UTC | Succeeded  |
+-----------------------+-------------------------------+------------+
Run kubectl commands
You can run the following command to query the status of a workflow and its pods. AGS also allows you to run other kubectl commands.
# ags get test-v2
Name: test-v2
Namespace: default
ServiceAccount: default
Status: Running
Created: Thu Nov 22 11:06:52 +0800 (2 minutes ago)
Started: Thu Nov 22 11:06:52 +0800 (2 minutes ago)
Duration: 2 minutes 46 seconds
STEP PODNAME DURATION MESSAGE
● test-v2
└---● bcl2fq test-v2-2716811808 2m
# ags kubectl describe pod test-v2-2716811808
Name: test-v2-2716811808
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: cn-shenzhen.i-wz9gwobtqrbjgfnqxl1k/192.168.0.94
Start Time: Thu, 22 Nov 2018 11:06:52 +0800
Labels: workflows.argoproj.io/completed=false
workflows.argoproj.io/workflow=test-v2
Annotations: workflows.argoproj.io/node-name=test-v2[0].bcl2fq
workflows.argoproj.io/template={"name":"bcl2fq","inputs":{},"outputs":{},"metadata":{},"container":{"name":"main","image":"registry.cn-hangzhou.aliyuncs.com/dahu/curl-jp:1.2","command":["sh","-c"],"ar...
Status: Running
IP: 172.16. *. ***
Controlled By: Workflow/test-v2
After you run the ags kubectl describe pod command, the details of the pod are returned. AGS supports all native kubectl commands.
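Because ags kubectl passes its arguments through to kubectl, other native subcommands work the same way. A few hypothetical invocations (the resource names are placeholders taken from the preceding example):

```shell
# List the pods that belong to a workflow by its Argo label:
ags kubectl get pods -l workflows.argoproj.io/workflow=test-v2

# Query the events that are related to a pod:
ags kubectl get events --field-selector involvedObject.name=test-v2-2716811808
```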
Run ossutil commands
After AGS is initialized, you can run the following commands to upload and query files:
# ags oss cp test.fq.gz oss://my-test-shenzhen/fasq/
Succeed: Total num: 1, size: 690. OK num: 1(upload 1 files).
average speed 3000(byte/s)
0.210685(s) elapsed
# ags oss ls oss://my-test-shenzhen/fasq/
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-09-02 17:20:34 +0800 CST 690 Standard 9FDB86F70C6211B2EAF95A9B06B14F7E oss://my-test-shenzhen/fasq/test.fq.gz
Object Number is: 1
0.117591(s) elapsed
You can run the ags oss command to upload and download files. AGS supports all native ossutil commands.
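Because ags oss passes its arguments through to ossutil, the other native subcommands can be used in the same way. A few hypothetical examples (the bucket and object names are placeholders):

```shell
# Download an object from OSS to the current directory:
ags oss cp oss://my-test-shenzhen/fasq/test.fq.gz .

# Upload a local directory recursively:
ags oss cp -r ./fastq-files/ oss://my-test-shenzhen/fasq/

# Delete an object:
ags oss rm oss://my-test-shenzhen/fasq/test.fq.gz
```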
View the resource usage of a workflow
Create the arguments-workflow-resource.yaml file and copy the following content into the file. Run the
ags submit arguments-workflow-resource.yaml
command to specify resource requests.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test-resource
spec:
  arguments: {}
  entrypoint: test-resource-
  templates:
  - inputs: {}
    metadata: {}
    name: test-resource-
    outputs: {}
    parallelism: 1
    steps:
    - - arguments: {}
        name: bcl2fq
        template: bcl2fq
  - container:
      args:
      - id > /tmp/yyy;echo `date` > /tmp/aaa;ps -e -o comm,euid,fuid,ruid,suid,egid,fgid,gid,rgid,sgid,supgid > /tmp/ppp;ls -l /tmp/aaa;sleep 100;pwd
      command:
      - sh
      - -c
      image: registry.cn-hangzhou.aliyuncs.com/dahu/curl-jp:1.2
      name: main
      resources: # do not request excessive resources
        requests:
          memory: 320Mi
          cpu: 1000m
    inputs: {}
    metadata: {}
    name: bcl2fq
    outputs: {}
Run the
ags get test456 --show
command to query the resource usage of a workflow. The CPU usage (in core*hours) of test456 and its pod is returned in this example.
# ags get test456 --show
Name:            test456
Namespace:       default
ServiceAccount:  default
Status:          Succeeded
Created:         Thu Nov 22 14:41:49 +0800 (2 minutes ago)
Started:         Thu Nov 22 14:41:49 +0800 (2 minutes ago)
Finished:        Thu Nov 22 14:43:30 +0800 (27 seconds ago)
Duration:        1 minute 41 seconds
Total CPU:       0.02806 (core*hour)
Total Memory:    0.00877 (GB*hour)

STEP           PODNAME             DURATION  MESSAGE  CPU(core*hour)  MEMORY(GB*hour)
 ✔ test456                                            0               0
 └---✔ bcl2fq  test456-4221301428  1m                 0.02806         0.00877
Configure securityContext
Create the arguments-security-context.yaml file and copy the following content to the file. Run the ags submit arguments-security-context.yaml
command to use Pod Security Policy (PSP) for permission control.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test
spec:
  arguments: {}
  entrypoint: test-security-
  templates:
  - inputs: {}
    metadata: {}
    name: test-security-
    outputs: {}
    parallelism: 1
    steps:
    - - arguments: {}
        name: bcl2fq
        template: bcl2fq
  - container:
      args:
      - id > /tmp/yyy;echo `date` > /tmp/aaa;ps -e -o comm,euid,fuid,ruid,suid,egid,fgid,gid,rgid,sgid,supgid > /tmp/ppp;ls -l /tmp/aaa;sleep 100;pwd
      command:
      - sh
      - -c
      image: registry.cn-hangzhou.aliyuncs.com/dahu/curl-jp:1.2
      name: main
      resources: # do not request excessive resources
        requests:
          memory: 320Mi
          cpu: 1000m
    inputs: {}
    metadata: {}
    name: bcl2fq
    outputs: {}
    securityContext:
      runAsUser: 800
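The template-level securityContext accepts the standard Kubernetes pod security fields, so stricter settings can be combined with runAsUser. A minimal sketch (the values are illustrative, not required by AGS):

```yaml
securityContext:
  runAsUser: 800        # run the container processes as UID 800
  runAsGroup: 800       # primary GID of the processes
  runAsNonRoot: true    # refuse to start if the image would run as root
```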
Configure automatic retry with YAML
Some commands may fail because of unexpected, transient errors, and running the command again often resolves the issue. AGS provides an automatic retry mechanism that is configured in YAML. When the system fails to run a command in a pod, AGS automatically executes the command again. You can set the maximum number of retry attempts.
Create the arguments-auto-retry.yaml file and copy the following content to the file. Run the ags submit arguments-auto-retry.yaml
command to configure the automatic retry mechanism for a workflow.
# This example demonstrates the use of retries for a single container.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-container-
spec:
  entrypoint: retry-container
  templates:
  - name: retry-container
    retryStrategy:
      limit: 10
    container:
      image: python:alpine3.6
      command: ["python", "-c"]
      # fail with a 66% probability
      args: ["import random; import sys; exit_code = random.choice([0, 1, 1]); sys.exit(exit_code)"]
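Argo's retryStrategy can also add an exponential backoff between attempts. The following fragment is a sketch that assumes the Argo version bundled with AGS supports the retryPolicy and backoff fields; treat it as illustrative:

```yaml
retryStrategy:
  limit: 10
  retryPolicy: OnFailure   # retry only when the main container exits with a non-zero code
  backoff:
    duration: "10s"        # wait 10s before the first retry
    factor: 2              # double the wait after each attempt
    maxDuration: "5m"      # stop retrying after 5 minutes in total
```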
Retry a workflow from a failed step
A workflow consists of a number of steps. A step may fail when you run a workflow. AGS allows you to retry the workflow from a failed step.
Run the
ags get test456 --show
command to find the failed step of the test456 workflow.
# ags get test456 --show
Name:            test456
Namespace:       default
ServiceAccount:  default
Status:          Succeeded
Created:         Thu Nov 22 14:41:49 +0800 (2 minutes ago)
Started:         Thu Nov 22 14:41:49 +0800 (2 minutes ago)
Finished:        Thu Nov 22 14:43:30 +0800 (27 seconds ago)
Duration:        1 minute 41 seconds
Total CPU:       0.0572 (core*hour)
Total Memory:    0.01754 (GB*hour)

STEP           PODNAME             DURATION  MESSAGE  CPU(core*hour)  MEMORY(GB*hour)
 ✔ test456                                            0               0
 └---✔ bcl2fq  test456-4221301428  1m                 0.02806         0.00877
 └---X bcl2fq  test456-4221301238  1m                 0.02806         0.00877
Run the
ags retry test456
command to retry the test456 workflow from the failed step.
Run a workflow by using ECIs
For more information about Elastic Container Instance (ECI), see Elastic Container Instance.
Install AGS before you use ECI to run a workflow. For more information, see Download and install AGS CLI.
Run the
kubectl get cm -n argo
command to query the name of the ConfigMap of the workflow controller.
# kubectl get cm -n argo
NAME                            DATA   AGE
workflow-controller-configmap   1      4d
Run the
kubectl get cm -n argo workflow-controller-configmap -o yaml
command to open the workflow-controller-configmap ConfigMap in YAML format. Then, update the ConfigMap with the following content. The content overwrites the existing configuration.
apiVersion: v1
data:
  config: |
    containerRuntimeExecutor: k8sapi
kind: ConfigMap
Run the
kubectl delete pod <podName>
command to restart the Argo controller. Note: podName is the name of the Argo workflow controller pod.
Create the arguments-workflow-eci.yaml file and copy the following content to the file. Run the
ags submit arguments-workflow-eci.yaml
command to add the nodeSelector and tolerations fields so that the workflow pods are scheduled to elastic container instances.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [env]
      #args: ["hello world"]
      resources:
        limits:
          memory: 32Mi
          cpu: 100m
    nodeSelector: # add nodeSelector
      type: virtual-kubelet
    tolerations: # add tolerations
    - key: virtual-kubelet.io/provider
      operator: Exists
    - key: alibabacloud.com
      effect: NoSchedule
Query the actual and peak resource usage of a workflow
The AGS workflow controller automatically queries the actual resource usage of pods every minute through the metrics-server component. It also calculates the total and peak resource usage of each pod.
Run the ags get steps-jr6tw --metrics
command to query the actual and peak resource usage of a workflow.
➜ ags get steps-jr6tw --metrics
Name: steps-jr6tw
Namespace: default
ServiceAccount: default
Status: Succeeded
Created: Tue Apr 16 16:52:36 +0800 (21 hours ago)
Started: Tue Apr 16 16:52:36 +0800 (21 hours ago)
Finished: Tue Apr 16 19:39:18 +0800 (18 hours ago)
Duration: 2 hours 46 minutes
Total CPU: 0.00275 (core*hour)
Total Memory: 0.04528 (GB*hour)
STEP PODNAME DURATION MESSAGE CPU(core*hour) MEMORY(GB*hour) MaxCpu(core) MaxMemory(GB)
✔ steps-jr6tw 0 0 0 0
└---✔ hello1 steps-jr6tw-2987978173 2h 0.00275 0.04528 0.000005 0.00028
Set workflow priorities
You can set the priority of a workflow to high, medium, or low to prioritize urgent tasks. A workflow with a higher priority can preempt the resources of a workflow with a lower priority.
You can use the following method to set a high priority for a pod:
Create the arguments-high-priority-taskA.yaml file and copy the following content to the file. Run the
ags submit arguments-high-priority-taskA.yaml
command to set a high priority for Task A.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
You can use the following method to set a medium priority for a pod:
Create the arguments-high-priority-taskB.yaml file and copy the following content to the file. Run the
ags submit arguments-high-priority-taskB.yaml
command to set a medium priority for Task B.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: medium-priority
value: 100
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
You can use the following method to set a high priority for a workflow:
Create the arguments-high-priority-Workflow.yaml file and copy the following content to the file. Run the
ags submit arguments-high-priority-Workflow.yaml
command to set a high priority for all pods of the workflow.
apiVersion: argoproj.io/v1alpha1
kind: Workflow                         # new type of k8s spec
metadata:
  generateName: high-proty-            # name of the workflow spec
spec:
  entrypoint: whalesay                 # invoke the whalesay template
  podPriorityClassName: high-priority  # workflow level priority
  templates:
  - name: whalesay                     # name of the template
    container:
      image: ubuntu
      command: ["/bin/bash", "-c", "sleep 1000"]
      resources:
        requests:
          cpu: 3
The following example describes how to set priorities for individual steps in a workflow. Assume that the workflow has two pods: one with a high priority and one with a lower priority. The pod with the high priority can then preempt the resources of the pod with the lower priority.
Create the arguments-high-priority-steps.yaml file and copy the following content to the file. Run the
ags submit arguments-high-priority-steps.yaml
command to set different priorities for pods.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello
  templates:
  - name: hello-hello-hello
    steps:
    - - name: low
        template: low
    - - name: low-2
        template: low
      - name: high
        template: high
  - name: low
    container:
      image: ubuntu
      command: ["/bin/bash", "-c", "sleep 30"]
      resources:
        requests:
          cpu: 3
  - name: high
    priorityClassName: high-priority # step level priority
    container:
      image: ubuntu
      command: ["/bin/bash", "-c", "sleep 30"]
      resources:
        requests:
          cpu: 3
The following result indicates that the pod with the high priority preempts resources and that the pod with the lower priority is deleted.
Name:            steps-sxvrv
Namespace:       default
ServiceAccount:  default
Status:          Failed
Message:         child 'steps-sxvrv-1724235106' failed
Created:         Wed Apr 17 15:06:16 +0800 (1 minute ago)
Started:         Wed Apr 17 15:06:16 +0800 (1 minute ago)
Finished:        Wed Apr 17 15:07:34 +0800 (now)
Duration:        1 minute 18 seconds

STEP            PODNAME                 DURATION  MESSAGE
 ✖ steps-sxvrv                                    child 'steps-sxvrv-1724235106' failed
 ├---✔ low      steps-sxvrv-3117418100  33s
 └-·-✔ high     steps-sxvrv-603461277   45s
   └-⚠ low-2    steps-sxvrv-1724235106  45s       pod deleted
Note: A pod with a higher priority can preempt the resources of a pod with a lower priority. This stops the tasks that are running in the lower-priority pod. Proceed with caution.
Filter workflows
When a workflow contains a large number of pods, you can use the filter flags of the ags get command to query pods that are in a specified state.
Run the
ags get <Workflow name> --status Running
command to query pods that are in the Running state.
# ags get pod-limits-n262v --status Running
Name:            pod-limits-n262v
Namespace:       default
ServiceAccount:  default
Status:          Running
Created:         Wed Apr 17 15:59:08 +0800 (1 minute ago)
Started:         Wed Apr 17 15:59:08 +0800 (1 minute ago)
Duration:        1 minute 17 seconds
Parameters:
  limit:         300

STEP                  PODNAME                      DURATION  MESSAGE
 ● pod-limits-n262v
 ├-● run-pod(13:13)   pod-limits-n262v-3643890604  1m
 ├-● run-pod(14:14)   pod-limits-n262v-4115394302  1m
 ├-● run-pod(16:16)   pod-limits-n262v-3924248206  1m
 ├-● run-pod(17:17)   pod-limits-n262v-3426515460  1m
 ├-● run-pod(18:18)   pod-limits-n262v-824163662   1m
 ├-● run-pod(20:20)   pod-limits-n262v-4224161940  1m
 ├-● run-pod(22:22)   pod-limits-n262v-1343920348  1m
 ├-● run-pod(2:2)     pod-limits-n262v-3426502220  1m
 ├-● run-pod(32:32)   pod-limits-n262v-2723363986  1m
 ├-● run-pod(34:34)   pod-limits-n262v-2453142434  1m
 ├-● run-pod(37:37)   pod-limits-n262v-3225742176  1m
 ├-● run-pod(3:3)     pod-limits-n262v-2455811176  1m
 ├-● run-pod(40:40)   pod-limits-n262v-2302085188  1m
 ├-● run-pod(6:6)     pod-limits-n262v-1370561340  1m
Run the
ags get <Workflow name> --sum-info
command to query a summary of the current pod states.
# ags get pod-limits-n262v --sum-info --status Error
Name:            pod-limits-n262v
Namespace:       default
ServiceAccount:  default
Status:          Running
Created:         Wed Apr 17 15:59:08 +0800 (2 minutes ago)
Started:         Wed Apr 17 15:59:08 +0800 (2 minutes ago)
Duration:        2 minutes 6 seconds
Pending:         198
Running:         47
Succeeded:       55
Parameters:
  limit:         300

STEP                  PODNAME  DURATION  MESSAGE
 ● pod-limits-n262v
Use Autoscaler in the agility edition
Before you use Autoscaler in the agility edition, make sure that the following resources are available:
A virtual private cloud (VPC).
A VSwitch.
A security group.
The internal endpoint of the API server of the agility edition.
The specification for node scaling.
An Elastic Compute Service (ECS) instance that can access the Internet.
Perform the following operations in the AGS command-line interface:
$ ags config autoscaler
Enter the required information based on the tips:
Please input VSwitches separated with commas (,).
vsw-hp3cq3fnv47bpz7x58wfe
Please input security group id
sg-hp30vp05x6tlx13my0qu
Please input the instanceTypes with comma separated
ecs.c5.xlarge
Please input the new ecs ssh password
xxxxxxxx
Please input k8s cluster APIServer address like(192.168.1.100)
172.24.61.156
Please input the autoscaling mode (current: release. Type enter to skip.)
Please input the min size of group (current: 0. Type enter to skip.)
Please input the max size of group (current: 1000. Type enter to skip.)
Create scaling group successfully.
Create scaling group config successfully.
Enable scaling group successfully.
Succeed
After you complete the preceding operations, log on to the Auto Scaling console to check the scaling group that you have created.
Configure and use a ConfigMap
By default, hostNetwork is used in this example.
Run the
kubectl get cm -n argo
command to query the name of the ConfigMap of the workflow controller.
# kubectl get cm -n argo
NAME                            DATA   AGE
workflow-controller-configmap   1      6d23h
Run the
kubectl edit cm workflow-controller-configmap -n argo
command to open the workflow-controller-configmap.yaml file and copy the following content to the file:
data:
  config: |
    extraConfig:
      enableHostNetwork: true
      defaultDnsPolicy: Default
The following shows the details of the updated workflow-controller-configmap.yaml file:
apiVersion: v1
data:
  config: |
    extraConfig:
      enableHostNetwork: true
      defaultDnsPolicy: Default
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
After you complete the configuration, newly deployed workflows use hostNetwork by default and the value of dnsPolicy is set to Default.
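Depending on the Argo version, an individual workflow may be able to override these cluster-wide defaults in its own spec. The following sketch assumes the spec.hostNetwork and spec.dnsPolicy fields are available in the bundled Argo version:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: no-hostnetwork-
spec:
  entrypoint: whalesay
  hostNetwork: false        # opt this workflow out of hostNetwork
  dnsPolicy: ClusterFirst   # restore the default in-cluster DNS policy
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [env]
```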
Optional: If a PSP is configured, add the following content to the YAML file of the PSP:
hostNetwork: true
Note: If the hostNetwork parameter already exists in the YAML file, you must change its value to true.
A YAML template contains the following content:
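A minimal PodSecurityPolicy that permits hostNetwork might look as follows. The policy name and the remaining fields are illustrative assumptions, not AGS requirements:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: ags-hostnetwork
spec:
  hostNetwork: true        # allow pods to use the node's network namespace
  privileged: false
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
```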