Platform for AI: Query commands

Last updated: Jul 13, 2024

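You can use the client tool to view job logs, the job list, and job details. This topic describes the query commands, including their syntax, parameters, and usage examples.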

View job logs (logs)

  • Function

    Views the log details of a training job.

  • Syntax

    ./dlc logs <yourJobId> <yourPodId> [--max_events_num <yourMaxNum>] [--start_time <yourStartTime>] [--end_time <yourEndTime>]
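
  • Parameters

    | Parameter | Required | Description | Type |
    |---|---|---|---|
    | <yourJobId> | Yes | ID of the training job whose logs you want to view. | STRING |
    | <yourPodId> | Yes | ID of the instance (pod) whose logs you want to view. A distributed job has multiple instances (pods). | STRING |
    | --max_events_num <yourMaxNum> | No | Maximum number of log lines to return. Default: 2000. | INT |
    | --start_time <yourStartTime> | No | Start time of the log query. Default: 7 days ago. Example: --start_time 2020-11-08T16:00:00Z | STRING |
    | --end_time <yourEndTime> | No | End time of the log query. Default: the current time. Example: --end_time 2020-11-08T17:00:00Z | STRING |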

  • Example

    Retrieve 10 log lines from worker 0 of a distributed training job.

    ./dlc logs dlcdys3r9jlu**** dlcdys3r********-worker-0 --max_events_num 10

    The system returns output similar to the following.

    WARN: ./requirements.txt not found, skip installing requirements.
    ================================================
    |  PAI Tensorflow powered by Aliyun PAI Team.  |
    ================================================
    Network is under initialization...
    Network successfully initialized.
    [2021-04-16 12:27:56.368026] [INFO] [7#7] [tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    [2021-04-16 12:27:56.375586] [INFO] [7#7] [tensorflow/core/distributed_runtime/master.cc:80] ====================CPU Architecture=====================
    [2021-04-16 12:27:56.375600] [INFO] [7#7] [tensorflow/core/distributed_runtime/master.cc:84] Disable AVX512.
    [2021-04-16 12:27:56.375605] [INFO] [7#7] [tensorflow/core/distributed_runtime/master.cc:87] CPU Vendor ID: GenuineIntel
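
The `--start_time` and `--end_time` examples above look like ISO 8601 timestamps in UTC (note the trailing `Z`). The following is a minimal shell sketch, not part of the dlc tool, that wraps `./dlc logs` to query one pod over an explicit time window. `DLC_BIN`, `utc_now`, and `logs_between` are hypothetical names introduced for illustration, and the assumption that the timestamps are UTC comes only from the example values.

```shell
# Hypothetical wrapper around `./dlc logs`; point DLC_BIN at your client binary.
DLC_BIN="${DLC_BIN:-./dlc}"

# Print the current UTC time in the format used by the examples,
# for instance 2020-11-08T17:00:00Z.
utc_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# logs_between <jobId> <podId> <startTime> [endTime] [maxLines]
# endTime defaults to now and maxLines to 2000, mirroring the documented defaults.
logs_between() {
  local job_id="$1" pod_id="$2" start_time="$3"
  local end_time="${4:-$(utc_now)}"
  local max_lines="${5:-2000}"
  "$DLC_BIN" logs "$job_id" "$pod_id" \
    --max_events_num "$max_lines" \
    --start_time "$start_time" \
    --end_time "$end_time"
}

# Example call (the IDs are placeholders):
#   logs_between <yourJobId> <yourPodId> 2020-11-08T16:00:00Z 2020-11-08T17:00:00Z 10
```

Passing the start time explicitly keeps the sketch portable. With GNU date, a relative start such as `date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ` can be substituted for the third argument.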

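View the job list and job status

  • Function

    Retrieves information about training jobs. If you do not specify a job ID, all jobs are listed. If you specify a job ID, only the information for that job is shown.

  • Syntax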

    ./dlc get job [JOB_ID] [--workspace_id <yourWorkspaceId>] [--display_name <yourJobName>] [--job_type <yourJobType>] [--status <yourJobStatus>] [--start_time <yourStartTime>] [--end_time <yourEndTime>] [--page_num <yourPageNum>] [--page_size <yourPageSize>] [--max_events_num <yourMaxNum>] [--events] [--events_only]
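
  • Parameters

    | Parameter | Required | Description | Type |
    |---|---|---|---|
    | JOB_ID | No | ID of the training job to view. | STRING |
    | --workspace_id <yourWorkspaceId> | No | Workspace ID. | STRING |
    | --display_name <yourJobName> | No | Job name. Fuzzy matching is supported, wildcards are not. Matching is case-insensitive. | STRING |
    | --job_type <yourJobType> | No | Job type. All job types can be queried. Default: empty, which means all types. | STRING |
    | --status <yourJobStatus> | No | Job status. Default: empty, which means all statuses. | STRING |
    | --start_time <yourStartTime> | No | Start of the query time range, filtered by job creation time. Example: --start_time 2022-08-04T02:09:32Z | STRING |
    | --end_time <yourEndTime> | No | End of the query time range, filtered by job creation time. Example: --end_time 2022-08-04T02:09:32Z | STRING |
    | --page_num <yourPageNum> | No | For paged queries, the page number to return. Numbering starts at 1. Default: 1. | INT |
    | --page_size <yourPageSize> | No | For paged queries, the number of entries returned per page. Default: 10. | INT |
    | --max_events_num <yourMaxNum> | No | Maximum number of system event lines to return. Default: 2000. | INT |
    | --events | No | Whether to query the system events of the job. Takes effect only when a single job is queried. Default: false. | BOOL |
    | --events_only | No | Whether to query only the system events of the job. Takes effect only when a single job is queried. Default: false. | BOOL |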

  • Examples

    • Query all training jobs whose names fuzzy-match a given string.

      ./dlc get job --display_name epl

      The system returns output similar to the following.

      +--------------------+------------------+-------------+------------------+------------+----------------+---------+----------+-----------+------------------+----------------------+----------------------+----------------------+----------------------+-------------+------------+----------------------+-------------------+
      |        Name        |      JobId       | WorkspaceId |  WorkspaceName   | ResourceId |  ResourceName  | JobType | Priority | JobStatus |      UserId      |      CreateTime      |    SubmittedTime     |     RunningTime      |    SuccessedTime     | StoppedTime | FailedTime |      FinishTime      | Duration(seconds) |
      +--------------------+------------------+-------------+------------------+------------+----------------+---------+----------+-----------+------------------+----------------------+----------------------+----------------------+----------------------+-------------+------------+----------------------+-------------------+
      | test_epl_test-**** | dlc02xipvt5z**** | 23****      | doc_test_****    |            | public-cluster | TFJob   | 1        | Succeeded | 144963168668**** | 2022-08-01T06:41:05Z | 2022-08-01T06:45:08Z | 2022-08-01T06:48:57Z | 2022-08-01T06:53:21Z |             |            | 2022-08-01T06:53:21Z | 736               |
      | test_epl_****      | dlc1iyv3szl2**** | 23****      | doc_test_****    |            | public-cluster | TFJob   | 1        | Succeeded | 144963168668**** | 2022-08-01T03:23:51Z | 2022-08-01T03:27:22Z | 2022-08-01T03:27:50Z | 2022-08-01T03:33:48Z |             |            | 2022-08-01T03:33:48Z | 597               |
      +--------------------+------------------+-------------+------------------+------------+----------------+---------+----------+-----------+------------------+----------------------+----------------------+----------------------+----------------------+-------------+------------+----------------------+-------------------+
    • Query a specific training job.

      ./dlc get job dlc02xipvt5z****

      The system returns output similar to the following.

      {
         "ClusterId": "",
         "CodeSource": {
            "Branch": "main",
            "CodeSourceId": "code-29****c****c4****ae0c9ec75a5****",
            "MountPath": ""
         },
         "DataSources": [
            {
               "DataSourceId": "d-ya7gc2p2iqq240****",
               "MountPath": ""
            }
         ],
         "DisplayName": "test_epl_test-****",
         "Duration": 736,
         "ElasticSpec": {
            "AIMasterType": "",
            "EnableElasticTraining": false,
            "MaxParallelism": 0,
            "MinParallelism": 0
         },
         "EnabledDebugger": false,
         "GmtCreateTime": "2022-08-01T06:41:05Z",
         "GmtFinishTime": "2022-08-01T06:53:21Z",
         "GmtRunningTime": "2022-08-01T06:48:57Z",
         "GmtSubmittedTime": "2022-08-01T06:45:08Z",
         "GmtSuccessedTime": "2022-08-01T06:53:21Z",
         "JobId": "dlc02xipvt5z****",
         "JobSpecs": [
            {
               "AssignNodeSpec": {
                  "EnableAssignNode": false,
                  "NodeNames": ""
               },
               "EcsSpec": "ecs.gn6v-c8g1.2xlarge",
               "Image": "registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-training:1.15-gpu-py36-cu100-ubuntu1****",
               "PodCount": 2,
               "ResourceConfig": {
                  "CPU": "",
                  "GPU": "",
                  "GPUType": "",
                  "Memory": "",
                  "SharedMemory": ""
               },
               "Type": "Worker",
               "UseSpotInstance": false
            }
         ],
         "JobType": "TFJob",
         "Pods": [
            {
               "GmtCreateTime": "2022-08-01T06:45:08Z",
               "GmtFinishTime": "2022-08-01T06:53:20Z",
               "GmtStartTime": "2022-08-01T06:52:06Z",
               "Ip": "10.224.xx.xx",
               "PodId": "dlc02xipvt5z****-worker-0",
               "PodUid": "",
               "Status": "Succeeded",
               "Type": "worker"
            },
            {
               "GmtCreateTime": "2022-08-01T06:45:08Z",
               "GmtFinishTime": "2022-08-01T06:53:20Z",
               "GmtStartTime": "2022-08-01T06:48:57Z",
               "Ip": "10.224.xx.xx",
               "PodId": "dlc02xipvt5z****-worker-1",
               "PodUid": "",
               "Status": "Succeeded",
               "Type": "worker"
            }
         ],
         "ReasonCode": "JobSucceeded",
         "ReasonMessage": "TFJob dlc02xipvt5z**** successfully completed.",
         "RequestId": "76FC3500-xxxx-533F-B24A-AC9B2A72****",
         "ResourceId": "",
         "Priority": 1,
         "ResourceLevel": "",
         "Settings": {
            "BusinessUserId": "",
            "Caller": "",
            "EnableErrorMonitoringInAIMaster": false,
            "EnableTideResource": false,
            "ErrorMonitoringArgs": "",
            "PipelineId": ""
         },
         "Status": "Succeeded",
         "ThirdpartyLibDir": "",
         "UserCommand": "cd /root/xxxx/xxxx/\npip install .\ncd examples/resnet\nbash scripts/xxxx_dp.sh",
         "UserId": "144963168668****",
         "WorkspaceId": "23****",
         "WorkspaceName": "doc_test_****"
      }
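
The two output shapes above (a table when listing jobs, JSON for a single job) can be chained in a script, for example to find the pod IDs that `./dlc logs` expects. The sketch below assumes the layouts shown in the sample outputs: the job ID is the second table column, and each `PodId` sits on its own line of the pretty-printed JSON. `DLC_BIN`, `job_ids`, `pod_ids`, and `pods_for_name` are hypothetical names, not dlc features, and the parsing must be adapted if the real output differs.

```shell
# Hypothetical helpers; point DLC_BIN at your client binary.
DLC_BIN="${DLC_BIN:-./dlc}"

# Read the table printed by `dlc get job` on stdin and print one JobId per line.
# Border lines start with "+", so only lines starting with "|" are considered.
job_ids() {
  awk -F'|' '/^[ \t]*[|]/ {
    id = $3
    gsub(/^[ \t]+|[ \t]+$/, "", id)
    if (id != "" && id != "JobId") print id   # skip the header row
  }'
}

# Read the JSON printed by `dlc get job <JOB_ID>` on stdin and print one PodId per line.
# "PodUid" lines do not match because the key must be exactly "PodId".
pod_ids() {
  sed -n 's/.*"PodId": *"\([^"]*\)".*/\1/p'
}

# pods_for_name <displayName>: print "<jobId> <podId>" pairs for every matching job.
pods_for_name() {
  "$DLC_BIN" get job --display_name "$1" | job_ids | while read -r job; do
    # </dev/null keeps the inner call from consuming the list of job IDs.
    "$DLC_BIN" get job "$job" </dev/null | pod_ids | while read -r pod; do
      printf '%s %s\n' "$job" "$pod"
    done
  done
}

# Example call:
#   pods_for_name epl
```

Line-based parsing keeps the sketch free of extra dependencies. If `jq` is installed, `jq -r '.Pods[].PodId'` is a sturdier way to read the JSON.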

References

You can also view job details in the console. For more information, see View training details.