
Container Compute Service:Build a distributed inference service for the full-capability DeepSeek-R1 model with ACS GPU compute

Last Updated:Jan 26, 2026

Container Compute Service (ACS) provides a serverless, out-of-the-box experience, freeing you from managing the underlying hardware or GPU node configurations. With its simple deployment and pay-as-you-go billing model, ACS is ideal for Large Language Model (LLM) inference tasks and significantly reduces inference costs. The DeepSeek-R1 model's vast parameter count makes it too large to load or run efficiently on a single GPU. Therefore, a distributed deployment across two or more container instances is recommended to run inference on such large models, improve throughput, and ensure performance. This topic describes how to use ACS to deploy a production-ready, full-capability, distributed DeepSeek-R1 inference service.

Background

DeepSeek-R1

DeepSeek-R1 is DeepSeek's first-generation reasoning model, designed to improve the reasoning capabilities of LLMs through large-scale reinforcement learning. Benchmarks show that DeepSeek-R1 performs strongly in mathematical reasoning and programming competitions, matching or surpassing the OpenAI o1 series in certain areas. DeepSeek-R1 also performs well on knowledge-related tasks such as creative writing and Q&A. For more information about DeepSeek, see the DeepSeek AI GitHub repository.

vLLM

vLLM is a high-performance, easy-to-use LLM inference serving framework. It supports most commonly used LLMs, including the Qwen series of models. vLLM uses techniques such as PagedAttention, continuous batching, and model quantization to greatly improve the inference efficiency of LLMs. For more information, see the vLLM GitHub repository.

ACS

Released in 2023, ACS focuses on delivering inclusive, easy-to-use, elastic, and flexible next-generation container compute power. ACS provides general-purpose and heterogeneous compute power that complies with Kubernetes specifications, and offers serverless container resources so that you do not need to manage nodes or clusters. You can integrate scheduling, container runtime, storage, and networking capabilities with ACS to reduce Kubernetes O&M complexity and improve the elasticity and flexibility of container compute power. With pay-as-you-go billing and elastic instances, ACS can greatly reduce resource costs. In LLM inference scenarios, ACS also accelerates data and image loading to further reduce model startup time and cost.

LeaderWorkerSet (LWS)

LWS is a workload type proposed by a Special Interest Group (SIG) in the Kubernetes community. Unlike native Kubernetes workloads such as Deployments and StatefulSets, LWS treats a group of pods, rather than a single pod, as one replica: scaling a replica scales multiple pods at once, and within each replica the pods have a leader-worker relationship (one Leader Pod and several Worker Pods). This makes LWS well suited to multi-node distributed inference tasks in AI/ML. For more information about LWS, see the LWS GitHub repository.

Fluid

Fluid manages and schedules JindoRuntime to provide dataset visibility, elastic scaling, and data migration, and you can use it to accelerate model access. Its core principle is to use caching to overcome the access-speed bottleneck when loading large models. For example, if 10 inference service instances start simultaneously and pull data directly from Object Storage Service (OSS), the total available bandwidth is fixed and each instance receives only one-tenth of it, which significantly increases data-pull latency and degrades performance. By elastically scaling a distributed cache on the compute side, Fluid extends the limited bandwidth of the underlying storage into the ACS cluster: the available in-cluster bandwidth then scales with the number of distributed cache nodes, so you can size it to your business needs and significantly reduce model loading time.
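The bandwidth argument above can be made concrete with a back-of-the-envelope model (the bandwidth figures below are illustrative assumptions, not measured values):

```python
# Illustrative model: with a fixed OSS egress bandwidth shared by all pullers,
# per-instance bandwidth shrinks as instances are added; a distributed cache
# instead scales aggregate bandwidth with the number of cache workers.

OSS_BANDWIDTH_GBPS = 10    # assumed fixed backend (OSS) bandwidth
CACHE_WORKER_GBPS = 10     # assumed serving bandwidth per cache worker

def per_instance_bw_direct(num_instances: int) -> float:
    """Each instance gets an equal share of the fixed OSS bandwidth."""
    return OSS_BANDWIDTH_GBPS / num_instances

def aggregate_bw_cached(num_cache_workers: int) -> int:
    """Cache-side bandwidth grows linearly with the number of cache workers."""
    return CACHE_WORKER_GBPS * num_cache_workers

print(per_instance_bw_direct(10))  # 1.0  (10 instances pulling directly)
print(aggregate_bw_cached(16))     # 160  (16 cache workers, as in this topic)
```

Under these assumptions, scaling the cache from the storage bottleneck to 16 workers raises available read bandwidth by more than an order of magnitude.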

Solution overview

Model partitioning

The DeepSeek-R1 model has 671 billion parameters. A single GPU typically has a maximum of 96 GiB of memory and cannot load the entire model. Therefore, the model must be partitioned. This topic uses a distributed deployment on two GPU container instances, adopting a partitioning strategy of pipeline parallelism (PP=2) and tensor parallelism (TP=8). The model partitioning is shown in the following figure.

(Figure: DeepSeek-R1 model partitioning with PP=2 and TP=8)

Pipeline parallelism (PP=2) partitions the model into two stages, with each stage running on a separate GPU container instance. For example, a model M can be partitioned into M1 and M2. M1 processes the input on the first instance and passes the intermediate results to M2 for subsequent operations on the second instance.

Tensor parallelism (TP=8) distributes the computational operations within each stage of the model (such as M1 and M2) across 8 GPUs. For example, in the M1 stage, when input data arrives, it is split into 8 parts and processed simultaneously on 8 GPUs. Each GPU processes a small portion of the data, and the results are then combined.
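The need for partitioning can be checked with rough arithmetic (a sketch under the assumption of FP8 weights at about 1 byte per parameter; actual memory use also includes KV cache and activations):

```python
# DeepSeek-R1 has 671B parameters. At FP8 (~1 byte/parameter) the weights
# alone are ~671 GB, far beyond a single 96 GiB GPU, but they shard evenly
# across PP=2 x TP=8 = 16 GPUs.

PARAMS_B = 671          # parameters, in billions
BYTES_PER_PARAM = 1     # FP8
PP, TP = 2, 8

total_weight_gb = PARAMS_B * BYTES_PER_PARAM   # ~671 GB of weights
per_gpu_gb = total_weight_gb / (PP * TP)       # weight shard per GPU

print(total_weight_gb)        # 671
print(round(per_gpu_gb, 1))   # 41.9 -> fits in 96 GiB with KV-cache headroom
```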

Distributed deployment architecture

This solution uses ACS to deploy a distributed, full-capability DeepSeek-R1 inference service via vLLM and Ray. It employs LWS to manage the Leader-Worker deployment and Fluid for distributed caching to accelerate model loading. vLLM runs on two 8-GPU Pods, each functioning as a Ray Group (Head and Workers) to improve throughput. Note that architecture changes affect YAML variables like tensor-parallel-size and LWS_GROUP_SIZE.

(Figure: distributed deployment architecture with LWS, vLLM, Ray, and Fluid)
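The relationship between these YAML variables can be sketched as a small consistency check (names mirror the manifest in Step 2; the function is illustrative, not part of any API):

```python
# tensor-parallel-size x pipeline-parallel-size gives the total GPU count
# (the "world size"), and the LWS group size is the number of pods needed
# to supply that many GPUs.

def lws_group_size(tensor_parallel: int, pipeline_parallel: int,
                   gpus_per_pod: int) -> int:
    """Number of pods (leader + workers) in one LWS replica."""
    world_size = tensor_parallel * pipeline_parallel
    assert world_size % gpus_per_pod == 0, "world size must divide evenly"
    return world_size // gpus_per_pod

# This topic: TP=8, PP=2, 8-GPU pods -> a group of 2 pods (1 leader, 1 worker).
print(lws_group_size(8, 2, 8))  # 2
```

If you change the parallelism strategy, recompute the group size the same way and update `size` and `tensor-parallel-size` in the YAML accordingly.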

Prerequisites

GPU instance specifications and cost estimation

For a dual-instance or multi-instance deployment on ACS, the following single-instance specification is recommended: GPU: 8 cards with 96 GiB of memory per card; CPU: 64 vCPUs; Memory: 512 GiB. See Recommended instance types and GPU-accelerated compute instance types to select a suitable instance type. To estimate ACS GPU instance costs, see Billing.

Note
  • ACS GPU instance specifications also follow the ACS pod specification normalization logic.

  • By default, ACS Pods provide 30 GiB of free temporary storage (EphemeralStorage). The inference image registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/vllm:v0.7.2 used in this topic occupies about 9.5 GiB. If this storage size does not meet your needs, you can customize it. For more information, see Increase the size of temporary storage space.
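The storage headroom in the note above works out as follows (figures taken from the note):

```python
# Default free ephemeral storage minus the inference image leaves the
# remaining space for logs and temporary files.
free_ephemeral_gib = 30.0   # ACS default EphemeralStorage
image_gib = 9.5             # approximate size of the vllm:v0.7.2 image

print(free_ephemeral_gib - image_gib)  # 20.5 GiB remaining
```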

Procedure

Step 1: Prepare the DeepSeek-R1 model files

Due to their massive parameter counts, LLMs require significant disk space for model files. We recommend that you create a NAS or OSS volume for persistent storage of model files. This topic uses OSS as an example.

Note

Downloading and uploading model files can be slow. You can submit a ticket to quickly copy the model files to your OSS bucket.

  1. Run the following commands to download the DeepSeek-R1 model from ModelScope.

    Note

    Make sure you have the git-lfs plugin installed. Install it by running yum install git-lfs or apt-get install git-lfs. For more installation methods, see Install git-lfs.

    git lfs install
    GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1.git
    cd DeepSeek-R1/
    git lfs pull
  2. Create a directory in OSS and upload the model to OSS.

    Note

    For more information about how to install and use ossutil, see Install ossutil.

    ossutil mkdir oss://<your-bucket-name>/models/DeepSeek-R1
    ossutil cp -r ./DeepSeek-R1 oss://<your-bucket-name>/models/DeepSeek-R1
  3. After storing the model in OSS, you have two ways to load it.

    1. Directly mount the model by using a PVC and PV: This method is best for smaller models and applications without strict requirements for pod startup or model loading speed.

      Console

      The following table describes the basic configuration of an example PV:

        • Volume type: OSS
        • Name: llm-model
        • Access certificate: Configure the AccessKey ID and AccessKey secret for accessing OSS.
        • Bucket ID: Select the OSS bucket created in the previous step.
        • OSS Path: Select the path where the model is located, such as /models/DeepSeek-R1.

      The following table describes the basic configuration of an example PVC:

        • Persistent Volume Claim (PVC) type: OSS
        • Name: llm-model
        • Allocation mode: Select an existing volume.
        • Existing volume: Click the Select an existing PV link and select the created PV.

      kubectl

      The following YAML is an example:

      apiVersion: v1
      kind: Secret
      metadata:
        name: oss-secret
      stringData:
        akId: <your-oss-ak> # The AccessKey ID for accessing OSS.
        akSecret: <your-oss-sk> # The AccessKey secret for accessing OSS.
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: llm-model
        labels:
          alicloud-pvname: llm-model
      spec:
        capacity:
          storage: 30Gi 
        accessModes:
          - ReadOnlyMany
        persistentVolumeReclaimPolicy: Retain
        csi:
          driver: ossplugin.csi.alibabacloud.com
          volumeHandle: llm-model
          nodePublishSecretRef:
            name: oss-secret
            namespace: default
          volumeAttributes:
            bucket: <your-bucket-name> # The bucket name.
            url: <your-bucket-endpoint> # The endpoint, such as oss-cn-hangzhou-internal.aliyuncs.com.
            otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"
            path: <your-model-path> # In this example, /models/DeepSeek-R1/.
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: llm-model
      spec:
        accessModes:
          - ReadOnlyMany
        resources:
          requests:
            storage: 30Gi
        selector:
          matchLabels:
            alicloud-pvname: llm-model
    2. Accelerate model loading by using Fluid: This method is suitable for larger models and applications with requirements for pod startup and model loading speed. For more information, see Use Fluid to accelerate data access.

      1. In the ACS App Marketplace, install the ack-fluid component by using Helm. The component version must be 1.0.11-* or later. For more information, see Use Helm to create an application.

      2. You can enable privileged mode for the ACS Pod by submitting a ticket.

      3. Create a Secret to access OSS.

        apiVersion: v1
        kind: Secret
        metadata:
          name: mysecret
        stringData:
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: xxx

        In the preceding code, fs.oss.accessKeyId and fs.oss.accessKeySecret are the AccessKey ID and AccessKey Secret used to access the OSS bucket.

      4. Create a Dataset and a JindoRuntime.

        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        metadata:
          name: deepseek
        spec:
          mounts:
            - mountPoint:  oss://<your-bucket-name>       # Replace <your-bucket-name> with the actual value.
              options:
                fs.oss.endpoint: <your-bucket-endpoint>    # Replace <your-bucket-endpoint> with the actual value.
              name: deepseek
              path: "/"
              encryptOptions:
                - name: fs.oss.accessKeyId
                  valueFrom:
                    secretKeyRef:
                      name: mysecret
                      key: fs.oss.accessKeyId
                - name: fs.oss.accessKeySecret
                  valueFrom:
                    secretKeyRef:
                      name: mysecret
                      key: fs.oss.accessKeySecret
        ---
        apiVersion: data.fluid.io/v1alpha1
        kind: JindoRuntime
        metadata:
          name: deepseek
        spec:
          replicas: 16    # Adjust as needed.
          master:
            podMetadata:
              labels:
                alibabacloud.com/compute-class: performance
                alibabacloud.com/compute-qos: default
          worker:
            podMetadata:
              labels:
                alibabacloud.com/compute-class: performance
                alibabacloud.com/compute-qos: default
              annotations:
                kubernetes.io/resource-type: serverless
            resources:
              requests:
                cpu: 16
                memory: 128Gi
              limits:
                cpu: 16
                memory: 128Gi
          tieredstore:
            levels:
              - mediumtype: MEM
                path: /dev/shm
                volumeType: emptyDir
                ## Adjust as needed.
                quota: 128Gi
                high: "0.99"
                low: "0.95"

        After the resources are created, run the kubectl get pod | grep jindo command to check if the pods are in the Running state. Expected output:

        deepseek-jindofs-master-0    1/1     Running   0          3m29s
        deepseek-jindofs-worker-0    1/1     Running   0          2m52s
        deepseek-jindofs-worker-1    1/1     Running   0          2m52s
        ...
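The tieredstore sizing in the JindoRuntime above can be sanity-checked against the model size (a sketch using the values from this topic):

```python
# 16 workers x 128 GiB of memory cache give 2 TiB of cache capacity,
# comfortably above the ~1.25 TiB DeepSeek-R1 model files.
replicas = 16        # JindoRuntime spec.replicas
quota_gib = 128      # tieredstore quota per worker

capacity_tib = replicas * quota_gib / 1024
print(capacity_tib)  # 2.0
```

If you adjust `replicas` or `quota`, keep the product above the dataset's UFS total size so the entire model can be cached.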
      5. Cache the model by creating a DataLoad.

        apiVersion: data.fluid.io/v1alpha1
        kind: DataLoad
        metadata:
          name: deepseek
        spec:
          dataset:
            name: deepseek
            namespace: default
          loadMetadata: true
      6. Run the following command to check the cache status.

        kubectl get dataload

        Expected output:

        NAME       DATASET    PHASE       AGE     DURATION
        deepseek   deepseek   Executing   4m30s   Unfinished

        A PHASE of Executing indicates that loading is in progress. Wait for about 20 minutes and run the command again. If the status changes to Complete, the caching is successful. To check progress, view the loader job logs with kubectl logs $(kubectl get pods --selector=job-name=deepseek-loader-job -o jsonpath='{.items[0].metadata.name}') | grep progress.

        Fluid DataLoad resource parameters

          • Name: The name of the data loading task. Example: deepseek
          • Dataset: The name of the associated dataset. Example: deepseek
          • Phase: The status of the data loading task (Complete means finished). Examples: Executing, Complete
          • Age: The time since the data loading task was created. Example: 4m30s
          • Duration: The time taken by the data loading task. Examples: Unfinished, 16m29s

      7. Run the following command to check the Dataset resource.

        kubectl get datasets

        Expected output:

        NAME       UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
        deepseek   1.25TiB          1.25TiB   2.00TiB          100.0%              Bound   21h

        Fluid Dataset resource parameters

          • Name: The name of the dataset. Example: deepseek
          • UFS Total Size: The total size of the dataset in the underlying storage. Example: 1.25TiB
          • Cached: The amount of data currently cached. Example: 1.25TiB
          • Cache Capacity: The total capacity of the cache. Example: 2.00TiB
          • Cached Percentage: The percentage of data that is cached. Example: 100.0%
          • Phase: The status of the dataset (for example, Bound indicates it is bound). Example: Bound
          • Age: The time since the dataset resource was created. Example: 21h

Step 2: Deploy the model by using ACS GPU compute

  1. In the ACS App Marketplace, install the lws component by using Helm. For more information, see Use Helm to create an application.

  2. Deploy the model by using a LeaderWorkerSet.

    Note
    • Replace alibabacloud.com/gpu-model-series: <example-model> in the YAML file with a specific GPU model supported by ACS. For a list of currently supported GPU models, consult your account manager or submit a ticket.

    • Compared to TCP/IP, high-performance RDMA networking features zero-copy and kernel bypass to avoid data copying and frequent context switching. These features result in lower latency, higher throughput, and lower CPU usage. ACS supports the use of RDMA by configuring the label alibabacloud.com/hpn-type: "rdma" in the YAML file. For a list of GPU models that support RDMA, consult your account manager or submit a ticket.

    • If you use Fluid to load the model, you must change the claimName of both PVCs to the name of the Fluid Dataset.

    • Different distributed deployment architectures will affect the values of variables such as tensor-parallel-size and LWS_GROUP_SIZE in the YAML file.

    Standard deployment

    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    metadata:
      name: deepseek-r1-671b-fp8-distrubution
    spec:
      replicas: 1
      leaderWorkerTemplate:
        size: 2 # The total number of leaders and workers.
        restartPolicy: RecreateGroupOnPodRestart
        leaderTemplate:
          metadata:
            labels: 
              role: leader
              alibabacloud.com/compute-class: gpu  # Specify the GPU type.
              alibabacloud.com/compute-qos: default # Specify the ACS QoS level.
              alibabacloud.com/gpu-model-series: <example-model> ## Specify the GPU model.
          spec:
            volumes:
              - name: llm-model
                persistentVolumeClaim:
                  ## If you use Fluid, enter the Fluid dataset name here, for example: deepseek
                  claimName: llm-model
              - name: shm
                emptyDir:
                  medium: Memory
                  sizeLimit: 32Gi
            containers:
              - name: deepseek-r1-671b-leader
                image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/vllm:v0.7.2
                env:
                  - name: NCCL_SOCKET_IFNAME # Specify the network interface card.
                    value: eth0
                command:
                  - sh
                  - -c
                  - "/vllm-workspace/ray_init.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE);vllm serve /models/DeepSeek-R1/ --port 8000 --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.95 --tensor-parallel-size 8 --pipeline-parallel-size 2 --enforce-eager"
    # Set tensor-parallel-size to the total number of cards in each leader and worker pod.
                resources:
                  limits:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                  requests:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G           
                ports:
                  - containerPort: 8000
                volumeMounts:
                  - mountPath: /models/DeepSeek-R1
                    name: llm-model
                  - mountPath: /dev/shm
                    name: shm
        workerTemplate:
          metadata:
            labels: 
              alibabacloud.com/compute-class: gpu  # Specify the GPU type.
              alibabacloud.com/compute-qos: default # Specify the ACS QoS level.
              alibabacloud.com/gpu-model-series: <example-model> ## Specify the GPU model.
          spec:
            volumes:
              - name: llm-model
                persistentVolumeClaim:
                  ## If you use Fluid, enter the Fluid dataset name here, for example: deepseek
                  claimName: llm-model
              - name: shm
                emptyDir:
                  medium: Memory
                  sizeLimit: 32Gi
            containers:
              - name: deepseek-r1-671b-worker
                image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/vllm:v0.7.2
                env:
                  - name: NCCL_SOCKET_IFNAME # Specify the network interface card.
                    value: eth0
                command:
                  - sh
                  - -c
                  - "/vllm-workspace/ray_init.sh worker --ray_address=$(LWS_LEADER_ADDRESS)"
                resources:
                  limits:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                  requests:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                ports:
                  - containerPort: 8000
                volumeMounts:
                  - mountPath: /models/DeepSeek-R1
                    name: llm-model
                  - mountPath: /dev/shm
                    name: shm

    RDMA acceleration

    When using an open-source base image (such as vLLM), add the following environment variables to the YAML file:

      • NCCL_SOCKET_IFNAME: eth0
      • NCCL_IB_TC: 136
      • NCCL_IB_SL: 5
      • NCCL_IB_GID_INDEX: 3
      • NCCL_DEBUG: INFO
      • NCCL_IB_HCA: mlx5
      • NCCL_NET_PLUGIN: none

    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    metadata:
      name: deepseek-r1-671b-fp8-distrubution
    spec:
      replicas: 1
      leaderWorkerTemplate:
        size: 2 # The total number of leaders and workers.
        restartPolicy: RecreateGroupOnPodRestart
        leaderTemplate:
          metadata:
            labels: 
              role: leader
              alibabacloud.com/compute-class: gpu  # Specify the GPU type.
              alibabacloud.com/compute-qos: default # Specify the ACS QoS level.
              alibabacloud.com/gpu-model-series: <example-model> ## Specify the GPU model.
              # Specify that the application runs in a high-performance RDMA network. Submit a ticket for a list of supported GPU models.
              alibabacloud.com/hpn-type: "rdma"
          spec:
            volumes:
              - name: llm-model
                persistentVolumeClaim:
                  ## If you use Fluid, enter the Fluid dataset name here, for example: deepseek
                  claimName: llm-model
              - name: shm
                emptyDir:
                  medium: Memory
                  sizeLimit: 32Gi
            containers:
              - name: deepseek-r1-671b-leader
                image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/vllm:v0.7.2
                env:
                  - name: NCCL_SOCKET_IFNAME # Specify the network interface card.
                    value: eth0
                  - name: NCCL_IB_TC
                    value: "136"
                  - name: NCCL_IB_SL
                    value: "5"
                  - name: NCCL_IB_GID_INDEX
                    value: "3"
                  - name: NCCL_DEBUG
                    value: "INFO"
                  - name: NCCL_IB_HCA
                    value: "mlx5"
                  - name: NCCL_NET_PLUGIN
                    value: "none"                
                command:
                  - sh
                  - -c
                  - "/vllm-workspace/ray_init.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE);vllm serve /models/DeepSeek-R1/ --port 8000 --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.95 --tensor-parallel-size 8 --pipeline-parallel-size 2 --enforce-eager"
    # Set tensor-parallel-size to the total number of cards in each leader and worker pod.
                resources:
                  limits:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                  requests:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G           
                ports:
                  - containerPort: 8000
                volumeMounts:
                  - mountPath: /models/DeepSeek-R1
                    name: llm-model
                  - mountPath: /dev/shm
                    name: shm
        workerTemplate:
          metadata:
            labels: 
              alibabacloud.com/compute-class: gpu  # Specify the GPU type.
              alibabacloud.com/compute-qos: default # Specify the ACS QoS level.
              alibabacloud.com/gpu-model-series: <example-model> ## Specify the GPU model.
              # Specify that the application runs in a high-performance RDMA network. Submit a ticket for a list of supported GPU models.
              alibabacloud.com/hpn-type: "rdma"
          spec:
            volumes:
              - name: llm-model
                persistentVolumeClaim:
                  ## If you use Fluid, enter the Fluid dataset name here, for example: deepseek
                  claimName: llm-model
              - name: shm
                emptyDir:
                  medium: Memory
                  sizeLimit: 32Gi
            containers:
              - name: deepseek-r1-671b-worker
                image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/vllm:v0.7.2
                env:
                  - name: NCCL_SOCKET_IFNAME # Specify the network interface card.
                    value: eth0
                  - name: NCCL_IB_TC
                    value: "136"
                  - name: NCCL_IB_SL
                    value: "5"
                  - name: NCCL_IB_GID_INDEX
                    value: "3"
                  - name: NCCL_DEBUG
                    value: "INFO"
                  - name: NCCL_IB_HCA
                    value: "mlx5"
                  - name: NCCL_NET_PLUGIN
                    value: "none"      
                command:
                  - sh
                  - -c
                  - "/vllm-workspace/ray_init.sh worker --ray_address=$(LWS_LEADER_ADDRESS)"
                resources:
                  limits:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                  requests:
                    nvidia.com/gpu: "8"
                    cpu: "64"
                    memory: 512G
                ports:
                  - containerPort: 8000
                volumeMounts:
                  - mountPath: /models/DeepSeek-R1
                    name: llm-model
                  - mountPath: /dev/shm
                    name: shm
  3. Expose the inference service by using a Service.

    apiVersion: v1
    kind: Service
    metadata:
      name: ds-leader
    spec:
      ports:
        - name: http
          port: 8000
          protocol: TCP
          targetPort: 8000
      selector:
        leaderworkerset.sigs.k8s.io/name: deepseek-r1-671b-fp8-distrubution
        role: leader
      type: ClusterIP

Step 3: Verify the inference service

  1. Use kubectl port-forward to establish port forwarding between the inference service and your local environment.

    Note

    The port forwarding established by kubectl port-forward is not suitable for production due to its lack of reliability, security, and scalability. Therefore, use it only for development and debugging. For more information about production-ready networking solutions in Kubernetes clusters, see Ingress management.

    kubectl port-forward svc/ds-leader 8000:8000

    Expected output:

    Forwarding from 127.0.0.1:8000 -> 8000
    Forwarding from [::1]:8000 -> 8000
  2. Send an inference request to the model.

    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "ds",
        "messages": [
          {
            "role": "system", 
            "content": "You are a friendly AI assistant."
          },
          {
            "role": "user",
            "content": "Tell me about deep learning."
          }
        ],
        "max_tokens": 1024,
        "temperature": 0.7,
        "top_p": 0.9,
        "seed": 10
      }'

    Expected output:

    {"id":"chatcmpl-4bc78b66e2a4439f8362bd434a60be57","object":"chat.completion","created":1739501401,"model":"ds","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, the user wants me to explain deep learning. I need to think about how to answer this well. First, I need to clarify the basic definition of deep learning. It's a branch of machine learning, right? Then I should compare it with traditional machine learning methods to explain its advantages, such as automatic feature extraction. I might need to mention neural networks, especially the structure of deep neural networks with multiple hidden layers.\n\nNext, I should talk about the core components of deep learning, such as activation functions, loss functions, and optimizers. The user might not be familiar with these terms, so I should briefly explain the role of each part. For example, ReLU as an activation function, Adam as an optimizer, and examples like the cross-entropy loss function.\n\nThen, for application areas, computer vision and natural language processing are common. I should provide some practical examples like image recognition and machine translation to make it easier for the user to understand. For industry applications, like healthcare and finance, the user might be interested in these real-world use cases.\n\nI also need to mention popular frameworks like TensorFlow and PyTorch, which make deep learning easier to implement. The importance of hardware acceleration, such as GPUs, is also key to explaining why deep learning is advancing so quickly.\n\nI should also discuss the challenges and limitations of deep learning, such as data dependency, high computational resource requirements, and poor interpretability. This will give the user a balanced view of its pros and cons. I might also mention future development directions, like efficient training algorithms and research into interpretability.\n\nThe user probably wants to understand the basic concepts of deep learning. They might have some technical background but not in-depth knowledge. They likely want to quickly grasp the key points and applications, so the answer needs to be well-structured and focused, without getting too deep into technical details but also not being too brief. I need to balance professionalism with ease of understanding.\n\nI should avoid using too much jargon, or explain terms when I use them, such as 'neural network' or 'convolutional neural network'. The user could be a student or a newcomer to the field, so I should use plain language. I should also use examples to connect the concepts to real-world applications to make them more memorable.\n\nI also need to be clear about the relationship between deep learning and machine learning, explaining that deep learning is a subset of machine learning but operates at a deeper level to handle more complex problems. I might also mention the backpropagation algorithm as one of the key training techniques.\n\nFinally, I'll provide a summary that emphasizes the impact and potential of deep learning and its future directions. This will give the user a comprehensive understanding. I'll double-check if I've missed any important points, like common model architectures such as CNNs and RNNs, which I should briefly mention to show the diversity.\n\nSometimes users are interested in the principles, but it's better to keep it concise here, focusing on an overview rather than in-depth technical details. I need to ensure the answer flows logically, starting from the definition, then moving to core components, applications, frameworks, challenges, and future directions. That structure seems reasonable.\n</think>\n\nDeep learning is a branch of machine learning that aims to simulate the human brain's learning process by building multi-layer neural networks (known as 'deep' networks). By automatically learning complex features and patterns from large amounts of data, it is widely used in fields such as image recognition, speech processing, and natural language processing.\n\n### Core concepts\n1. **Artificial Neural Network (ANN)**:\n   - Consists of an input layer, multiple hidden layers, and an output layer, with each layer containing multiple neurons.\n   - Processes information by simulating the activation and transmission of signals between neurons.\n\n2. **Automatic feature extraction**:\n   - Traditional machine learning relies on manually designed features. Deep learning, through its multi-layer networks, automatically extracts abstract features from data, such as edges and shapes from pixels in an image.\n\n3. **Key components**:\n   - **Activation function** (such as ReLU, Sigmoid): Introduces non-linearity, enhancing the model's expressive power.\n   - **Loss function** (such as cross-entropy, mean squared error): Measures the difference between the predicted output and the actual value.\n   - **Optimizer** (such as SGD, Adam): Optimizes the network's parameters through backward propagation to minimize the loss.\n\n---\n\n### Typical models\n- **Convolutional Neural Network (CNN)**:  \n  Designed specifically for images, it uses convolutional kernels to extract spatial features. Classic models include ResNet and VGG.\n- **Recurrent Neural Network (RNN)**:  \n  Processes sequential data like text and speech by introducing a memory mechanism. Improved versions include LSTM and GRU.\n- **Transformer**:  \n  Based on a self-attention mechanism, it has significantly improved performance in natural language processing tasks. Examples include the BERT and GPT series.\n\n---\n\n### Application scenarios\n- **Computer vision**: Facial recognition, medical imaging analysis (such as detecting lesions in lung CT scans).\n- **Natural language processing**: Intelligent chatbots, document summary generation, and translation (such as DeepL).\n- **Speech technology**: Voice assistants (such as Siri) and real-time caption generation.\n- **Reinforcement learning**: Game AI (AlphaGo) and robot control.\n\n---\n\n### Advantages and challenges\n- **Advantages**:\n  - Automatically learns complex features, reducing the need for manual intervention.\n  - Far outperforms traditional methods when given large amounts of data and high computing power.\n- **Challenges**:\n  - Relies on massive amounts of labeled data (for example, tens of thousands of labeled medical images).\n  - High model training costs (for example, training GPT-3 cost over ten million USD).\n  - Its 'black box' nature leads to poor interpretability, limiting its application in high-risk fields like medicine.\n\n---\n\n### Tools and trends\n- **Mainstream frameworks**: TensorFlow (friendly for industrial deployment) and PyTorch (preferred for research).\n- **Research directions**:\n  - Lightweight models (such as MobileNet for mobile devices).\n  - Self-supervised learning (to reduce dependency on labeled data).\n  - Enhanced interpretability (such as visualizing the model's decision-making basis).\n\nDeep learning is pushing the boundaries of artificial intelligence. From generative AI (such as Stable Diffusion generating images) to autonomous driving, it continues to transform the technology ecosystem. Future developments may bring breakthroughs in reducing computational costs, improving efficiency, and enhancing interpretability.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":17,"total_tokens":1131,"completion_tokens":1114,"prompt_tokens_details":null},"prompt_logprobs":null}
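    Note that DeepSeek-R1 emits its chain of thought before a `</think>` marker inside `message.content`. The sketch below, which uses a trimmed stand-in for the full response above (the real `content` field is much longer), shows one way to split the reasoning from the final answer and read the token usage. The abbreviated JSON string is an assumption for illustration, not the verbatim service output.

    ```python
    import json

    # Trimmed stand-in for the chat completion response shown above
    # (full "content" omitted for brevity).
    raw = '''{"id":"chatcmpl-4bc78b66e2a4439f8362bd434a60be57",
    "object":"chat.completion","model":"ds",
    "choices":[{"index":0,"message":{"role":"assistant",
    "content":"Okay, let me think...\\n</think>\\n\\nDeep learning is a branch of machine learning."},
    "finish_reason":"stop"}],
    "usage":{"prompt_tokens":17,"total_tokens":1131,"completion_tokens":1114}}'''

    resp = json.loads(raw)
    text = resp["choices"][0]["message"]["content"]

    # DeepSeek-R1 places its reasoning before the "</think>" marker;
    # partition splits it off so only the final answer remains.
    reasoning, _, answer = text.partition("</think>")

    print(answer.strip())                      # the user-facing answer
    print(resp["usage"]["completion_tokens"])  # tokens generated in this request
    ```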

    Model parameters

    This topic only describes the parameters used for verification. For more parameters, see the DeepSeek API documentation.

    | Parameter | Description | Example value |
    | --- | --- | --- |
    | model | The model to use. | ds |
    | messages | A list of messages that form the conversation.<br>• role: The role of the message initiator.<br>• content: The content of the message. | - |
    | max_tokens | The maximum number of tokens the model can generate in one request.<br>• Range: 1 to 8192.<br>• Default: 4096 (if unspecified). | 1024 |
    | temperature | The sampling temperature. A higher value (such as 1) makes the output more random, while a lower value (such as 0.2) makes it more focused and deterministic. Value range: 0 to 2. | 0.7 |
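    The parameters above can be combined into a request body for the OpenAI-compatible `/v1/chat/completions` endpoint that vLLM exposes. The sketch below builds such a payload; the endpoint address is an assumption here and should be replaced with the ClusterIP or Ingress address of your own service.

    ```python
    import json

    # Assumed address of the vLLM service deployed earlier;
    # replace with your ClusterIP or Ingress address.
    url = "http://127.0.0.1:8000/v1/chat/completions"

    # Request body combining the parameters from the table above.
    payload = {
        "model": "ds",
        "messages": [
            {"role": "user", "content": "Please explain deep learning."}
        ],
        "max_tokens": 1024,   # range 1 to 8192; defaults to 4096 if unspecified
        "temperature": 0.7,   # range 0 to 2; lower values are more deterministic
    }

    body = json.dumps(payload)
    # To send it once the service is reachable, for example:
    #   curl -H 'Content-Type: application/json' -d "$body" "$url"
    print(body)
    ```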

References