Container Compute Service: Use ACS to create a generative AI-powered chat application

Last Updated: Dec 25, 2024

Container Compute Service (ACS) provides container computing resources that comply with the container specifications of Kubernetes. ACS provides serverless computing resources, which allow you to run containerized applications with high efficiency. This topic describes how to deploy and expose a containerized generative AI-powered chat application in the ACS console and by using an ACS cluster certificate. This topic also describes how to monitor the application.

Background information

  • This topic uses two open source projects: RWKV-Runner and ChatGPT-Next-Web. RWKV-Runner serves a 0.1-billion-parameter RWKV model and provides online inference through RESTful APIs. ChatGPT-Next-Web is a web UI for chat applications. You deploy both projects from container images in an ACS cluster to build a generative AI-powered chat application whose frontend is decoupled from its backend.

  • For more information about the terms used in Kubernetes, see Course jointly developed by CNCF and Alibaba Cloud on cloud-native technologies.

Procedure

If this is the first time you use ACS, you must activate ACS and grant ACS the permissions to access cloud resources. Then, you can create an ACS cluster and deploy a generative AI-powered application in the cluster.

Step 1: Activate ACS and grant permissions to ACS

If this is the first time you use ACS, you must activate ACS and grant ACS the permissions to access cloud resources.

  1. Log on to the ACS console and click Activate.

  2. Go to the ACS activation page and follow the on-screen instructions to activate ACS.

  3. Return to the ACS console and refresh the page. Click Authorize Now.

  4. Go to the ACS authorization page and follow the on-screen instructions to grant permissions to ACS.

    After you complete the preceding operations, refresh the ACS console. Then, you can get started with ACS.

Step 2: Create an ACS cluster

This step shows how to configure cluster parameters when you create an ACS cluster.

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. In the upper-left corner of the Clusters page, click Create Cluster.

  3. On the Create Cluster page, set the parameters described in the following table.

    Use default settings for parameters that are not listed in the table.

    | Parameter | Description | Example |
    |---|---|---|
    | Cluster Name | Enter a name for the cluster. | ACS-Demo |
    | Region | Select a region to deploy the cluster. | China (Beijing) |
    | VPC | ACS clusters can be deployed only in virtual private clouds (VPCs). You must specify a VPC in the same region as the cluster. Click Create VPC to create a VPC named vpc-acs-demo in the China (Beijing) region. For more information, see Create and manage a VPC. | vpc-acs-demo |
    | vSwitch | Select vSwitches for nodes in the cluster to communicate with each other. Click Create vSwitch and create a vSwitch named vswitch-acs-demo in the vpc-acs-demo VPC. Then, select vswitch-acs-demo in the vSwitch list. For more information, see Create and manage a vSwitch. | vswitch-acs-demo |
    | API Server Access Settings | Specify whether to expose the Kubernetes API server of the cluster to the Internet. If you want to manage the cluster over the Internet, you must expose the Kubernetes API server with an elastic IP address (EIP). | Select Expose API Server with EIP. |
    | Service Discovery | Specify whether to enable service discovery for the cluster. To enable service discovery, select CoreDNS. | Select CoreDNS. |

  4. Click Confirm Order, read and select Terms of Service, and then click Create Cluster.

    Note

    It requires approximately 10 minutes to create a cluster. After the cluster is created, you can view the cluster on the Clusters page.

Step 3: Deploy RWKV-Runner in the ACS cluster

This step shows how to deploy RWKV-Runner in the ACS cluster by creating a general-purpose Deployment and how to expose RWKV-Runner within the cluster by using RESTful APIs. For more information about the parameters used to create a Deployment, see Create a stateless application by using a Deployment.

  1. Log on to the ACS console. On the Clusters page, click the name of the cluster you created, which is ACS-Demo in this example.

  2. In the left-side navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click Create from Image.

  4. On the Basic Information wizard page, set Name to rwkv-runner, select General-purpose for Instance type, select default for QoS Type, and then click Next.

  5. On the Container wizard page, configure the container and click Next.

    | Parameter | Description | Example |
    |---|---|---|
    | Image Name | You can enter an untagged image address or click Select images to select the image that you want to use. | registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-runner |
    | Select Image Tag | Click Select Image Version and select an image version. | 1.0.0 |
    | CPU | Specify the number of vCPUs required by the application. | 1 Core |
    | Memory | Specify the amount of memory required by the application. | 2 GiB |
    | Port Number | Configure container ports. | Name: runner; Container Port: 8000; Protocol: TCP |

  6. On the Advanced wizard page, click Create on the right side of Services.

  7. In the Create Service dialog box, configure the following parameters and click Create to expose the rwkv-runner application within the cluster by using RESTful APIs.

    | Parameter | Description | Example |
    |---|---|---|
    | Name | Enter a name for the Service. | rwkv-runner-svc |
    | Type | The type of Service. This parameter specifies how the Service is accessed. | Cluster IP |
    | Port Mapping | Specify a Service port and a container port. The container port must be the same as the port that is exposed in the backend pod. | Name: runner; Service Port: 80; Container Port: 8000; Protocol: TCP |

  8. In the lower-right corner of the Advanced wizard page, click Create.

    After you create the application, you are directed to the Complete wizard page, which displays the objects of the application. Click View Details to view the details of the application.
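For reference, the console settings in this step correspond roughly to the manifest below, wrapped in a small shell function so that it can be piped to kubectl. This is a sketch under stated assumptions, not the exact objects that the console generates: the function name rwkv_runner_manifest, the app: rwkv-runner label, and replicas: 1 are illustrative, and the compute-class and compute-qos labels follow the YAML files used later in this topic.

```shell
# Print a manifest that approximates the rwkv-runner Deployment and Service
# configured in the console. Assumed values (not from the console): the
# app: rwkv-runner label and replicas: 1.
rwkv_runner_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: rwkv-runner-svc
spec:
  type: ClusterIP
  selector:
    app: rwkv-runner
  ports:
    - name: runner
      port: 80
      protocol: TCP
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwkv-runner
  labels:
    app: rwkv-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rwkv-runner
  template:
    metadata:
      labels:
        app: rwkv-runner
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
    spec:
      containers:
        - name: rwkv-runner
          image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-runner:1.0.0
          ports:
            - name: runner
              containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
EOF
}
```

To create the objects from the command line instead of the console, you could run `rwkv_runner_manifest | kubectl apply -f -` after kubectl access is configured with the cluster certificate described in Step 4.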

Step 4: Deploy ChatGPT-Next-Web by using the cluster certificate

This step shows how to use the cluster certificate to deploy ChatGPT-Next-Web in the cluster by creating a general-purpose Deployment and how to expose ChatGPT-Next-Web to the Internet by using a LoadBalancer Service. For more information about the parameters used to create a Deployment, see Create a stateless application by using a Deployment.

  1. Log on to the ACS console. On the Clusters page, click the name of the cluster you created, which is ACS-Demo in this example.

  2. On the Cluster Information page, click the Connection Information tab. Obtain the cluster certificate for Internet access and follow the on-screen instructions to save the certificate to your on-premises machine.

  3. Create a file named chat-next-web.yaml and copy the following content to the file.

    apiVersion: v1
    kind: Service
    metadata:
      name: chat-frontend-svc
    spec:
      ports:
        - name: chat
          port: 80
          protocol: TCP
          targetPort: 3000
      selector:
        app: chat-frontend
      type: LoadBalancer
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: chat-frontend
      name: chat-frontend
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: chat-frontend
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: general-purpose  # The general-purpose type.
            #If you want to use the performance-enhanced type, specify alibabacloud.com/compute-class: performance.
            app: chat-frontend
        spec:
          containers:
            - env:
                - name: BASE_URL
                  value: 'http://rwkv-runner-svc'
              image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/chatgpt-next-web:amd64
              imagePullPolicy: IfNotPresent
              name: chat-frontend
              ports:
                - containerPort: 3000
                  protocol: TCP
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  4. Run the following command to create the preceding resources in the cluster:

    kubectl apply -f chat-next-web.yaml
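If the apply command cannot connect, or you want to confirm the result, the two helper functions below may be useful. They are a minimal sketch: it is assumed that the cluster certificate is a kubeconfig file saved to $HOME/.kube/config, and the function names check_cluster_access and check_frontend are illustrative. The default timeout of 600s mirrors progressDeadlineSeconds in the Deployment above.

```shell
# Confirm that kubectl can reach the cluster with the saved certificate.
# Assumption: the certificate is a kubeconfig file; the default path is
# $HOME/.kube/config unless another path is passed as the first argument.
check_cluster_access() {
  kubeconfig_path="${1:-$HOME/.kube/config}"
  if [ ! -f "$kubeconfig_path" ]; then
    echo "kubeconfig not found: $kubeconfig_path" >&2
    return 1
  fi
  kubectl --kubeconfig "$kubeconfig_path" cluster-info
}

# Wait for the chat-frontend Deployment to finish rolling out, then show
# the LoadBalancer Service that exposes it.
check_frontend() {
  kubectl rollout status deployment/chat-frontend --timeout="${1:-600s}" || return 1
  kubectl get service chat-frontend-svc
}
```

For example, run `check_cluster_access` before `kubectl apply`, and `check_frontend` afterwards.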

Step 5: Use the cluster certificate to create an initialization Job for the application

This step shows how to use the cluster certificate to create an initialization Job for the RWKV-Runner model. The QoS class of the pods created by the Job is BestEffort. For more information about the parameters used to create a Job, see Create a Job.

  1. Create a file named rwkv-init-job.yaml and copy the following content to the file.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 6
      completionMode: NonIndexed
      completions: 1
      parallelism: 1
      suspend: false
      template:
        metadata:
          labels:
            alibabacloud.com/compute-qos: best-effort # The BestEffort QoS class.
            #If you want to use the default QoS class, specify alibabacloud.com/compute-qos: default.
            app: job-demo
        spec:
          containers:
          - name: job
            image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-init-job:1.0.0
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    
  2. Run the following command to deploy the initialization Job:

    kubectl apply -f rwkv-init-job.yaml

  3. Run the following command to check whether the initialization Job is completed:

    kubectl get pod

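    The initialization Job is complete when the STATUS column of the pod whose name starts with job-demo shows Completed.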
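As an alternative to checking the pod list by hand, kubectl can block until the Job reports completion. The helper below is a sketch; the function name wait_for_init_job is illustrative, and the default timeout of 600s is chosen to match activeDeadlineSeconds in rwkv-init-job.yaml.

```shell
# Block until the job-demo Job reports the Complete condition, or until the
# timeout (first argument, default 600s) expires.
wait_for_init_job() {
  timeout="${1:-600s}"
  kubectl wait --for=condition=complete job/job-demo --timeout="$timeout"
}
```

Calling `wait_for_init_job` returns a non-zero exit status if the Job does not complete within the timeout.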

Step 6: Test the application

This step shows how to access the application by using the Service.

  1. Log on to the ACS console. On the Clusters page, click the name of the cluster you created, which is ACS-Demo in this example.

  2. In the left-side navigation pane, choose Network > Services.

  3. On the Services page, find the Service you created, which is chat-frontend-svc in this example. Click the IP address in the External IP column to access the application.
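The same address can also be looked up from the command line. The sketch below reads the external IP of chat-frontend-svc and prints the application URL; the function name frontend_url is illustrative, and it is assumed that kubectl is already configured with the cluster certificate and that the load balancer reports an IP address rather than a hostname.

```shell
# Print the URL of the chat application, based on the external IP address
# of the chat-frontend-svc LoadBalancer Service (Service port 80).
frontend_url() {
  ip=$(kubectl get service chat-frontend-svc \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    echo "external IP not assigned yet" >&2
    return 1
  fi
  echo "http://$ip"
}
```

For example, `curl -I "$(frontend_url)"` sends a test request to the application, or you can open the printed URL in a browser.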

Release resources

When you use an ACS cluster, you are charged the following fees:

  • The fee for the computing power used by the workloads in the cluster. The fee is charged by ACS.

  • The fees for other cloud resources used by the cluster. The fees are charged by Alibaba Cloud services based on their billing rules.

Take note of the following items after you create an ACS cluster.

  • If you no longer need to use the cluster, delete the cluster and relevant resources. For more information, see Delete an ACS cluster.

  • If you need to keep the cluster, top up your account once your account balance is less than CNY 100. For more information about the billing rules of Alibaba Cloud services used by ACS, see Billing.
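If you keep the cluster but no longer need the demo application, deleting its workloads stops them from consuming computing power. The following sketch assumes that the two YAML files from Step 4 and Step 5 are in the current directory; the function name delete_demo_workloads is illustrative. The rwkv-runner Deployment and Service that were created in the console are not covered by these files and must be deleted separately.

```shell
# Delete the objects created from the two manifests in this topic.
# --ignore-not-found keeps the command from failing if an object is
# already gone.
delete_demo_workloads() {
  kubectl delete -f rwkv-init-job.yaml --ignore-not-found
  kubectl delete -f chat-next-web.yaml --ignore-not-found
}
```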