Container Service for Kubernetes: Deploy a Stable Diffusion Service based on Knative

Last Updated: Jun 04, 2024

Knative allows you to deploy serverless AI-generated content (AIGC) applications and to scale them automatically based on the number of concurrent requests that each pod processes. This topic describes how to deploy a Stable Diffusion Service in a production environment based on Knative.

Prerequisites

Background information

Important

Alibaba Cloud does not guarantee the legitimacy, security, or accuracy of the third-party model Stable Diffusion. Alibaba Cloud shall not be held liable for any damages caused by the use of Stable Diffusion.

You must abide by the user agreements, usage specifications, and relevant laws and regulations of Stable Diffusion. You shall bear all consequences resulting from the legitimacy and compliance requirements of Stable Diffusion.

With the development of generative AI technologies, an increasing number of developers attempt to improve R&D efficiency by using AI models. As a well-known AIGC project, Stable Diffusion can help users quickly and accurately generate desired scenes and pictures. However, you may face the following challenges when you use Stable Diffusion:

  • The maximum throughput of a pod is limited. If an excessive number of requests are forwarded to the same pod, the pod becomes overloaded. To resolve this issue, you must limit the maximum number of concurrent requests that a pod can process.

  • GPU resources are expensive. You want to use GPUs on demand and release them during off-peak hours.

To resolve the preceding issues, ACK is integrated with Knative to support concurrency-based request handling and auto scaling. The following figure shows how a Stable Diffusion Service is deployed in a production environment based on Knative.

image.png

Step 1: Deploy a Stable Diffusion Service

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Knative.

  3. On the Services tab of the Knative page, select default from the Namespace drop-down list and click Create from Template. Copy the following YAML template to the code editor and click Create to create a Service named knative-sd-demo.

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: knative-sd-demo
      annotations:
        serving.knative.dev.alibabacloud/affinity: "cookie"
        serving.knative.dev.alibabacloud/cookie-name: "sd"
        serving.knative.dev.alibabacloud/cookie-timeout: "1800"
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/class: mpa.autoscaling.knative.dev
            autoscaling.knative.dev/maxScale: '10'
            autoscaling.knative.dev/targetUtilizationPercentage: "100"
            k8s.aliyun.com/eci-use-specs: ecs.gn5-c4g1.xlarge,ecs.gn5i-c8g1.2xlarge,ecs.gn5-c8g1.2xlarge  
        spec:
          containerConcurrency: 1
          containers:
          - args:
            - --listen
            - --skip-torch-cuda-test
            - --api
            command:
            - python3
            - launch.py
            image: yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion@sha256:62b3228f4b02d9e89e221abe6f1731498a894b042925ab8d4326a571b3e992bc
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 7860
              name: http1
              protocol: TCP
            name: stable-diffusion
            readinessProbe:
              tcpSocket:
                port: 7860
              initialDelaySeconds: 5
              periodSeconds: 1
              failureThreshold: 3

    If the Service is in the following state, the knative-sd-demo Service is created.

    image.png
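The console steps above can also be approximated from a terminal. The following is a minimal sketch, assuming that kubectl is already configured for the ACK cluster and that the YAML template was saved locally as knative-sd-demo.yaml (a hypothetical file name). The function names are illustrative and not part of ACK or Knative.

```shell
# Assumption: the YAML template above was saved as knative-sd-demo.yaml
# (hypothetical file name) and kubectl points at the ACK cluster.
deploy_service() {
  kubectl apply -n default -f "${1:-knative-sd-demo.yaml}"
}

# Print the status of the Ready condition of a Knative Service:
# True, False, or Unknown.
service_ready_status() {
  kubectl get ksvc "$1" -n default \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Poll every 5 seconds until the Service reports Ready=True.
wait_until_ready() {
  until [ "$(service_ready_status "$1")" = "True" ]; do
    sleep 5
  done
  echo "$1 is ready"
}
```

For example, run deploy_service knative-sd-demo.yaml followed by wait_until_ready knative-sd-demo. The loop has no timeout, so interrupt it manually if the Service never becomes ready.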

Step 2: Access the Stable Diffusion Service

  1. On the Services tab, record the gateway IP address and default domain name of the Service.

    image.png

  2. Add the following information to the hosts file to map the domain name of the Service to the IP address of the gateway for the knative-sd-demo Service. Example:

    47.xx.xxx.xx knative-sd-demo.default.example.com # Replace the IP address and domain name with the actual values.

  3. After you modify the hosts file, go to the Services tab and click the default domain name of the knative-sd-demo Service to access the Service.

    If the following page appears, the access is successful.

    image.png
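As an alternative to editing the hosts file in step 2, a request can usually be sent straight to the gateway IP address with an explicit Host header. The sketch below assumes that curl is installed. The IP address and domain name are the placeholders from this topic, and the function names are hypothetical.

```shell
# Placeholders: replace with the gateway IP address and default domain name
# recorded on the Services tab.
GATEWAY_IP="${GATEWAY_IP:-47.xx.xxx.xx}"
SERVICE_DOMAIN="${SERVICE_DOMAIN:-knative-sd-demo.default.example.com}"

# Print the hosts-file line that maps the Service domain name to the gateway.
hosts_entry() {
  printf '%s %s\n' "$1" "$2"
}

# Request the web UI through the gateway with an explicit Host header and
# print only the HTTP status code. No hosts-file change is needed.
# The 180-second limit mirrors the timeout used in the load test in this topic.
probe_service() {
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 180 \
    -H "Host: $2" "http://$1/"
}

hosts_entry "$GATEWAY_IP" "$SERVICE_DOMAIN"
```

Running probe_service "$GATEWAY_IP" "$SERVICE_DOMAIN" with real values should print 200 when the Service is reachable. The Host header approach is convenient for scripted checks, while the hosts file entry is still needed to open the page in a browser by its domain name.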

Step 3: Enable auto scaling based on requests

  1. Use the load testing tool hey to perform stress tests.

    Note

    For more information about hey, see hey.

    # Send 50 requests in total, 5 at a time, with a timeout of 180 seconds for each request.
    ./hey -n 50 -c 5 -t 180 -m POST -T "application/json"  -d '{"prompt": "pretty dog"}' http://knative-sd-demo.default.example.com/sdapi/v1/txt2img

    Expected output:

    Summary:
      Total:	252.1749 secs
      Slowest:	62.4155 secs
      Fastest:	9.9399 secs
      Average:	23.9748 secs
      Requests/sec:	0.1983
    
    
    Response time histogram:
      9.940 [1]	|■■
      15.187 [17]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      20.435 [9]	|■■■■■■■■■■■■■■■■■■■■■
      25.683 [11]	|■■■■■■■■■■■■■■■■■■■■■■■■■■
      30.930 [1]	|■■
      36.178 [1]	|■■
      41.425 [3]	|■■■■■■■
      46.673 [1]	|■■
      51.920 [2]	|■■■■■
      57.168 [1]	|■■
      62.415 [3]	|■■■■■■■
    
    
    Latency distribution:
      10% in 10.4695 secs
      25% in 14.8245 secs
      50% in 20.0772 secs
      75% in 30.5207 secs
      90% in 50.7006 secs
      95% in 61.5010 secs
      0% in 0.0000 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:	0.0424 secs, 9.9399 secs, 62.4155 secs
      DNS-lookup:	0.0385 secs, 0.0000 secs, 0.3855 secs
      req write:	0.0000 secs, 0.0000 secs, 0.0004 secs
      resp wait:	23.8850 secs, 9.9089 secs, 62.3562 secs
      resp read:	0.0471 secs, 0.0166 secs, 0.1834 secs
    
    Status code distribution:
      [200]	50 responses

    The output shows that all 50 requests are successfully processed.

  2. Run the following command to query the pods:

    watch -n 1 'kubectl get po'

    image.png

    The output shows that five pods are created for the Stable Diffusion Service. The Service is configured with containerConcurrency: 1, so each pod processes at most one request at a time, and the load test keeps five requests in flight (-c 5). Knative therefore scales the Service out to five pods.
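The pod count observed in this step can be reproduced with simple arithmetic. The following sketch is a simplified estimate, not the actual algorithm of the mpa.autoscaling.knative.dev autoscaler class, which this topic does not describe. It assumes that the number of pods is the number of in-flight requests divided by the per-pod concurrency target (containerConcurrency multiplied by targetUtilizationPercentage), rounded up and capped at maxScale. The function name is hypothetical.

```shell
# Simplified estimate of the pod count, under the assumptions stated above:
# desired = ceil(in_flight / (container_concurrency * utilization_pct / 100)),
# capped at max_scale. Integer arithmetic only.
estimate_pods() {
  local in_flight=$1 container_concurrency=$2 utilization_pct=$3 max_scale=$4
  local per_pod_x100=$(( container_concurrency * utilization_pct ))
  local desired=$(( (in_flight * 100 + per_pod_x100 - 1) / per_pod_x100 ))
  if [ "$desired" -gt "$max_scale" ]; then
    desired=$max_scale
  fi
  echo "$desired"
}

estimate_pods 5 1 100 10    # load test in this topic (-c 5): prints 5
estimate_pods 25 1 100 10   # hypothetical heavier load, capped by maxScale: prints 10
```

The first call uses the values from the YAML template and the hey command in this topic. The second call shows why maxScale matters: with 25 concurrent requests, the estimate stops at 10 pods and the remaining requests have to wait.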

Step 4: View the monitoring data of the Stable Diffusion Service

Knative provides out-of-the-box observability features. You can view the monitoring data of the Stable Diffusion Service on the Monitoring Dashboards of the Knative page. For more information about how to enable Knative dashboards, see View the Knative dashboard in Managed Service for Prometheus.

  • In the Overview (average over the selected time range) section, you can view the number of Knative requests (Request Volume), request success rate (Success Rate), client errors (4xx), server errors (5xx), and pod scaling trend.

    image.png

  • In the Response Time section, you can view the response latency data of Knative, including the P50, P90, P95, and P99 response latency.

    image.png