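Deploy an LLM by using EAS - Platform For AI - Alibaba Cloud Documentation Center

Elastic Algorithm Service (EAS) of Platform for AI (PAI) provides a scenario-based deployment mode that lets you deploy an open source large language model (LLM) by configuring a few parameters. This topic describes how to use EAS to deploy and call an LLM service.

Overview

LLM applications such as ChatGPT and the Tongyi Qianwen (Qwen) model series have attracted wide attention, especially for inference tasks. EAS lets you deploy an LLM conveniently and efficiently, and supports the following deployment options:

Quick deployment of open source models: EAS can deploy a range of open source LLMs, such as DeepSeek-R1, DeepSeek-V3, QVQ-72B-Preview, QwQ-32B-Preview, Llama, Qwen, Marco, internlm3, Qwen2-VL, and AlphaFold2. The following deployment methods are supported: standard deployment, BladeLLM-based accelerated deployment, and vLLM-based accelerated deployment.
High-performance deployment: The BladeLLM engine developed by PAI deploys models efficiently for LLM inference with low latency and high throughput. High-performance deployment supports both open source public models and custom models. Select this option if you want to deploy a custom model.

The following table describes the differences between the two deployment options.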

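Item	Quick deployment of open source models	High-performance deployment
Model configuration	Open source public models	Open source public models and custom models
Acceleration framework	Accelerated deployment: BladeLLM; accelerated deployment: vLLM; standard deployment (no acceleration)	Accelerated deployment: BladeLLM
Invocation method	Standard deployment: API and WebUI; accelerated deployment: API only	API

This topic uses quick deployment of open source models as an example to describe how to deploy an LLM service. For information about high-performance deployment, see "Get started with BladeLLM".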

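Deploy an EAS service

Log on to the PAI console. Select a region and a workspace. Then, click [Enter Elastic Algorithm Service (EAS)].
On the [Elastic Algorithm Service (EAS)] page, click [Deploy Service]. In the [Scenario-based Model Deployment] section of the [Deploy Service] page, select [LLM Deployment].

On the [LLM Deployment] page, configure the parameters described in the following table.

Parameter		Description
Basic Information	Service Name	The name of the model service.
	Version	Select [Open-source Model Quick Deployment]. For information about high-performance deployment, see "Get started with BladeLLM".
	Model Type	Select a model category.
	Deployment Method	The available deployment methods depend on the model category: accelerated deployment with BladeLLM, accelerated deployment with vLLM, or standard deployment, which deploys the service without an acceleration framework. The deployment methods that a model category supports are displayed after you select the category. Accelerated deployment supports only API inference.
Resource Configuration	Resource Type	By default, [Public Resources] is selected. To deploy the service on dedicated resources, use an EAS resource group or a resource quota. For information about how to purchase a resource group and create a resource quota, see "Work with dedicated resource groups" and "Lingjun resource quotas". Note: Resource quotas are available only in the China (Ulanqab) and Singapore regions.
	Deployment Resources	If you use public resources, the system automatically selects a suitable instance type after you select a model category.

Click [Deploy].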

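Call an EAS service

The invocation method depends on the deployment method. Select the method that matches how you deployed the service.

Standard deployment

Call an EAS service by using the WebUI

Find the desired service and click [View Web App] in the [Service Type] column.
Test the inference performance on the WebUI page.
On the ChatLLM-WebUI page, enter a message in the text box, for example, What is the capital of Canada?, and click [Send] to start the dialogue.

Call an EAS service by using API operations

Obtain the service endpoint and token.
1. Go to EAS: select a workspace and open the Elastic Algorithm Service (EAS) page.
2. Click the name of the desired service to open its details page.
3. In the [Basic Information] section, click [View Invocation Information]. On the [Public Endpoint Invocation] tab, obtain the service token and endpoint.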
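
Once you have the endpoint and token, it is usually safer to keep them out of source code. The following sketch reads them from environment variables and builds the Authorization header that the curl examples in this topic use. The variable names EAS_ENDPOINT and EAS_TOKEN and the helper load_eas_credentials are assumptions made for illustration; EAS does not define them.

```python
import os
from typing import Dict, Mapping, Tuple


def load_eas_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Return (endpoint, headers) built from hypothetical environment variables."""
    endpoint = env.get("EAS_ENDPOINT", "").strip()
    token = env.get("EAS_TOKEN", "").strip()
    if not endpoint or not token:
        raise RuntimeError("Set EAS_ENDPOINT and EAS_TOKEN before calling the service.")
    # The token is sent as-is in the Authorization header, as in the curl examples.
    return endpoint, {"Authorization": token}


if __name__ == "__main__":
    # Placeholder values; replace them with the endpoint and token from the console.
    demo_env = {
        "EAS_ENDPOINT": "http://example.invalid/api/predict/demo",
        "EAS_TOKEN": "demo-token",
    }
    print(load_eas_credentials(demo_env))
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real process environment.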

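Use one of the following methods to call API operations and run inference.

Use HTTP

Non-streaming mode

When you run a curl command, the client sends one of the following types of standard HTTP requests.

String request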
```
curl $host -H 'Authorization: $authorization' --data-binary @chatllm_data.txt -v
```
Replace $authorization with the service token and $host with the service endpoint. The chatllm_data.txt file is a plain text file that contains the prompt, for example, What is the capital of Canada?
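
The same string request can be prepared from Python. The following sketch uses only the standard library and builds the request without sending it, so you can inspect it first. build_string_request is a hypothetical helper name, and the endpoint and token in the demo are placeholders.

```python
import urllib.request


def build_string_request(host: str, token: str, prompt: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw prompt text."""
    return urllib.request.Request(
        host,
        data=prompt.encode("utf-8"),  # same bytes curl would read from chatllm_data.txt
        headers={"Authorization": token},
        method="POST",
    )


if __name__ == "__main__":
    req = build_string_request(
        "http://example.invalid/api/predict/demo",  # placeholder endpoint
        "demo-token",                               # placeholder token
        "What is the capital of Canada?",
    )
    print(req.get_method(), req.full_url, req.data)
    # To send the request for real: urllib.request.urlopen(req).read()
```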

Structured request

```
curl $host -H 'Authorization: $authorization' -H "Content-type: application/json" --data-binary @chatllm_data.json -v -H "Connection: close"
```

Use the chatllm_data.json file to configure inference parameters. The following sample code shows an example of the chatllm_data.json file format.

```
{
  "max_new_tokens": 4096,
  "use_stream_chat": false,
  "prompt": "What is the capital of Canada?",
  "system_prompt": "Act like you are a knowledgeable assistant who can provide information on geography and related topics.",
  "history": [
    [
      "Can you tell me what's the capital of France?",
      "The capital of France is Paris."
    ]
  ],
  "temperature": 0.8,
  "top_k": 10,
  "top_p": 0.8,
  "do_sample": true,
  "use_cache": true
}
```

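The following table describes the parameters in the preceding code. Configure the parameters based on your business requirements.

Parameter	Description	Default value
max_new_tokens	The maximum number of output tokens.	2048
use_stream_chat	Specifies whether to return output tokens in streaming mode.	true
prompt	The user prompt.	""
system_prompt	The system prompt.	""
history	The dialogue history. The value is in the List[Tuple(str, str)] format.	[()]
temperature	The randomness of the model output. A larger value produces more random output, and a value of 0 produces fixed output. The value is a float in the range of 0 to 1.	0.95
top_k	The number of candidate outputs to select from the generated results.	30
top_p	The probability threshold for selecting outputs from the generated results. The value is a float in the range of 0 to 1.	0.8
do_sample	Specifies whether to enable output sampling.	true
use_cache	Specifies whether to enable the KV cache.	true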
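
To avoid hand-editing chatllm_data.json, the request body can be assembled in code. The following sketch applies the defaults from the table above and checks the documented 0 to 1 ranges for temperature and top_p. The function name build_payload and the client-side validation are assumptions for illustration, not part of the service. use_stream_chat is set to false here because the sketch targets non-streaming mode, whereas the table lists true as the default.

```python
import json
from typing import List, Optional, Tuple


def build_payload(prompt: str,
                  system_prompt: str = "",
                  history: Optional[List[Tuple[str, str]]] = None,
                  max_new_tokens: int = 2048,
                  temperature: float = 0.95,
                  top_k: int = 30,
                  top_p: float = 0.8,
                  do_sample: bool = True,
                  use_cache: bool = True,
                  use_stream_chat: bool = False) -> dict:
    """Build a structured request body using the defaults from the parameter table."""
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "prompt": prompt,
        "system_prompt": system_prompt,
        # Each history entry is a (question, answer) pair; JSON stores it as a list.
        "history": [list(pair) for pair in (history or [])],
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "do_sample": do_sample,
        "use_cache": use_cache,
        "use_stream_chat": use_stream_chat,
    }


if __name__ == "__main__":
    payload = build_payload(
        "What is the capital of Canada?",
        history=[("Can you tell me what's the capital of France?",
                  "The capital of France is Paris.")],
        temperature=0.8,
        top_k=10,
    )
    # The printed JSON can be saved as chatllm_data.json for the curl command.
    print(json.dumps(payload, indent=2))
```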

You can also implement a client based on the Python requests package. Use the --prompt parameter to specify the request content, for example, python xxx.py --prompt "What is the capital of Canada?".

```
import argparse
import json
from typing import Tuple

import requests


def post_http_request(prompt: str,
                      system_prompt: str,
                      history: list,
                      host: str,
                      authorization: str,
                      max_new_tokens: int = 2048,
                      temperature: float = 0.95,
                      top_k: int = 1,
                      top_p: float = 0.8,
                      langchain: bool = False,
                      use_stream_chat: bool = False) -> requests.Response:
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    if not history:
        # Fall back to a sample round of dialogue when no history is supplied.
        history = [
            (
                "San Francisco is a",
                "city located in the state of California in the United States. "
                "It is known for its iconic landmarks, such as the Golden Gate Bridge "
                "and Alcatraz Island, as well as its vibrant culture, diverse population, "
                "and tech industry. The city is also home to many famous companies and "
                "startups, including Google, Apple, and Twitter."
            )
        ]
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "use_stream_chat": use_stream_chat,
        "history": history
    }
    if langchain:
        pload["langchain"] = langchain
    response = requests.post(host, headers=headers,
                             json=pload, stream=use_stream_chat)
    return response


# The server returns a JSON response that includes the inference result and dialogue history.
def get_response(response: requests.Response) -> Tuple[str, list]:
    data = json.loads(response.content)
    output = data["response"]
    history = data["history"]
    return output, history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=4)
    parser.add_argument("--top-p", type=float, default=0.8)
    parser.add_argument("--max-new-tokens", type=int, default=2048)
    parser.add_argument("--temperature", type=float, default=0.95)
    parser.add_argument("--prompt", type=str, default="How can I get there?")
    parser.add_argument("--langchain", action="store_true")

    args = parser.parse_args()

    prompt = args.prompt
    top_k = args.top_k
    top_p = args.top_p
    use_stream_chat = False
    temperature = args.temperature
    langchain = args.langchain
    max_new_tokens = args.max_new_tokens

    host = "<Public endpoint of the EAS service>"
    authorization = "<Public token of the EAS service>"

    print(f"Prompt: {prompt!r}\n", flush=True)
    # System prompts can be included in the requests.
    system_prompt = "Act like you are programmer with 5+ years of experience."

    # Dialogue history can be included in the client request. The client manages
    # the dialogue history to implement multi-round dialogues. In most cases,
    # information from the previous round of dialogue is used. The information
    # is in the List[Tuple(str, str)] format.
    history = []
    response = post_http_request(
        prompt, system_prompt, history,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p,
        langchain=langchain, use_stream_chat=use_stream_chat)
    output, history = get_response(response)
    print(f" --- output: {output} \n --- history: {history}", flush=True)
```

Take note of the following parameters:

Set the host parameter to the service endpoint.
Set the authorization parameter to the service token.
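
Because the client manages the dialogue history, a multi-round conversation amounts to feeding the history returned by one response into the next request. The following sketch shows only that bookkeeping, with a stand-in for the network call. chat_round and fake_send are hypothetical names, and the response shape ({"response": ..., "history": ...}) is taken from the get_response function above.

```python
import json
from typing import Callable, List, Tuple


def chat_round(send: Callable[[dict], bytes], prompt: str, history: List) -> Tuple[str, List]:
    """Send one prompt with the accumulated history and return (answer, new_history)."""
    body = send({"prompt": prompt, "history": history, "use_stream_chat": False})
    data = json.loads(body)
    return data["response"], data["history"]


def fake_send(payload: dict) -> bytes:
    """Stand-in for requests.post(...).content: echoes the prompt and extends the history."""
    answer = f"echo: {payload['prompt']}"
    history = payload["history"] + [[payload["prompt"], answer]]
    return json.dumps({"response": answer, "history": history}).encode("utf-8")


if __name__ == "__main__":
    history: List = []
    for question in ("What is the capital of Canada?", "How can I get there?"):
        answer, history = chat_round(fake_send, question, history)
        print(answer, "| rounds so far:", len(history))
```

With a real service, the send argument would wrap the HTTP call and return the response body.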

Streaming mode

In streaming mode, the HTTP SSE method is used. Use the --prompt parameter to specify the request content, for example, python xxx.py --prompt "What is the capital of Canada?".

```
import argparse
import json
from typing import Iterable, Tuple

import requests


# Optional helper that clears the last n terminal lines (not called below).
def clear_line(n: int = 1) -> None:
    LINE_UP = '\033[1A'
    LINE_CLEAR = '\x1b[2K'
    for _ in range(n):
        print(LINE_UP, end=LINE_CLEAR, flush=True)


def post_http_request(prompt: str,
                      system_prompt: str,
                      history: list,
                      host: str,
                      authorization: str,
                      max_new_tokens: int = 2048,
                      temperature: float = 0.95,
                      top_k: int = 1,
                      top_p: float = 0.8,
                      langchain: bool = False,
                      use_stream_chat: bool = False) -> requests.Response:
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    if not history:
        # Fall back to a sample round of dialogue when no history is supplied.
        history = [
            (
                "San Francisco is a",
                "city located in the state of California in the United States. "
                "It is known for its iconic landmarks, such as the Golden Gate Bridge "
                "and Alcatraz Island, as well as its vibrant culture, diverse population, "
                "and tech industry. The city is also home to many famous companies and "
                "startups, including Google, Apple, and Twitter."
            )
        ]
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "use_stream_chat": use_stream_chat,
        "history": history
    }
    if langchain:
        pload["langchain"] = langchain
    response = requests.post(host, headers=headers,
                             json=pload, stream=use_stream_chat)
    return response


# Each chunk in the stream is a JSON document terminated by a NUL byte.
def get_streaming_response(response: requests.Response) -> Iterable[Tuple[str, list]]:
    for chunk in response.iter_lines(chunk_size=8192,
                                     decode_unicode=False,
                                     delimiter=b"\0"):
        if chunk:
            data = json.loads(chunk.decode("utf-8"))
            output = data["response"]
            history = data["history"]
            yield output, history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=4)
    parser.add_argument("--top-p", type=float, default=0.8)
    parser.add_argument("--max-new-tokens", type=int, default=2048)
    parser.add_argument("--temperature", type=float, default=0.95)
    parser.add_argument("--prompt", type=str, default="How can I get there?")
    parser.add_argument("--langchain", action="store_true")
    args = parser.parse_args()

    prompt = args.prompt
    top_k = args.top_k
    top_p = args.top_p
    use_stream_chat = True
    temperature = args.temperature
    langchain = args.langchain
    max_new_tokens = args.max_new_tokens

    host = "<Public endpoint of the EAS service>"
    authorization = "<Public token of the EAS service>"

    print(f"Prompt: {prompt!r}\n", flush=True)
    system_prompt = "Act like you are programmer with 5+ years of experience."
    history = []
    response = post_http_request(
        prompt, system_prompt, history,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p,
        langchain=langchain, use_stream_chat=use_stream_chat)

    for h, history in get_streaming_response(response):
        print(
            f" --- stream line: {h} \n --- history: {history}", flush=True)
```

Take note of the following parameters:

Set the host parameter to the service endpoint.
Set the authorization parameter to the service token.
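
The streaming client above splits the response body on NUL bytes and decodes each piece as JSON. The following sketch reproduces that parsing on in-memory data, so the logic can be checked without a live service. iter_stream_messages is a hypothetical helper, and the assumption that each message carries response and history keys comes from the client code above.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_stream_messages(blocks: Iterable[bytes]) -> Iterator[Tuple[str, list]]:
    """Yield (response, history) from NUL-delimited JSON messages.

    A message may be split across network blocks, so incomplete data is buffered
    until its terminating NUL byte (or the end of the stream) arrives.
    """
    buffer = b""
    for block in blocks:
        buffer += block
        while b"\0" in buffer:
            raw, buffer = buffer.split(b"\0", 1)
            if raw:
                data = json.loads(raw.decode("utf-8"))
                yield data["response"], data["history"]
    if buffer.strip():
        # Final message without a trailing delimiter.
        data = json.loads(buffer.decode("utf-8"))
        yield data["response"], data["history"]


if __name__ == "__main__":
    first = json.dumps({"response": "Ot", "history": []}).encode("utf-8")
    second = json.dumps({"response": "Ottawa", "history": [["q", "Ottawa"]]}).encode("utf-8")
    stream = first + b"\0" + second + b"\0"
    # Split at an arbitrary byte offset to simulate network chunking.
    for output, history in iter_stream_messages([stream[:7], stream[7:]]):
        print(output, history)
```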

Use WebSocket

The WebSocket protocol can handle dialogue history efficiently. You can connect to the service over WebSocket and run one or more rounds of dialogue. Sample code:

```
import os
import time
import json
import struct
from multiprocessing import Process

import websocket

# Number of prompts sent by on_open_2. on_message_2 closes the connection
# after this many "<EOS>" markers have been received.
rounds = 5
questions = 0


def on_message_1(ws, message):
    if message == "<EOS>":
        print('pid-{} timestamp-({}) receives end message: {}'.format(os.getpid(),
              time.time(), message), flush=True)
        ws.send(struct.pack('!H', 1000), websocket.ABNF.OPCODE_CLOSE)
    else:
        print("{}".format(time.time()))
        print('pid-{} timestamp-({}) --- message received: {}'.format(os.getpid(),
              time.time(), message), flush=True)


def on_message_2(ws, message):
    global questions
    print('pid-{} --- message received: {}'.format(os.getpid(), message))
    # end the client-side streaming once every round has finished
    if message == "<EOS>":
        questions = questions + 1
        if questions == rounds:
            ws.send(struct.pack('!H', 1000), websocket.ABNF.OPCODE_CLOSE)


def on_message_3(ws, message):
    print('pid-{} --- message received: {}'.format(os.getpid(), message))
    # end the client-side streaming
    ws.send(struct.pack('!H', 1000), websocket.ABNF.OPCODE_CLOSE)


def on_error(ws, error):
    print('error happened: ', str(error))


def on_close(ws, a, b):
    print("### closed ###", a, b)


def on_pong(ws, pong):
    print('pong:', pong)


# stream chat validation test
def on_open_1(ws):
    print('Opening Websocket connection to the server ... ')
    params_dict = {}
    params_dict['prompt'] = """Show me a golang code example: """
    params_dict['temperature'] = 0.9
    params_dict['top_p'] = 0.1
    params_dict['top_k'] = 30
    params_dict['max_new_tokens'] = 2048
    params_dict['do_sample'] = True
    raw_req = json.dumps(params_dict, ensure_ascii=False).encode('utf8')
    ws.send(raw_req)


# multi-round query validation test
def on_open_2(ws):
    print('Opening Websocket connection to the server ... ')
    params_dict = {"max_new_tokens": 6144}
    params_dict['temperature'] = 0.9
    params_dict['top_p'] = 0.1
    params_dict['top_k'] = 30
    params_dict['use_stream_chat'] = True
    # Add the system prompt to the existing parameters instead of replacing them.
    params_dict['system_prompt'] = "Act like you are programmer with 5+ years of experience."
    prompts = [
        "Hello!",
        "Please write a sorting algorithm in Python.",
        "Please convert the programming language to Java.",
        "Please introduce yourself.",
        "Please summarize the dialogue above.",
    ]
    # Send one request per round over the same connection.
    for prompt in prompts:
        params_dict['prompt'] = prompt
        raw_req = json.dumps(params_dict, ensure_ascii=False).encode('utf8')
        ws.send(raw_req)


# Langchain validation test.
def on_open_3(ws):
    print('Opening Websocket connection to the server ... ')

    params_dict = {}
    params_dict['prompt'] = """Can you tell me what's the MNN?"""
    params_dict['temperature'] = 0.9
    params_dict['top_p'] = 0.1
    params_dict['top_k'] = 30
    params_dict['max_new_tokens'] = 2048
    params_dict['use_stream_chat'] = False
    params_dict['langchain'] = True
    raw_req = json.dumps(params_dict, ensure_ascii=False).encode('utf8')
    ws.send(raw_req)


authorization = "<Public token of the EAS service>"
host = "ws://" + "<Public endpoint of the EAS service, without the http:// prefix>"


def single_call(on_open_func, on_message_func, on_close_func=on_close):
    ws = websocket.WebSocketApp(
        host,
        on_open=on_open_func,
        on_message=on_message_func,
        on_error=on_error,
        on_pong=on_pong,
        on_close=on_close_func,
        header=[
            'Authorization: ' + authorization],
    )

    # setup ping interval to keep long connection.
    ws.run_forever(ping_interval=2)


if __name__ == "__main__":
    for i in range(5):
        p1 = Process(target=single_call, args=(on_open_1, on_message_1))
        p2 = Process(target=single_call, args=(on_open_2, on_message_2))
        p3 = Process(target=single_call, args=(on_open_3, on_message_3))

        p1.start()
        p2.start()
        p3.start()

        p1.join()
        p2.join()
        p3.join()
```

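Take note of the following parameters:

Set the authorization parameter to the service token.
Set the host parameter to the service endpoint, and replace the http prefix of the endpoint with ws.
Use the use_stream_chat parameter to specify whether output is returned in streaming mode. Default value: true.
To implement multi-round dialogue, see the on_open_2 function in the preceding code.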
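
Converting the HTTP endpoint to a WebSocket URL is a simple scheme swap. A minimal sketch follows. to_ws_url is a hypothetical helper, and mapping https to wss is an assumption that follows from replacing the http prefix with ws.

```python
def to_ws_url(endpoint: str) -> str:
    """Replace the http/https scheme of an endpoint with ws/wss."""
    if endpoint.startswith("https://"):
        return "wss://" + endpoint[len("https://"):]
    if endpoint.startswith("http://"):
        return "ws://" + endpoint[len("http://"):]
    if endpoint.startswith(("ws://", "wss://")):
        return endpoint  # already a WebSocket URL
    raise ValueError(f"unsupported endpoint scheme: {endpoint!r}")


if __name__ == "__main__":
    # Placeholder endpoint; use the one shown in the console.
    print(to_ws_url("http://example.invalid/api/predict/demo"))
```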

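BladeLLM-based accelerated deployment

For BladeLLM-based accelerated deployment, you can call the service only by using API operations. Perform the following steps:

View the service endpoint and token:
1. On the [Elastic Algorithm Service (EAS)] page, find the desired service, click its [Invocation Method] column, and select [Invocation Information].
2. In the [Invocation Information] dialog box, note the service endpoint and token.

Run the following code in a terminal to call the service and receive the generated text.

```
# Call EAS service
curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: AUTH_TOKEN_FOR_EAS" \
    -d '{"prompt":"What is the capital of Canada?", "stream":"true"}' \
    <service_url>/v1/completions
```

Take note of the following parameters:

Authorization: Replace AUTH_TOKEN_FOR_EAS with the service token obtained in the previous step.
<service_url>: Replace this with the service endpoint obtained in the previous step.

The command returns output similar to the following: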

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" The"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":1,"total_tokens":8},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" capital"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":2,"total_tokens":9},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" of"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" Canada"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":4,"total_tokens":11},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" is"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":5,"total_tokens":12},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" Ottawa"}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":6,"total_tokens":13},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"."}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":7,"total_tokens":14},"error_info":null}

data: {"id":"91f9a28a-f949-40fb-b720-08ceeeb2****","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"text":""}],"object":"text_completion","usage":{"prompt_tokens":7,"completion_tokens":8,"total_tokens":15},"error_info":null}

data: [DONE]
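
The streamed output above consists of data: lines that each carry one text fragment, followed by a [DONE] marker. The following sketch shows one way to reassemble the full text from such lines. collect_completion_text is a hypothetical helper, and the sample events in the demo are shortened versions of the output shown above.

```python
import json
from typing import Iterable


def collect_completion_text(lines: Iterable[str]) -> str:
    """Concatenate choices[0].text from 'data:' lines until the [DONE] marker."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        parts.append(event["choices"][0]["text"])
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"finish_reason":"","index":0,"text":" The"}]}',
        '',
        'data: {"choices":[{"finish_reason":"","index":0,"text":" capital"}]}',
        '',
        'data: {"choices":[{"finish_reason":"stop","index":0,"text":""}]}',
        '',
        'data: [DONE]',
    ]
    print(repr(collect_completion_text(sample)))
```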

vLLM-based accelerated deployment

For vLLM-based accelerated deployment, you can call the service only by using API operations. Perform the following steps:

View the service endpoint and token:
1. On the [Elastic Algorithm Service (EAS)] page, find the desired service, click its [Invocation Method] column, and select [Invocation Information].
2. In the [Invocation Information] dialog box, note the service endpoint and token.

Run the following code to call the service.

Python

```
from openai import OpenAI

##### API configuration #####
openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first model that the service reports.
models = client.models.list()
model = models.data[0].id
print(model)


def main():

    stream = True

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What is the capital of Canada?",
                    }
                ],
            }
        ],
        model=model,
        max_completion_tokens=2048,
        stream=stream,
    )

    if stream:
        for chunk in chat_completion:
            # Some chunks carry no text (content is None); skip them.
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
        print()
    else:
        result = chat_completion.choices[0].message.content
        print(result)


if __name__ == "__main__":
    main()
```

Take note of the following parameters:

<EAS API KEY>: Set this to the service token that you obtained.
<EAS API Endpoint>: Set this to the service endpoint that you obtained.
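
In streamed chat completions, some chunks may carry no text (for example, a delta whose content is None), so it helps to guard against that when assembling the full reply. The following sketch shows the accumulation logic on stand-in objects built with types.SimpleNamespace rather than real SDK chunks. join_stream and fake_chunk are hypothetical names used only for illustration.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def join_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # a chunk without choices carries no text
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build a stand-in object with the same attribute path as a streamed chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


if __name__ == "__main__":
    demo = [fake_chunk(None), fake_chunk("Ott"), fake_chunk("awa"), fake_chunk(None)]
    print(join_stream(demo))
```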

CLI

```
curl -X POST <service_url>/v1/chat/completions -d '{
    "model": "Qwen2.5-7B-Instruct",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a helpful and harmless assistant."
                }
            ]
        },
        {
            "role": "user",
            "content": "What is the capital of Canada?"
        }
    ]
}' -H "Content-Type: application/json" -H "Authorization: <your-token>"
```

Take note of the following parameters:

<service_url>: Set this to the service endpoint that you obtained.
<your-token>: Set this to the service token that you obtained.
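
The curl command can also be expressed with the Python standard library. The following sketch builds the same /v1/chat/completions request without sending it, so the body and headers can be inspected first. build_chat_request is a hypothetical helper, and the URL, token, and model name in the demo are placeholders.

```python
import json
import urllib.request


def build_chat_request(service_url: str, token: str, model: str, user_text: str,
                       system_text: str = "You are a helpful and harmless assistant.",
                       ) -> urllib.request.Request:
    """Build a POST request for the chat completions path used in the curl example."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        service_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        "http://example.invalid",  # placeholder for <service_url>
        "demo-token",              # placeholder for <your-token>
        "Qwen2.5-7B-Instruct",
        "What is the capital of Canada?",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To send the request for real: urllib.request.urlopen(req).read()
```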
