Overview
Alibaba Cloud Model Studio allows you to build applications for complex tasks in various scenarios. Applications can use Assistant APIs to access large language models (LLMs) and implement features such as Knowledge Retrieval Augmentation and plug-ins. Model Studio provides an integrated, intuitive user interface and a wide range of configuration options, including LLM selection, prompt engineering, Knowledge Retrieval Augmentation, and plug-ins.
To ensure the stability and functionality of your applications, you can test and debug them in a built-in test environment before deployment. This improves the overall quality and user experience of your project.
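For reference, the following is a minimal sketch of how a published application might be called through the DashScope Python SDK. The app ID is a placeholder, and the exact interface may vary by SDK version and region.

```python
# Minimal sketch of calling a published Model Studio application through
# the DashScope Python SDK. Assumes the DASHSCOPE_API_KEY environment
# variable is set; the app ID is a placeholder.
from http import HTTPStatus

from dashscope import Application

response = Application.call(
    app_id="YOUR_APP_ID",  # placeholder: the ID of your published application
    prompt="What can this application do?",
)

if response.status_code == HTTPStatus.OK:
    print(response.output.text)
else:
    print(f"Request failed: {response.message}")
```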
Procedure
You can select one of the following paths to create an application:
Homepage > Application > Create Now
Application Center > My Applications > Create Application
After you create an application, perform the following steps to configure the application.
1. Select a model
On the Create Application page, select a model for the application. Based on the input question, the model determines whether to use Knowledge Retrieval Augmentation or plug-ins, or to generate a response based on its own world knowledge.
You can configure the following parameters, which also appear in the sketch after this list:
Temperature: controls the diversity of the generated content. A higher value produces more varied output, and a lower value produces more deterministic output.
Maximum Reply Length: the maximum length of the generated content, excluding the prompt. The maximum content length varies based on the model.
Number of Rounds with Context: the maximum number of rounds of conversation history that the model remembers. A larger number allows the model to reference more of the conversation history, which can make responses more contextually relevant.
Only Qwen-Turbo, Qwen-Plus, and Qwen-Max are supported.
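These parameters map naturally onto a direct model call. The following sketch assumes the DashScope Python SDK and its Generation interface; values are illustrative, and one round is counted as a user turn plus an assistant turn.

```python
# Sketch of how the three parameters might map onto a direct model call,
# assuming the DashScope Python SDK (DASHSCOPE_API_KEY set in the
# environment). Parameter values are illustrative.
from dashscope import Generation

MAX_ROUNDS = 5  # "Number of Rounds with Context"

history = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am an assistant built on Qwen."},
]
# Keep only the most recent rounds (one round = user turn + assistant turn).
trimmed = history[-2 * MAX_ROUNDS:]

response = Generation.call(
    model="qwen-turbo",  # or qwen-plus / qwen-max
    messages=trimmed + [{"role": "user", "content": "What did I just ask you?"}],
    temperature=0.8,     # "Temperature": higher = more diverse output
    max_tokens=1500,     # "Maximum Reply Length", excluding the prompt
    result_format="message",
)
print(response.output.choices[0].message.content)
```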
2. Enter a prompt
Enter a prompt for the application. You can describe the role, functions, or capabilities of the model in the prompt. When you call the application, the prompt is passed to the model as the system prompt.
If you enable the Knowledge Retrieval Augmentation feature, the system automatically adds the corresponding prompt to the prompt field. You can adjust the prompt as a whole to manage the generated content.
You can also use the prompt optimization feature. Enter your requirements in the prompt field and click Prompt Optimization. The system generates an optimized prompt that you can modify or directly use.
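When the application is called, the configured prompt plays the same role as the system message in a direct model call. The following sketch, again assuming the DashScope SDK, shows the equivalent pattern.

```python
# Illustrative only: the application prompt acts like the system message
# in a direct model call (DashScope SDK assumed).
from dashscope import Generation

messages = [
    # This is where the prompt you configured would be injected.
    {"role": "system", "content": "You are a polite customer service assistant for an online bookstore."},
    {"role": "user", "content": "Can I return a damaged book?"},
]

response = Generation.call(
    model="qwen-plus",
    messages=messages,
    result_format="message",
)
print(response.output.choices[0].message.content)
```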
3. Knowledge Retrieval Augmentation
The Knowledge Retrieval Augmentation feature is integrated into Application Center. After you enable Knowledge Retrieval Augmentation for an application, the system automatically adds the corresponding prompt to the prompt field.
You can select a knowledge base and specify the number of retrieved segments. A conceptual sketch of the retrieval pattern follows the list below.
Select Knowledge Base: Select a knowledge base from which information can be retrieved.
Retrieved Segments: The maximum number of segments that can be retrieved.
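Conceptually, the feature retrieves the most relevant segments from the selected knowledge base and injects them into the model's context before generation. The sketch below illustrates this pattern only; the search_knowledge_base function is a hypothetical stand-in for the managed retrieval service.

```python
# Simplified sketch of the retrieval-augmentation pattern. The retriever
# below is a hypothetical stand-in; Model Studio manages this step for you.
from dashscope import Generation

TOP_K = 3  # "Retrieved Segments": maximum number of segments to retrieve

def search_knowledge_base(query: str, top_k: int) -> list[str]:
    """Hypothetical retriever, e.g. a vector similarity search."""
    return ["segment 1 ...", "segment 2 ...", "segment 3 ..."][:top_k]

question = "What is the warranty period?"
segments = search_knowledge_base(question, TOP_K)

prompt = "Answer the question based on the following reference material:\n"
prompt += "\n".join(f"- {seg}" for seg in segments)
prompt += f"\n\nQuestion: {question}"

response = Generation.call(model="qwen-plus", prompt=prompt)
print(response.output.text)
```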
4. Select plug-ins
Plug-ins can extend the functionality of LLMs to cover more business scenarios. For example, the Python code interpreter plug-in allows the model to run Python code, and the calculator plug-in improves the computing capability of the model.
You can use official and custom plug-ins based on your business requirements.
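To see why a plug-in such as the calculator helps, consider the general tool-use loop sketched below: the model emits a structured request, the runtime executes the tool, and the exact result is fed back for the final reply. This is a generic illustration, not the exact protocol that Model Studio uses.

```python
# Generic illustration of the plug-in (tool-use) loop; not the exact
# protocol that Model Studio uses internally.

def calculator(expression: str) -> str:
    """A trivial, hypothetical calculator tool."""
    # eval is unsafe for untrusted input; acceptable here as a sketch.
    return str(eval(expression, {"__builtins__": {}}, {}))

# 1. The model decides that a tool call is needed and emits a structured request.
tool_request = {"tool": "calculator", "arguments": {"expression": "1234 * 5678"}}

# 2. The runtime executes the tool and captures the exact result.
result = calculator(tool_request["arguments"]["expression"])

# 3. The result is returned to the model, which uses it in the final reply.
print(f"Tool result: {result}")  # 7006652
```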
5. Advanced configuration
Rapid intervention
You can use the rapid intervention feature to adjust the output of your application. Based on rules, this feature handles user input that contains prohibited speech and risky content generated by the model. However, rapid intervention is not a substitute for professional content moderation products.
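The sketch below illustrates the rule-based idea behind rapid intervention: patterns are matched against text, and a fallback reply is returned on a hit. It is a toy example, not a moderation product; the patterns and fallback reply are made up.

```python
# Toy illustration of rule-based intervention; real moderation requires
# a dedicated product. Patterns and the fallback reply are made up.
import re

BLOCKED_PATTERNS = [
    re.compile(r"prohibited_phrase", re.IGNORECASE),
    re.compile(r"another_banned_term", re.IGNORECASE),
]
FALLBACK_REPLY = "Sorry, I cannot help with that request."

def intervene(text: str) -> str | None:
    """Return a fallback reply if any rule matches; otherwise None."""
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return FALLBACK_REPLY
    return None

user_input = "Tell me about prohibited_phrase."
print(intervene(user_input) or "(pass the input to the model as usual)")
```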
Multi-round conversation
Multi-round conversation based on the built-in cache: Conversation records are stored in the built-in cache (see the sketch after this list).
Back up conversation records by using AnalyticDB for PostgreSQL: By default, data in the built-in cache is not persisted to disk and is retained in memory for only 1 hour. If you enable this feature and select a purchased AnalyticDB for PostgreSQL instance, the conversation records of this application are automatically stored in that instance.
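When you call an application through the API, multi-round conversation is typically carried by reusing a session identifier, so the service can look up earlier rounds in the cache or in the AnalyticDB for PostgreSQL backup. A sketch, assuming the DashScope SDK; exact field names may vary by SDK version:

```python
# Sketch of multi-round conversation by reusing the session ID
# (DashScope SDK assumed; exact fields may vary by SDK version).
from dashscope import Application

first = Application.call(
    app_id="YOUR_APP_ID",  # placeholder
    prompt="My name is Alice. Please remember it.",
)

# Reusing the session ID lets the service retrieve the earlier rounds
# from the built-in cache (or the AnalyticDB for PostgreSQL backup).
second = Application.call(
    app_id="YOUR_APP_ID",
    prompt="What is my name?",
    session_id=first.output.session_id,
)
print(second.output.text)
```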
6. Application testing
You can test your application in one of the following versions:
Testing Version: When you configure the application, the system automatically saves your changes as a draft version.
Published Version: Click Publish in the upper-right corner of the page to publish the current version.
You can view the debug process in the generated response.
Fees are calculated based on the number of input and output tokens during testing. You are charged after the free resource plan expires or is exhausted.
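Because billing is token-based, it can be useful to inspect the usage statistics that come back with each response. A sketch, assuming the DashScope SDK; field names can vary by model and SDK version:

```python
# Inspecting token usage on a response (DashScope SDK assumed; exact
# field names can vary by model and SDK version).
from dashscope import Generation

response = Generation.call(model="qwen-turbo", prompt="Hello!")

usage = response.usage
print(f"Input tokens:  {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
```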