This article describes how to deploy the open source Kohya_ss in Elastic Algorithm Service (EAS) of Platform for AI (PAI) and use it to train a Low-Rank Adaptation (LoRA) model. In AI painting scenarios, you can use the trained LoRA model as an auxiliary model in a Stable Diffusion (SD) service to improve the painting performance of SD.
• EAS is activated and the default workspace is created. For more information, see Activate PAI and create the default workspace.
• If you use a RAM user to deploy the model, make sure that the RAM user is granted the management permissions on EAS. For more information, see Grant the permissions that are required to use EAS.
• An Object Storage Service (OSS) bucket is created in the region where the PAI workspace resides. The OSS bucket is used to store training files, output model files, and logs. For more information about how to upload objects, see Upload objects.
1. Log on to the OSS console and go to the path of the bucket that you created for training. The bucket must reside in the same region as the PAI workspace. Example: oss://kohya-demo/kohya/.
2. Create a project folder in the bucket path. Example: KaraDetroit_loar. Create the following folders in the project folder: Image, Log, and Model. If you have a JSON configuration file, you can also upload it to the project folder.
3. Package the images that you want to use for training into a folder and upload the folder to the Image folder. The sample folder 100_pic.tgz is used in this example.
Note: The images must be in the .png, .jpg, .jpeg, .webp, or .bmp format. You can provide a description file for each image in the .txt format. The description must be in the first line of the file. Separate multiple descriptions with commas (,).
The name of the image folder consists of the following parts:

| Parameter | Description |
| --- | --- |
| Number | The number of times that each image is trained. The value must be greater than or equal to 100, and the total number of training passes must be greater than 1500. For example, if the folder contains 10 images, each image is trained 1500/10 = 150 times, so the value is 150. If the folder contains 20 images, 1500/20 = 75 is less than 100, so the value is 100. |
| Underscore | Use "_". This part is required. |
| Name | A string that meets the naming requirements of files in OSS. The name pic is used in this example. |
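The naming rule above can be sketched in a few lines of Python. The helper below is hypothetical (it is not part of Kohya_ss) and simply encodes the rule: at least 100 repetitions per image, and roughly 1500 total training passes.

```python
import math

def image_folder_name(num_images: int, name: str,
                      total_target: int = 1500, min_repeats: int = 100) -> str:
    """Build the training folder name in the {Number}_{Name} format.

    Each image must be trained at least min_repeats times, and the total
    number of training passes should reach total_target.
    """
    repeats = max(min_repeats, math.ceil(total_target / num_images))
    return f"{repeats}_{name}"

print(image_folder_name(10, "pic"))  # 1500/10 = 150 -> "150_pic"
print(image_folder_name(20, "pic"))  # 1500/20 = 75 < 100 -> "100_pic"
```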
1. Go to the EAS page.
a) Log on to the PAI console.
b) In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which the model service that you want to manage belongs.
c) In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the EAS-Online Model Services page.
2. On the EAS-Online Model Services page, click Deploy Service.
3. On the Deploy Service page, configure the parameters by using a form or a JSON script.
| Parameter | Description |
| --- | --- |
| Service Name | The name of the service. The name kohya_ss_demo is used in this example. |
| Deployment Method | Select Deploy Web App by Using Image. |
| Select Image | Click PAI Image. Select kohya_ss from the image drop-down list and 1.2 from the Image Version drop-down list. Note: You can select the latest version of the image when you deploy the model service. |
| Model Settings | Select Mount OSS Path; this mode is used in this example. Select an OSS path in the same region as the workspace; oss://kohya-demo/kohya/ is used in this example. You can use a custom mount path; /workspace is used in this example. Note: Turn off Enable Read-only Mode. Otherwise, the model file cannot be exported to OSS. |
| Command to Run | After you select an image, the system automatically configures the command to run. Example: python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless. --listen: binds the program to the specified IP address to receive and process external requests. --server_port: the listening port number. |
| Parameter | Description |
| --- | --- |
| Resource Group Type | Select Public Resource Group. |
| Resource Configuration Mode | Select General. |
| Resource Configuration | Select an instance type on the GPU tab. For cost-effectiveness, we recommend the ml.gu7i.c16m60.1-gu30 instance type. In this example, ml.gu7i.c8m30.1-gu30 is used. |
Note: In the following JSON script, replace the value of the name field and the OSS path based on your actual configuration.
{
"metadata":
{
"name": "kohya_ss_demo",
"instance": 1,
"enable_webservice": true
},
"cloud":
{
"computing":
{
"instance_type": "ecs.gn6e-c12g1.12xlarge",
"instances": null
}
},
"storage": [
{
"oss":
{
"path": "oss://kohya-demo/kohya/",
"readOnly": false
},
"properties":
{
"resource_type": "model"
},
"mount_path": "/workspace"
}],
"containers": [
{
"image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/kohya_ss:1.2",
"script": "python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless",
"port": 8000
}]
}
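If you deploy multiple services, you can fill in the two values that the note above asks you to customize programmatically. The following sketch is hypothetical (the helper is not part of EAS or Kohya_ss); the field names match the JSON script above.

```python
import json

def build_service_config(service_name: str, oss_path: str,
                         mount_path: str = "/workspace") -> str:
    """Render the EAS service description with a custom name and OSS mount."""
    config = {
        "metadata": {"name": service_name, "instance": 1, "enable_webservice": True},
        "cloud": {"computing": {"instance_type": "ecs.gn6e-c12g1.12xlarge",
                                "instances": None}},
        "storage": [{
            # Read-only mode must stay off so the model file can be exported to OSS.
            "oss": {"path": oss_path, "readOnly": False},
            "properties": {"resource_type": "model"},
            "mount_path": mount_path,
        }],
        "containers": [{
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/kohya_ss:1.2",
            "script": "python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless",
            "port": 8000,
        }],
    }
    return json.dumps(config, indent=2)

print(build_service_config("kohya_ss_demo", "oss://kohya-demo/kohya/"))
```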
4. Click Deploy. The model deployment takes a few minutes to complete. When the Service Status changes to Running, the service is deployed.
1. Click View Web App in the Service Type column of the service that you want to access.
2. Click Dreambooth LoRA.
3. Click Configuration file to specify the configuration file path. Skip this step if no SS_config.json file is available.
Note: The path of the configuration file consists of the Mount Path that you specify in the Configure Model Service Information step, the path of the folder that you create in OSS, and the file name SS_config.json. Example: /workspace/KaraDetroit_loar/SS_config.json.
4. Configure parameters on the Source model tab. In this example, the Save trained model as parameter is set to safetensors, which is more secure than the checkpoint format.
5. Configure parameters on the Folders tab. Use the paths of the Image, Log, and Model folders that you created in OSS and the name of the output file.
| Parameter | Description |
| --- | --- |
| Image folder | The folder path of the images that you want to use for training. The path consists of the Mount Path that you specify in the Configure Model Service Information step and the path of the Image folder that you create in OSS. Example: /workspace/KaraDetroit_loar/Image. |
| Logging folder | The folder path of the output logs. The path consists of the Mount Path that you specify in the Configure Model Service Information step and the path of the Log folder that you create in OSS. Example: /workspace/KaraDetroit_loar/Log. |
| Output folder | The folder path of the output model. The path consists of the Mount Path that you specify in the Configure Model Service Information step and the path of the Model folder that you create in OSS. Example: /workspace/KaraDetroit_loar/Model. |
| Model output name | The name of the output model. Example: my_model. |
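The three folder paths in the table above share one pattern: mount path + project folder + subfolder. A minimal sketch (the helper name is made up for illustration):

```python
from pathlib import PurePosixPath

def kohya_folder_paths(mount_path: str, project: str) -> dict:
    """Return the Image/Log/Model folder paths as seen inside the container."""
    base = PurePosixPath(mount_path) / project
    return {
        "image_folder": str(base / "Image"),
        "logging_folder": str(base / "Log"),
        "output_folder": str(base / "Model"),
    }

paths = kohya_folder_paths("/workspace", "KaraDetroit_loar")
print(paths["image_folder"])  # /workspace/KaraDetroit_loar/Image
```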
6. Configure parameters on the Training parameters tab. The following example uses the content of the SS_config.json file from the Preparation step.
| Parameter | Description |
| --- | --- |
| LoRA type | The type of the LoRA. LyCORIS/LoCon: you can adjust each layer of the LoRA model, such as Res, Block, and Transformer. LyCORIS/LoHa: the model can process more information with the same memory. |
| LoRA network weights | Optional. The weights of the LoRA network. If you want to resume training based on previous training results, select the LoRA model that was trained last. |
| Train batch size | The size of the training batch. A larger value requires higher GPU performance. |
| Epoch | The number of training epochs. All data is trained once in one epoch. Set the parameter based on your business requirements. In most cases: Total number of training steps in Kohya = Number of images × Number of repetitions × Number of epochs / Batch size. Total number of training steps in the web UI = Number of images × Number of repetitions. If you use images in the same directory, the total number of training steps is multiplied by 2, and the number of times that the model is saved is halved in Kohya. |
| Save every N epochs | The training results are saved every N epochs. If you set the value to 2, the training results are saved every 2 epochs. |
| Caption Extension | Optional. The file name extension of the caption file. Example: .txt. |
| Mixed precision | The precision used for mixed-precision training. Specify the parameter based on the GPU performance. Valid values: no, fp16, and bf16. We recommend bf16 if the memory of your GPU is greater than 30 GB. |
| Save precision | The precision at which the model is saved. We recommend bf16 if the memory of your GPU is greater than 30 GB. |
| Number of CPU threads per core | The number of threads per vCPU. Specify the parameter based on your business requirements. |
| Learning rate | The learning rate. Default value: 0.0001. |
| LR Scheduler | The learning rate scheduler. Specify the parameter based on your business requirements. |
| LR Warmup (% of steps) | The percentage of steps used to warm up the learning rate. Default value: 10. You can set the value to 0 if no warm-up is required. |
| Optimizer | The optimizer. Default value: AdamW8bit. The value DAdaptation indicates that automatic optimization is enabled. |
| Max Resolution | The maximum resolution. Specify the parameter based on your business requirements. |
| Network Rank (Dimension) | The complexity of the model. We recommend that you set the value to 128. |
| Network Alpha | In most cases, the value is smaller than or equal to the value of Network Rank (Dimension). We recommend that you set Network Rank (Dimension) to 128 and Network Alpha to 64. |
| Convolution Rank (Dimension) & Convolution Alpha | The convolution settings, which indicate the degree to which the model is fine-tuned by LoRA. Specify the parameters based on the LoRA type. Based on the official guide of Kohya: if the LoRA type is LyCORIS/LoCon, set Convolution Rank (Dimension) to a value smaller than or equal to 64 and Convolution Alpha to 1; you can set Convolution Alpha to a lower value based on your business requirements. If the LoRA type is LyCORIS/LoHa, set Convolution Rank (Dimension) to a value smaller than or equal to 32 and Convolution Alpha to 1. |
| clip skip | The number of CLIP layers to skip. Valid values: 1 to 12. A smaller value indicates that the generated image is closer to the original or input image. Realistic style: set the value to 1. Anime style: set the value to 2. |
| Sample every n epoch | A sample image is generated every N epochs. |
| Sample Prompts | The sample prompts. Valid parameters: --n: the negative prompts. --w: the width of the image. --h: the height of the image. --d: the seed of the image. --l: the Classifier Free Guidance (CFG) scale, which indicates the relevance of the generated image to the prompt. --s: the number of iteration steps. |
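The step-count formula in the Epoch row above can be checked with a quick sketch (the function name is hypothetical, for illustration only):

```python
def kohya_total_steps(num_images: int, repeats: int, epochs: int,
                      batch_size: int) -> int:
    """Total number of training steps in Kohya:
    images x repetitions x epochs / batch size."""
    return num_images * repeats * epochs // batch_size

# 10 images repeated 150 times each, 2 epochs, batch size 2:
print(kohya_total_steps(10, 150, 2, 2))  # 1500
```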
7. Click Train model to start training.
8. On the EAS-Online Model Services page, click the name of the service that you want to view. Then, click Service Logs to view the training progress in real time. When "model saved" appears in the log, the training is completed.
9. After the training is completed, obtain the LoRA model file in the Model folder that you specified. Example: my_model.safetensors.
After the LoRA model is trained, you can upload it to the model directory of the Stable Diffusion web UI and use it to generate images. For more information about how to deploy the Stable Diffusion service, see Deploy Stable Diffusion for AI image generation with EAS in a few clicks. The following sections describe how to upload a LoRA model to the Stable Diffusion web UI.
1. Configure the Model Service Information for the Stable Diffusion web UI as the following figure shows.
Note: You must select a -cluster version of the stable-diffusion-webui image. After the service is started, the /data-{User_ID}/models/Lora path is automatically created in the mounted OSS path.
2. Add the following parameters to Command to Run:
- --lora-dir: optional. If you do not specify the --lora-dir parameter, the model files of users are isolated, and only the model files in the {OSS path}/data-{User_ID}/models/Lora directory are loaded. If you specify the --lora-dir parameter, the files in the specified directory and the files in the {OSS path}/data-{current logon user ID}/models/Lora directory are loaded. Example: --lora-dir /code/stable-diffusion-webui/data-oss/models/Lora.
- --data-dir {OSS mount path}. Example: --data-dir /code/stable-diffusion-webui/data-oss.
3. Upload the LoRA model file to the {OSS path}/data-{User_ID}/models/Lora directory. Example: oss://bucket-test/data-oss/data-1596******100/models/Lora.
Note: The /data-{User_ID}/models/Lora path is automatically created in OSS after the service is started. Therefore, you must upload the LoRA model file after the service is started. You can obtain the {User_ID} in the upper-right corner of the page.
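The per-user directory described above follows a fixed pattern. A one-line sketch (the helper and the sample user ID are made up for illustration):

```python
def user_lora_dir(oss_mount: str, user_id: str) -> str:
    """Path from which the -cluster version of stable-diffusion-webui
    loads a user's LoRA files: {mount}/data-{User_ID}/models/Lora."""
    return f"{oss_mount.rstrip('/')}/data-{user_id}/models/Lora"

# Hypothetical user ID for illustration:
print(user_lora_dir("/code/stable-diffusion-webui/data-oss", "123456"))
# /code/stable-diffusion-webui/data-oss/data-123456/models/Lora
```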
1. Configure the Model Service Information for the Stable Diffusion web UI as the following figure shows.
Note: You must select a non-cluster version of the stable-diffusion-webui image. After the service is started, the /models/Lora path is automatically created in the mounted OSS path.
2. Add the --data-dir {OSS mount path} parameter to Command to Run. Example: --data-dir /code/stable-diffusion-webui/data-oss.
3. Upload the LoRA model file to {OSS path}/models/Lora. Example: oss://bucket-test/data-oss/models/Lora.
Note: The /models/Lora path is automatically created in the mounted OSS bucket after the service is started; you do not need to create it manually. You must upload the LoRA model file after the service is started.
Deploy Stable Diffusion for AI Painting with EAS in a Few Clicks
Alibaba Cloud Data Intelligence - December 5, 2023