This topic describes how to deploy open source Kohya_ss in Elastic Algorithm Service (EAS) of Platform for AI (PAI) and use Kohya_ss to train a Low-Rank Adaptation (LoRA) model. In AI painting scenarios, you can use the trained LoRA model as an auxiliary model in a Stable Diffusion (SD) service to improve the quality of generated images.
Prerequisites
EAS is activated and the default workspace is created. For more information, see Activate PAI and create the default workspace.
If you use a RAM user to deploy the model, make sure that the RAM user is granted the management permissions on EAS. For more information, see Grant the permissions that are required to use EAS.
An Object Storage Service (OSS) bucket is created in the region where the PAI workspace resides. The OSS bucket is used to store training files, output model files, and logs. For information about how to upload objects, see Upload objects.
Preparations
Log on to the OSS console and go to the path of the bucket that you created for the training. The bucket must reside in the same region as the PAI workspace. Example: oss://kohya-demo/kohya/.
Create a project folder in the bucket path. Example: KaraDetroit_loar. Then, create the following folders in the project folder. If you have a JSON configuration file, you can also upload it to the project folder.
Image: stores the source files used for the training.
Model: stores the output model file.
Log: stores the logs.
SS_config.json: an optional JSON file that is used to configure multiple parameters at the same time. You can modify related parameters in the JSON file, such as the folder paths or the output model name. For more information about the configuration, see GitHub. The sample file SS_config.json provides a reference.
Package the images that you want to use in the training into a compressed file and upload the compressed file to the Image folder. The sample file named 100_pic.tgz is used in this example. For a scripted example of this step, see the sketch after the following list.
Important: The name of the packaged folder must be in the number_name format. Example: 100_pic.
The images must be in one of the following formats: .png, .jpg, .jpeg, .webp, or .bmp.
Each image must have a description file with the same name. The description file can be in the .txt format. The description must be on the first line of the file. Separate multiple descriptions with commas (,).
The parts of the packaged folder name are described as follows:
Number: The number of times each image is repeated during training. The value must be greater than or equal to 100, and the total number of repetitions (number of images × Number) must be at least 1,500. For example, if the folder contains 10 images, each image must be repeated 1500/10 = 150 times, so the value of Number is 150. If the folder contains 20 images, the calculated value is 1500/20 = 75, which is less than 100, so the value of Number is increased to 100.
Underscore: Use "_" to separate the number and the name. This part is required.
Name: A string that meets the naming requirements of OSS objects. The name pic is used in this example.
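You can also derive the repeat count and package the folder programmatically. The following is a minimal sketch that assumes the training images and their .txt caption files are in a local folder named pic (a placeholder name). It computes the repeat count according to the rules above and packages the folder into a .tgz archive with the required number_name naming.

import math
import os
import tarfile

# A minimal sketch: the folder and file names are examples, not fixed values.
IMAGE_DIR = "pic"
MIN_REPEATS = 100   # the per-image repeat count must be at least 100
MIN_TOTAL = 1500    # number of images x repeat count must be at least 1,500

image_count = len([
    f for f in os.listdir(IMAGE_DIR)
    if f.lower().endswith((".png", ".jpg", ".jpeg", ".webp", ".bmp"))
])
repeats = max(MIN_REPEATS, math.ceil(MIN_TOTAL / image_count))

# The folder inside the archive must use the number_name format, for example 100_pic.
packaged_name = f"{repeats}_{IMAGE_DIR}"
archive_name = f"{packaged_name}.tgz"
with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(IMAGE_DIR, arcname=packaged_name)

print(f"{image_count} images, {repeats} repeats per image -> {archive_name}")

You can then upload the archive to the Image folder in the OSS console, or, for example, with the put_object_from_file method of the oss2 Python SDK.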
Deploy a Kohya_ss service
Go to the EAS-Online Model Services page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which the model service that you want to manage belongs.
In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the EAS-Online Model Services page.
On the EAS-Online Model Services page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.
On the Create Service page, configure the parameters by using a form or a JSON script.
Configure parameters by using a form
Configure model service information
Service Name: The name of the service. The name kohya_ss_demo is used in this example.
Deployment Method: Select Deploy Web App by Using Image.
Select Image: Click PAI Image. Select kohya_ss from the image drop-down list and 1.2 from the Image Version drop-down list.
Note: You can select the latest version of the image when you deploy the model service.
Model Settings: Select Mount OSS Path, which is used in this example. Select an OSS path that resides in the same region as the workspace. The path oss://kohya-demo/kohya/ is used in this example. You can use a custom mount path. The path /workspace is used in this example.
Important: Turn off Enable Read-only Mode. Otherwise, the model file cannot be exported to OSS.
Command to Run: After you select an image, the system automatically configures the command to run. Example: python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless.
--listen: the IP address to which the program binds to receive and process external requests.
--server_port: the port number on which the service listens.
Configure Resource Deployment Information.
Resource Group Type: Select Public Resource Group.
Resource Configuration Mode: Select General.
Resource Configuration: Select an instance type on the GPU tab. In terms of cost-effectiveness, we recommend that you use the ml.gu7i.c16m60.1-gu30 instance type. In this example, the ml.gu7i.c8m30.1-gu30 instance type is used.
You can configure other parameters based on your business requirements.
Configure parameters by using a JSON script
Configure the JSON script in the Configuration Editor.
Sample JSON file:
Important: Replace the value of the name field in the metadata section and the OSS path in the storage section with your actual values.
{
  "metadata": {
    "name": "kohya_ss_demo",
    "instance": 1,
    "enable_webservice": true
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6e-c12g1.12xlarge",
      "instances": null
    }
  },
  "storage": [
    {
      "oss": {
        "path": "oss://kohya-demo/kohya/",
        "readOnly": false
      },
      "properties": {
        "resource_type": "model"
      },
      "mount_path": "/workspace"
    }
  ],
  "containers": [
    {
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/kohya_ss:1.2",
      "script": "python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless",
      "port": 8000
    }
  ]
}
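If you keep the configuration in a file, you can patch the two values mentioned in the note programmatically before you paste the JSON into the Configuration Editor. The following is a minimal sketch; the file name kohya_service.json, the service name, and the OSS path are placeholders for your own values.

import json

CONFIG_FILE = "kohya_service.json"  # hypothetical file that contains the sample JSON above

with open(CONFIG_FILE) as f:
    config = json.load(f)

# Replace the service name and the mounted OSS path with your actual values.
config["metadata"]["name"] = "kohya_ss_demo"
config["storage"][0]["oss"]["path"] = "oss://kohya-demo/kohya/"

with open(CONFIG_FILE, "w") as f:
    json.dump(config, f, indent=4)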
Click Deploy. The model deployment requires a few minutes to complete. If the Service Status is Running, the service is deployed.
Train a LoRA model
Click View Web App in the Service Type column of the service that you want to view.
Click Dreambooth LoRA.
Click Configuration file to specify the path of the configuration file. Skip this step if no SS_config.json file is available.
Note: The path of the configuration file consists of the Mount Path that you specified in the Configure Model Service Information step, the project folder that you created in OSS, and the SS_config.json file name. Example: /workspace/KaraDetroit_loar/SS_config.json.
Configure parameters on the Source model tab. In this example, the Save trained model as parameter is set to safetensors, which is more secure than the checkpoint format.
Configure parameters on the Folders tab. Use the name of the output model file and the paths of the Image, Log, and Model folders that you created in OSS. The following list describes the parameters. For a sketch that composes these paths from the mount path, see the example after this list.
Image folder: The folder path of the images that you want to use for the training. The path consists of the Mount Path that you specified in the Configure Model Service Information step and the path of the Image folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Image.
Logging folder: The folder path of the output logs. The path consists of the Mount Path that you specified in the Configure Model Service Information step and the path of the Log folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Log.
Output folder: The folder path of the output model. The path consists of the Mount Path that you specified in the Configure Model Service Information step and the path of the Model folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Model.
Model output name: The name of the output model. Example: my_model.
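The Folders values are paths inside the service container, not OSS URLs: each one starts from the custom Mount Path and appends the folder that you created in the OSS project folder. The following is a minimal sketch that assumes the Mount Path /workspace and the project folder KaraDetroit_loar from this example.

from pathlib import PurePosixPath

mount_path = PurePosixPath("/workspace")   # custom mount path of the Kohya_ss service
project_folder = "KaraDetroit_loar"        # project folder created in the OSS bucket

folders = {
    "Image folder": mount_path / project_folder / "Image",
    "Logging folder": mount_path / project_folder / "Log",
    "Output folder": mount_path / project_folder / "Model",
}
for field, path in folders.items():
    print(f"{field}: {path}")
# Image folder: /workspace/KaraDetroit_loar/Image
# Logging folder: /workspace/KaraDetroit_loar/Log
# Output folder: /workspace/KaraDetroit_loar/Model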
Configure parameters on the Training parameters tab. The following example uses the content of the SS_config.json file in the Preparations step.
LoRA Type: The type of the LoRA model.
LyCORIS/LoCon: You can adjust each layer of the LoRA model, such as the Res, Block, and Transformer layers.
LyCORIS/LoHa: The model can process more information without the need to increase memory.
LoRA network weights: Optional. The weights of an existing LoRA network. If you want to resume training based on previous training results, select the most recently trained LoRA model.
Train batch size: The size of the training batch. A larger value requires higher GPU performance.
Epoch: The number of training epochs. All data is trained once in one epoch. Configure the parameter based on your business requirements. In most cases:
Total number of training steps in Kohya = Number of training images × Number of repetitions × Number of training epochs / Training batch size
Total number of training steps in the web UI = Number of training images × Number of repetitions
If you use images in the same directory, the total number of training steps is multiplied by 2 and the number of times that the model is saved is halved in Kohya. For a worked example of this calculation, see the sketch after this list.
Save every N epochs: The training results are saved every N training epochs. For example, if you set the value to 2, the training results are saved every two epochs of training.
Caption Extension: Optional. The file name extension of the caption files. Example: .txt.
Mixed precision: The precision for mixed-precision training. Configure the parameter based on the GPU performance. Valid values: no, fp16, and bf16. If the memory of the GPU that you use is larger than 30 GB, we recommend that you set the value to bf16.
Save precision: The precision at which the model is saved. If the memory of the GPU that you use is larger than 30 GB, we recommend that you set the value to bf16.
Number of CPU threads per core: The number of threads per vCPU. Configure the parameter based on your business requirements.
Learning rate: The learning rate. Default value: 0.0001.
LR Scheduler: The learning rate scheduler. Configure the parameter based on your business requirements.
LR Warmup (% of steps): The percentage of steps that are used to warm up the learning rate. Configure the parameter based on your business requirements. Default value: 10. You can set the value to 0 if no warm-up is required.
Optimizer: The optimizer. Configure the parameter based on your business requirements. Default value: AdamW8bit. The value DAdaptation indicates that automatic optimization is enabled.
Max Resolution: The maximum training resolution. Configure the parameter based on your business requirements.
Network Rank (Dimension): The complexity of the model. We recommend that you set the value to 128.
Network Alpha: In most cases, the value of this parameter is less than or equal to the value of the Network Rank (Dimension) parameter. We recommend that you set Network Rank (Dimension) to 128 and Network Alpha to 64.
Convolution Rank (Dimension) and Convolution Alpha: The convolution settings, which indicate the degree to which the model is fine-tuned by LoRA. Configure the parameters based on the LoRA type. Based on the official guide of Kohya:
If the LoRA type is LyCORIS/LoCon, set Convolution Rank (Dimension) to a value less than or equal to 64 and Convolution Alpha to 1. You can set Convolution Alpha to a lower value based on your business requirements.
If the LoRA type is LyCORIS/LoHa, set Convolution Rank (Dimension) to a value less than or equal to 32 and Convolution Alpha to 1.
clip skip: The number of CLIP layers that are skipped. Valid values: 1 to 12. A smaller value indicates that the generated image is closer to the original image or the input prompt.
Realism: Set the value to 1.
Anime, comics, and games (ACG): Set the value to 2.
Sample every n epoch: A sample image is generated and saved every N training epochs.
Sample Prompts: The sample prompts. Valid parameters:
--n: the negative prompts.
--w: the width of the image.
--h: the height of the image.
--d: the seed of the image.
--l: the Classifier Free Guidance (CFG) scale, which indicates the relevance of the generated image to the prompt.
--s: the number of iteration steps.
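To make the Epoch formulas concrete, the following is a minimal sketch that evaluates them for hypothetical values: 10 training images with 100 repetitions (the 100_pic folder), an Epoch value of 10, a Train batch size of 2, and Save every N epochs set to 2. Replace the values with your own settings.

# Hypothetical values; replace them with your own settings.
num_images = 10          # images in the packaged folder
repeats = 100            # the number_ prefix of the packaged folder, for example 100_pic
epochs = 10              # Epoch
train_batch_size = 2     # Train batch size
save_every_n_epochs = 2  # Save every N epochs

kohya_steps = num_images * repeats * epochs / train_batch_size   # total training steps in Kohya
webui_steps = num_images * repeats                               # total training steps in the web UI
saved_models = epochs // save_every_n_epochs                     # number of saved checkpoints

print(f"Kohya training steps: {kohya_steps:.0f}")    # 5000
print(f"Web UI training steps: {webui_steps}")       # 1000
print(f"Saved checkpoints: {saved_models}")          # 5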
Click Train model to start training.
On the Elastic Algorithm Service (EAS) page, click the name of the service that you want to view. Click Service Logs to view the training progress in real time.
When model saved appears in the log, the training is completed.
After the training is completed, obtain the LoRA model file from the Model folder that you specified. Example: my_model.safetensors.
Use a trained LoRA model for AI image generation based on Stable Diffusion
After you train a LoRA model, you can upload the model to the model directory of the SD web application and use the model to generate images. For information about how to deploy SD, see Deploy Stable Diffusion for AI image generation with EAS in a few clicks.
The following section describes how to upload a LoRA model to the SD web application.
SD web application (cluster edition)
Configure the Model Service Information for the SD web application.
Note: You must select a -cluster version of the stable-diffusion-webui image. After the service is started, the /data-{User_ID}/models/Lora path is automatically created in the mounted OSS path.
Add the following parameters to Command to Run:
--lora-dir: optional.
If you do not specify the --lora-dir parameter, the model files of users are isolated, and only the model files in the {OSS path}/data-{User_ID}/models/Lora directory are loaded.
If you specify the --lora-dir parameter, the files in the specified directory and the files in the {OSS path}/data-{current logon user ID}/models/Lora directory are loaded. Example: --lora-dir /code/stable-diffusion-webui/data-oss/models/Lora.
--data-dir {OSS mount path}. Example: --data-dir /code/stable-diffusion-webui/data-oss.
Upload the LoRA model file to the {OSS path}/data-{User_ID}/models/Lora directory. Example: oss://bucket-test/data-oss/data-1596******100/models/Lora.
Note: After the service is started, the /data-{current logon user ID}/models/Lora path is automatically created in OSS. You must upload the LoRA model file after the service is started. You can find the {User_ID} in the upper-right corner of the page, next to the profile picture.
SD web application (basic version)
Configure the Model Service Information for the web UI of Stable Diffusion.
Note: You must select a non-cluster version of the stable-diffusion-webui image. After the service is started, the /models/Lora path is automatically created in the mounted OSS path.
Add the --data-dir {OSS mount path} parameter to Command to Run. Example: --data-dir /code/stable-diffusion-webui/data-oss.
Upload the LoRA model file to the {OSS path}/models/Lora directory. Example: oss://bucket-test/data-oss/models/Lora.
Note: After the service is started, the /models/Lora path is automatically created in the mounted OSS bucket. You do not need to create the path manually. You must upload the LoRA model file after the service is started.
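Because the trained model already resides in your OSS bucket, you can copy it from the Kohya_ss Model folder to the Lora directory of the SD service within OSS instead of downloading and re-uploading it. The following is a minimal sketch based on the oss2 Python SDK; the credentials, endpoint, bucket names, and object keys are placeholders taken from this topic's examples, and a server-side copy requires both buckets to reside in the same region.

import oss2

# Placeholders; replace them with your actual credentials, endpoint, and bucket names.
auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"

kohya_bucket_name = "kohya-demo"                         # bucket mounted by the Kohya_ss service
sd_bucket = oss2.Bucket(auth, endpoint, "bucket-test")   # bucket mounted by the SD service

# Trained model produced by Kohya_ss.
source_key = "kohya/KaraDetroit_loar/Model/my_model.safetensors"
# Target key for the basic version; for the cluster edition, use
# "data-oss/data-{User_ID}/models/Lora/my_model.safetensors" instead.
target_key = "data-oss/models/Lora/my_model.safetensors"

# Server-side copy from the Kohya_ss bucket into the SD bucket.
sd_bucket.copy_object(kohya_bucket_name, source_key, target_key)
print(f"Copied oss://{kohya_bucket_name}/{source_key} to oss://bucket-test/{target_key}")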