This topic describes how to use Stable Diffusion web UI.
Background Information
Stable Diffusion is a text-to-image AI model developed by Stability AI that can generate and modify images based on text prompts. Stable Diffusion WebUI is a browser interface for Stable Diffusion developed by AUTOMATIC1111 that helps you use Stable Diffusion through intuitive interactions.
The web UI provides an intuitive operating experience and allows customization with different plugins and models. You can use the web UI to create more controllable visual works.
User interface
[UI screenshot: the interface is divided into numbered sections; section ④ is the result preview.]
Model selection
Upload your models to a directory in Object Storage Service (OSS) and refresh the web UI. Then, you can select the model that you want to use in this section. We recommend that you download models from Civitai.
Features
txt2img
Prompt syntax
General tips
Prompts generally include the following types of keywords: prefixes (image quality, painting style, lens effects, and lighting effects), subjects (person or object, posture, clothing, and props), and scenes (environment and details).
If you want to increase or decrease the weight of a keyword, enclose it in parentheses (), braces {}, or square brackets [], optionally followed by a colon (:) and a weight value. Example: (beautiful:1.3). We recommend a weight value in the range of 0.4 to 1.6. Keywords with lower weights are more likely to be ignored, and keywords with higher weights are prone to deformation due to overfitting. You can also nest multiple pairs of brackets to compound the weight of a keyword: each additional pair of (), {}, or [] multiplies the weight by 1.1, 1.05, or 0.952, respectively. Example: (((cute))).
Weight calculation rules:
(PromptA:weight value): You can use this format to increase or decrease the weight of a keyword. If the weight value is greater than 1, the weight is increased. If the weight value is less than 1, the weight is decreased.
(PromptB) indicates that the weight of PromptB is increased to 1.1 times, which is equivalent to (PromptB:1.1).
{PromptC} indicates that the weight of PromptC is increased to 1.05 times, which is equivalent to (PromptC:1.05).
[PromptD] indicates that the weight of PromptD is decreased to 0.952 times, which is equivalent to (PromptD:0.952).
((PromptE)) is equivalent to (PromptE:1.1*1.1).
{{PromptF}} is equivalent to (PromptF:1.05*1.05).
[[PromptG]] is equivalent to (PromptG:0.952*0.952).
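The bracket rules above can be expressed as a small calculation. The following sketch is for illustration only (the function name is a hypothetical helper; the web UI performs this parsing internally). It computes the effective weight implied by the leading brackets of a prompt fragment:

```python
def nested_weight(prompt: str) -> float:
    """Return the effective weight implied by nested brackets.

    Each '(' multiplies the weight by 1.1, each '{' by 1.05,
    and each '[' by 0.952, per the rules above.
    """
    factors = {"(": 1.1, "{": 1.05, "[": 0.952}
    weight = 1.0
    for ch in prompt:
        if ch in factors:
            weight *= factors[ch]
        else:
            break  # reached the keyword itself
    return round(weight, 3)

print(nested_weight("((PromptE))"))  # 1.21
print(nested_weight("[[PromptG]]"))  # 0.906
```

This is why (((cute))) is equivalent to (cute:1.331): each extra pair of parentheses multiplies the weight by another factor of 1.1.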
You can use angle brackets (<>) to invoke Low-Rank Adaptation (LoRA) and Hypernetwork models. Formats: <lora:filename:multiplier> and <hypernet:filename:multiplier>.
Common prompts
Positive prompt | Description | Negative prompt | Description |
HDR, UHD, 8K, 4K | Improves the quality of the image. | mutated hands and fingers | Prevents mutated hands and fingers. |
best quality | Vivifies the image. | deformed | Prevents deformation. |
masterpiece | Makes the image look like a masterpiece. | bad anatomy | Prevents bad anatomy. |
Highly detailed | Adds details to the image. | disfigured | Prevents disfigurement. |
Studio lighting | Applies studio lighting to add texture to the image. | poorly drawn face | Prevents poorly drawn faces. |
ultra-fine painting | Applies an ultra-fine painting style. | mutated | Prevents mutation. |
sharp focus | Brings the image into sharp focus. | extra limb | Prevents extra limbs. |
physically-based rendering | Applies physically based rendering. | ugly | Prevents ugly elements. |
extreme detail description | Focuses on the details. | poorly drawn hands | Prevents poorly drawn hands. |
Vivid Colors | Makes the colors of the image vivid. | missing limb | Prevents missing limbs. |
(EOS R8, 50mm, F1.2, 8K, RAW photo:1.2) | Incorporates professional photographic styles. | floating limbs | Prevents floating limbs. |
Bokeh | Blurs the background and highlights the subject. | disconnected limbs | Prevents disconnected limbs. |
Sketch | Uses sketching as the drawing method. | malformed hands | Prevents malformed hands. |
Painting | Uses painting as the drawing method. | variant | Prevents out-of-focus results. |
- | - | long neck | Prevents long necks. |
- | - | long body | Prevents long bodies. |
txt2img showcase
Simple prompts: Enter the positive and negative prompts. The positive prompt specifies the elements that you want to appear in the generated image, whereas the negative prompt specifies the elements that you want to keep out of it. The more keywords you provide, the more closely the generated image matches your expectations.
Complex prompts:
Positive prompt: 8k portrait of beautiful cyborg with brown hair, intricate, elegant, highly detailed, majestic, digital photography, art by artgerm and ruan jia and greg rutkowski surreal painting gold butterfly filigree, broken glass, (masterpiece, sidelighting, finely detailed beautiful eyes: 1.2), hdr
Negative prompt: canvas frame, cartoon, 3d, ((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), wierd colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render
Prompt source: Civitai
Keywords that appear earlier in the prompt carry more weight. You can adjust the order of keywords based on your needs. For example, you can arrange keywords in the following order: subject, medium, style, artist, website, resolution, additional details, color, and lighting. In practice, you do not need to specify all of the preceding keyword types; select and arrange keywords based on your needs. Unlike base models, trained models often have hidden defaults that apply fixed styles or discard some keywords in the prompts.
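As a rough illustration of the ordering advice above, the snippet below assembles a prompt by emitting the specified keyword types in the recommended order. The category names and sample keywords are illustrative assumptions, not a required format:

```python
# Recommended keyword order, from highest to lowest weight.
ORDER = ["subject", "medium", "style", "artist", "website",
         "resolution", "additional details", "color", "lighting"]

# Only the categories you care about need to be filled in.
keywords = {
    "subject": "portrait of a cyborg",
    "medium": "digital photography",
    "resolution": "8k, highly detailed",
    "lighting": "(sidelighting:1.2)",
}

# Emit keywords in the recommended order, skipping unspecified categories.
prompt = ", ".join(keywords[c] for c in ORDER if c in keywords)
print(prompt)
```

Because earlier keywords carry more weight, reordering the same set of keywords can noticeably change the emphasis of the generated image.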
img2img
The img2img feature allows you to generate a new image from an existing image based on the prompts that you provide. For example, you can convert a photo of a real person into an anime-style image, or color a sketch. The img2img feature also allows you to modify parameters and redraw specific parts of an image without affecting other parts. You can use the brush tool to modify the input image and generate a new image based on the modified version.
The generated image can be used as the input for the next img2img creation or partial re-editing, and can also be used in other features. In addition, the system can automatically generate keywords for the input image by using interrogation based on the CLIP and DeepBooru models. CLIP is suitable for realistic images and DeepBooru is suitable for cartoon images.
The following sections describe the procedures and parameters for using the img2img feature.
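If your deployment exposes the AUTOMATIC1111 HTTP API (the web UI must be launched with the --api flag; the host and port below are assumptions, so adjust them to your environment), an img2img request can be sketched as follows. Only the request payload is built and inspected here; the commented-out post call shows how it would be submitted:

```python
import base64
import json

# Assumed endpoint of a locally running web UI started with --api.
API_URL = "http://127.0.0.1:7860/sdapi/v1/img2img"

def build_img2img_payload(image_bytes: bytes, prompt: str,
                          denoising_strength: float = 0.6) -> dict:
    """Encode the input image and assemble the img2img request body."""
    return {
        # The input image is sent as a base64-encoded string.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,  # how much of the image to redraw
        "steps": 20,
        "sampler_name": "Euler a",
    }

payload = build_img2img_payload(b"...png bytes...", "watercolor landscape")
body = json.dumps(payload)
# requests.post(API_URL, data=body) would submit the job; the response's
# "images" field contains the base64-encoded results.
print(sorted(payload.keys()))
```

A low denoising strength keeps the result close to the input image; a high value gives the prompts more influence over the redraw.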
Image information
After you upload an image that was generated by Stable Diffusion, you can view the prompts and parameters that were used to create it.
However, a screenshot of the original image, or a copy saved from another application, cannot be parsed by Stable Diffusion, because the generation parameters are stored in the image file's metadata, which such copies do not retain.
Parameters
Parameter | Description | Recommended value |
Sampling method | The sampling mode of the diffusion denoising algorithm. Different sampling methods deliver different effects; select one based on your needs. Euler a, DPM++ 2S a, and DPM++ 2S a Karras produce similar overall compositions, as do Euler, DPM++ 2M, and DPM++ 2M Karras. DDIM differs from the other sampling methods in overall composition. | Euler a |
Sampling steps | The number of sampling steps used to generate the image. This parameter determines the AI optimization effect. With each iteration, the system compares the current image with the prompts and makes fine adjustments. Increasing the number of steps consumes more time and computing resources but does not guarantee better results. In practice, more steps generally reveal more detail, and the effect depends closely on the sampling method. For example, Euler a works best with 30 to 40 steps; beyond that range the image stabilizes and no further details are added. | 20 (the web UI default). The LMS, DPM fast, and PLMS sampling methods require more steps, typically 30 to 50. Adjust the value based on the actual effect. |
Restore faces | Specifies whether to refine human faces in the image. | Select this option to optimize facial details in a panorama. For a close-up image, we recommend that you do not select this option; otherwise, overfitting may occur. |
Tiling | Specifies whether to generate an image that can be tiled. | If you select this option, the image may not be generated as expected, but video memory consumption is reduced. We recommend that you do not select this option. |
Hires. fix | Specifies whether to apply high-resolution fixing, which affects the resolution of the generated image. The system first creates the image at a low resolution and then improves its details without changing its composition. If you select this option, you must configure the Hires steps and Denoising strength parameters. | If you select this option, the image may have more details but may not meet your expectations. |
Width and Height | The size and resolution of the generated image. The higher the resolution, the more details the image has, but the more video memory is consumed. We recommend that you do not set an excessively high resolution. | 512 × 512 (the default value). |
Batch count | The total number of batches to generate. The larger the value, the longer the computation takes. | 1 (the default value). |
Batch size | The number of images generated in each batch. A larger value causes higher video memory consumption. | 1 to 4 |
CFG Scale | A larger value makes the generation adhere more closely to your prompts; a smaller value gives the AI more creative freedom. | 7 (the default value). |
Seed | A value of -1 means each generation is random. If you set a value other than -1 and keep the model, GPU, and other parameters unchanged, the same image is generated every time. | -1 (the default value). |
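For reference, the parameters above map onto the AUTOMATIC1111 txt2img API fields roughly as follows. This is a sketch assuming the standard /sdapi/v1/txt2img endpoint of a web UI started with --api; verify the field names against your web UI version:

```python
import json

# Assumed mapping from the UI parameters above to API request fields.
payload = {
    "prompt": "masterpiece, best quality, 8k portrait",
    "negative_prompt": "bad anatomy, blurry",
    "sampler_name": "Euler a",  # Sampling method
    "steps": 20,                # Sampling steps (web UI default)
    "width": 512,
    "height": 512,
    "n_iter": 1,                # Batch count
    "batch_size": 1,            # Batch size
    "cfg_scale": 7,             # CFG Scale
    "seed": -1,                 # -1 = random; fix a value to reproduce an image
}
print(json.dumps(payload, indent=2))
```

Fixing the seed while keeping the model and all other parameters unchanged is the standard way to regenerate an identical image or to compare the effect of a single parameter change.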
Model overview
For more information, see Models of ArtLab.
ControlNet plug-in
All versions of Stable Diffusion WebUI and ComfyUI in PAI ArtLab provide the MistoLine-SDXL-ControlNet plug-in, which is developed by TheMisto.ai.
Parameters
Parameter | Description |
Enable | Specifies whether to enable ControlNet. |
Low VRAM | If the video memory is smaller than 4 GB, you can select this option. |
Preprocessor | The preprocessing effect varies based on the preprocessor. Each preprocessor has a corresponding model. You can use a preprocessor together with the corresponding model. |
Model | A model can be used together with a preprocessor. You must manually download a model and upload it to the corresponding directory in OSS. |
Control Weight | The weight of ControlNet in the AI generation. In img2img generation, a low denoising strength combined with a high control weight changes the filter and style of an image without changing its details, whereas a high denoising strength combined with a low control weight modifies the details of an image. |
Starting Control Step | A percentage value that ranges from 0 to 1, indicating where ControlNet starts to take effect. A value of 0 starts control from the first step, and a value of 1 starts control from the last step. A larger value means ControlNet has less effect on the AI generation. For example, if you set the Sampling steps parameter to 20 and the Starting Control Step parameter to 0.3, ControlNet starts control from the sixth step (20 × 0.3 = 6). |
Ending Control Step | A percentage value that ranges from 0 to 1, indicating where ControlNet stops taking effect. |
Control Type | - |
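The Starting Control Step and Ending Control Step percentages can be converted into concrete sampling steps with the formula described above. The helper below is an illustrative sketch, not part of the plug-in:

```python
def control_step_range(sampling_steps: int, start: float, end: float) -> tuple[int, int]:
    """Return the first and last sampling steps (1-indexed) where ControlNet is active.

    `start` and `end` are the Starting/Ending Control Step percentages (0 to 1).
    """
    first = max(1, round(sampling_steps * start))  # e.g. 20 * 0.3 = step 6
    last = round(sampling_steps * end)
    return first, last

print(control_step_range(20, 0.3, 1.0))  # (6, 20)
```

With 20 sampling steps, a Starting Control Step of 0.3 and an Ending Control Step of 1.0 mean ControlNet guides the generation from step 6 through the final step.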