If your business involves metric learning, you can use the image metric learning (raw) component of Platform for AI (PAI) to build metric learning models for inference. This topic describes how to configure the image metric learning (raw) component and provides an example on how to use the component.
Prerequisites
OSS is activated, and Machine Learning Designer is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.
Limits
You can use the image metric learning (raw) component only with the computing resources of Deep Learning Containers (DLC).
Overview
The image metric learning (raw) component provides mainstream models such as ResNet50, ResNet18, ResNet34, ResNet101, swint_tiny, swint_small, swint_base, vit_tiny, vit_small, vit_base, xcit_tiny, xcit_small, and xcit_base.
Configure the component in the PAI console
Input ports
Input port (from left to right)
Data type
Recommended upstream component
Required
data annotation path for training
OSS
Read File Data
No
data annotation path for evaluation
OSS
Read File Data
No
Component parameters
Tab
Parameter
Required
Description
Default value
Fields Setting
model type
Yes
The model type used for training. Valid values:
DataParallelMetricLearning
ModelParallelMetricLearning
DataParallelMetricLearning
oss dir to save model
Yes
The Object Storage Service (OSS) directory in which the trained model is stored. Example:
oss://examplebucket/yun****/designer_test
None
oss annotation path for training data
No
If you do not use the data annotation path for training input port to specify the labeled training data, you must configure this parameter.
Note: If you use both the input port and this parameter to specify the labeled training data, the value specified by the input port takes precedence.
The OSS path in which the labeled training data is stored. Example:
oss://examplebucket/yun****/data/imagenet/meta/train_labeled.txt
Each data record in the train_labeled.txt file is in the absolute path/image name.jpg label_id format. Important: The image storage path and the label_id are separated by a space. For a sample annotation file, see the example after this table.
None
oss annotation path for evaluation data
No
If you do not use the data annotation path for evaluation input port to specify the labeled evaluation data, you must configure this parameter.
Note: If you use both the input port and this parameter to specify the labeled evaluation data, the value specified by the input port takes precedence.
The OSS path in which the labeled evaluation data is stored. Example:
oss://examplebucket/yun****/data/imagenet/meta/val_labeled.txt
Each data record in the val_labeled.txt file is in the absolute path/image name.jpg label_id format. Important: The image storage path and the label_id are separated by a space.
None
class list file
No
You can specify the class names, or set this parameter to the OSS path of the TXT file that contains the class names.
None
Data Source Type
Yes
The type of input data. Valid values: ClsSourceImageList and ClsSourceItag.
ClsSourceImageList
oss path for pretrained model
No
The OSS path of your pre-trained model. If you have a pre-trained model, set this parameter to its OSS path. If you do not configure this parameter, the default pre-trained model provided by PAI is used.
None
Parameters Setting
backbone
Yes
The backbone model that you want to use. Valid values:
resnet_50
resnet_18
resnet_34
resnet_101
swin_transformer_tiny
swin_transformer_small
swin_transformer_base
resnet_50
image size after resizing
Yes
The size of the resized image. Unit: pixels.
224
backbone output channels
Yes
The number of feature dimensions output by the backbone model. The value must be an integer.
2048
neck output channels
Yes
The number of feature dimensions output by the neck. The value must be an integer.
1536
training data classification label range
Yes
The number of classes in the training data, which determines the dimension of the classification output.
None
metric loss
Yes
The loss function, which measures the degree of inconsistency between the values predicted by the model and the actual values. Valid values:
AMSoftmax (recommended margin: 0.4, scale: 30)
ArcFaceLoss (recommended margin: 28.6, scale: 64)
CosFaceLoss (recommended margin: 0.35, scale: 64)
LargeMarginSoftmaxLoss (recommended margin: 4, scale: 1)
SphereFaceLoss (recommended margin: 4, scale: 1)
ModelParallel AMSoftmax
ModelParallel Softmax
AMSoftmax (recommended margin: 0.4, scale: 30)
metric learning loss scale parameter
Yes
The scale that you want to use for the loss function. Configure this parameter based on the loss function that you select.
30
metric learning loss margin parameter
Yes
The margin that you want to use for the loss function. Configure this parameter based on the loss function that you select.
0.4
metric learning loss weight in all losses
No
The weight of the metric learning loss among all losses, which determines the optimization ratio between metric learning and the classification model.
1.0
optimizer
Yes
The optimization method for model training. Valid values:
SGD
AdamW
SGD
initial learning rate
Yes
The initial learning rate. The value is a floating-point number.
0.03
batch size
Yes
The size of a training batch, which indicates the number of data samples used for model training in each iteration.
None
total train epochs
Yes
The total number of training epochs. An epoch is one complete round of training over all data samples.
200
save checkpoint epoch
No
The frequency, in epochs, at which a checkpoint is saved. A value of 1 indicates that a checkpoint is saved each time an epoch ends.
10
Execution Tuning
io thread num for training.
No
The number of threads used to read the training data.
4
use fp 16
No
Specifies whether to enable FP16 to reduce memory usage during model training.
None
single worker or distributed on MaxCompute or DLC
Yes
The compute engine that is used to run the component. You can select a compute engine based on your business requirements. Valid values:
single_on_dlc
distribute_on_dlc
single_on_dlc
number of worker
No
If you set the single worker or distributed on MaxCompute or DLC parameter to distribute_on_dlc, you must configure this parameter.
The number of concurrent workers used in computing.
1
gpu machine type
Yes
The GPU specifications that you want to use.
8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge
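The following shows what a labeled file such as train_labeled.txt or val_labeled.txt might look like, assuming that each image is referenced by its full OSS path. The bucket name, directories, file names, and label IDs below are placeholders for illustration only.

```
oss://examplebucket/yun****/data/images/cat/img_0001.jpg 0
oss://examplebucket/yun****/data/images/cat/img_0002.jpg 0
oss://examplebucket/yun****/data/images/dog/img_0101.jpg 1
oss://examplebucket/yun****/data/images/dog/img_0102.jpg 1
```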
Examples
The following figure shows a sample pipeline in which the image metric learning (raw) component is used. To configure the components in the pipeline, perform the following steps:
Prepare data. Label images by using iTAG provided by PAI. For more information, see iTAG.
Use the Read File Data-4 and Read File Data-5 components to read the labeled training data and labeled evaluation data. To read the data, set the OSS Data Path parameter of each component to the OSS path in which the data that you want to retrieve is stored.
Draw lines from the preceding two components to the image metric learning (raw) component and configure its parameters. For more information, see the "Configure the component in the PAI console" section of this topic.
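If you prepare the labeled file lists yourself instead of exporting them from iTAG, you can generate them with a short script. The following Python sketch is an illustration only and is not part of the component: it assumes a hypothetical local directory layout with one subdirectory per class, and the IMAGE_ROOT, OSS_PREFIX, and build_annotation_file names are placeholder examples. It writes one absolute path/image name.jpg label_id record per line, with the path and the label_id separated by a space. After you generate the file, upload it and the images to OSS (for example, by using the ossutil tool) and set the oss annotation path for training data parameter to the OSS path of the file.

```python
import os

# Hypothetical local layout: one subdirectory per class, for example
#   images/cat/img_0001.jpg, images/dog/img_0101.jpg, ...
IMAGE_ROOT = "images"                                   # local copy of the labeled images
OSS_PREFIX = "oss://examplebucket/yun****/data/images"  # OSS prefix the images are uploaded to
OUTPUT_FILE = "train_labeled.txt"                       # annotation file to generate


def build_annotation_file(image_root, oss_prefix, output_file):
    """Write one 'image_path label_id' record per line, separated by a space."""
    # Assign a stable integer label_id to each class subdirectory.
    class_names = sorted(
        name for name in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, name))
    )
    label_ids = {name: idx for idx, name in enumerate(class_names)}

    with open(output_file, "w") as f:
        for class_name in class_names:
            class_dir = os.path.join(image_root, class_name)
            for file_name in sorted(os.listdir(class_dir)):
                if not file_name.lower().endswith((".jpg", ".jpeg", ".png")):
                    continue
                image_path = f"{oss_prefix}/{class_name}/{file_name}"
                f.write(f"{image_path} {label_ids[class_name]}\n")


if __name__ == "__main__":
    build_annotation_file(IMAGE_ROOT, OSS_PREFIX, OUTPUT_FILE)
```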