Training a Model

Click the ‘Training’ option in the left navigation menu. It has three sections:

1. New Training

a) Select either Search by Model Type or Search by Hardware

This selection lets you check whether a given model is compatible with your edge hardware.

Search by Model Type – Select this option if you have already decided which model type you want to train. It does not take hardware compatibility into account and simply trains the model you select.

If you have no hardware constraints at the moment and only want to train a model, this is the option to choose.

Search by Hardware – Select this option if you have already decided which hardware you want to train a model for. When a hardware target is selected, only the models supported on that hardware are listed in the dropdown, helping you make an informed model choice.

If you have decided on (or are considering) a particular edge hardware but are unsure about the model type, this is the option to choose.

b) Select Model Details

Model type – Select the computer vision task you want to perform: classification or detection.

Model family – Select the corresponding model family from the dropdown list. The list contains the model families available for the model type selected in the previous step.

Classification –  ResNet, VggNet, DenseNet, SqueezeNet, AlexNet, MobileNet

Detection – YOLO, SSD

Model name – Select the corresponding model name from the dropdown list. The list contains the model names available for the model family selected in the previous step.

Classification

  • ResNet18, ResNet34, ResNet50, ResNet101, ResNet152
  • Vgg16, vgg11, vgg13_bn, vgg13, vgg16_bn, vgg19_bn, vgg19, vgg11_bn
  • DenseNet121, DenseNet161, DenseNet169, DenseNet201
  • Squeezenet1_1, Squeezenet1_0
  • Alexnet
  • Mobilenet_v2

Detection

  • YOLOv5s
  • YOLOv3
  • YOLOv3-tiny
  • YOLOv5m

c) Select the parameters

Epochs – The number of complete passes the training algorithm makes through the entire dataset. A recommended number of epochs is filled in by default when you select a model.

Batch Size – The number of samples that pass through the neural network in one forward/backward pass. A recommended batch size is filled in by default when you select a model.
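For intuition, here is a small illustrative Python sketch of how these two values relate to the number of training steps. The numbers are made up for the example, not ENAP Studio defaults:

```python
# Illustrative only: how epochs and batch size determine training steps.
dataset_size = 10_000  # hypothetical number of training images
batch_size = 32        # samples processed per forward/backward pass
epochs = 50            # complete passes over the dataset

steps_per_epoch = dataset_size // batch_size  # 312 weight updates per epoch
total_steps = steps_per_epoch * epochs        # 15,600 updates overall

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```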

Advanced setting – With this feature, you can tune the hyperparameters. The input box is pre-filled with the important parameters and their optimal settings for the selected model. If you want to adjust these parameters to your requirements, you can edit them in this field. The parameters available in advanced settings for the different models are listed below (an illustrative example follows the list):

  1. All Classification models –
    1. “trainvalsplit”:0.7 (obsolete) 
    2. “learning_rate”: Learning rate for training 
    3. “momentum”: Momentum of gradient descent 
    4. “classifer_num_hidden_layers”: Number of additional dense layer added at end of pretrained model 
    5. “Classifier_layer_size”: No of neurons in each of the additional dense layers added 
    6. “classifier_layer_dropout_prob”: Dropout probability in each of the added additional dense layers.
    7. “mode”: Currently, only train mode is available. It means training the classification model.
  2. Yolov3 and yolov3-tiny models – 
    1. “weights”: Initial weights to choose for training yolov3 model. Choices are – [“yolov3.pt”, “yolov3-tiny.pt”]. Set it to “yolov3.pt” for yolov3 training and “yolov3-tiny.pt” for yolov3-tiny training. 
    2. “cfg”: Contains the path of the hyperparameters file and data info file. Choices are [“cfg/yolov3-custom.cfg”, “cfg/yolov3-tiny-custom.cfg”]. Set it to “cfg/yolov3-custom.cfg” for yolov3 training and “cfg/yolov3-tiny-custom.cfg” for yolov3-tiny training. 
    3. “data”: Contains relative path information about the data provided. Always set it to “data_cfg/custom.data”. 
    4. “class_names”: Please provide the list of names of classes/objects in the given dataset. 
    5. “hyp”: Additional hyperparameters information. Always set it to “data/hyp.scratch.yaml”. 
    6. “nc”: Number of classes/objects to be detected in the given dataset. The number of class names provided in the class_names field should match with the number provided here. 
    7. “img-size”: Size to which training images are resized during training. Supported choices are [416,416] and [640,640] 
    8. “resume”: Resume previously stopped training. Always set it to “False”.
    9. “device”: Specify the CUDA device to run on.
    10. “workers”: Specify the number of data loader workers.
    11. “subdivisions”: batch_size/subdivisions = number of images taken per step during yolov3 training. Keep subdivisions the same as the batch size.
  3. Yolov5 models –
    1. “weights”: Initial weights to choose for training yolov5 model. Choices are – [“yolov5s.pt”, “yolov5m.pt”]. Set it to “yolov5s.pt” for yolov5-small model training and “yolov5m.pt” for yolov5-medium model training. 
    2. “cfg”: Contains the path of the hyperparameters file and data info file. Choices are [“cfg/yolov5s.yaml”, “cfg/yolov5m.yaml”]. Set it to “cfg/yolov5s.yaml” for yolov5-small model training and “cfg/yolov5m.yaml” for yolov5-medium model training.
    3. “data”: Contains relative path information about the data provided. Always set it to “data_cfg/data.yaml”. 
    4. “hyp”: Additional hyperparameters information. Always set it to “data/hyp.scratch.yaml”. 
    5. “img-size”: Size to which training images are resized during training. Supported choices are [416,416] and [640,640] 
    6. “resume”: Resume previously stopped training. Always set it to “False”
    7. “device”: Specify the CUDA device to run on.
    8. “workers”: Specify the number of data loader workers.
    9. “class_names”: Please provide the list of names of classes/objects in the given dataset.
  4. SSD mobilenet v2 model – 
    1. “num_classes”: Number of classes/objects to be detected in the given dataset. The number of class names provided in the class_names field should match with the number provided here. 
    2. “initial_learning_rate”: Learning rate for training ssd mobilenet v2 model 
    3. “class_names”: Please provide the list of names of classes/objects in the given dataset. 
    4. “img_size”: Size to which training images are resized during training. Always set it to [300,300], as the SSD model only supports those dimensions.
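For reference, below is a minimal, illustrative sketch of what the advanced-settings input might contain, written as Python dicts mirroring the JSON shown in the field. The parameter names come from the lists above; every value (learning rate, device, worker count, class names, and so on) is an assumed example, not a verified ENAP Studio default:

```python
import json

# Hypothetical advanced-settings payloads. Keys follow the parameter lists
# above; all values are illustrative assumptions.

classification_settings = {
    "trainvalsplit": 0.7,                   # obsolete, left at its default
    "learning_rate": 0.001,                 # assumed example value
    "momentum": 0.9,                        # assumed example value
    "classifer_num_hidden_layers": 1,       # extra dense layers appended
    "Classifier_layer_size": 256,           # neurons per added dense layer
    "classifier_layer_dropout_prob": 0.5,   # dropout in each added layer
    "mode": "train",                        # only train mode is available
}

yolov5s_settings = {
    "weights": "yolov5s.pt",                # yolov5-small initial weights
    "cfg": "cfg/yolov5s.yaml",              # yolov5-small config
    "data": "data_cfg/data.yaml",           # always this value
    "hyp": "data/hyp.scratch.yaml",         # always this value
    "img-size": [416, 416],                 # or [640, 640]
    "resume": "False",                      # always "False"
    "device": "0",                          # assumed: first CUDA device
    "workers": 4,                           # assumed data-loader workers
    "class_names": ["cat", "dog", "horse"], # example classes
}

print(json.dumps(yolov5s_settings, indent=2))
```

The class_names list must match the classes annotated in your dataset (and, for yolov3 models, the number given in “nc”).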

d) Dataset

Data URL – The path to your annotated dataset in an Amazon S3 bucket.

For image dataset:

  1. The dataset folder has to be annotated and structured according to the guidelines mentioned in the document —–
  2. The dataset folder has to be in the Amazon S3 bucket.
  3. Please provide only publicly accessible Amazon S3 buckets (a quick way to verify this is sketched below).
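Because the Data URL must be publicly accessible, you may want to sanity-check it before starting a training run. A minimal sketch in Python, using a made-up placeholder URL that you would replace with your own S3 object URL:

```python
# Optional pre-flight check: confirm a dataset URL is publicly reachable.
# The bucket and key below are placeholders, not real ENAP Studio paths.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

data_url = "https://example-bucket.s3.amazonaws.com/datasets/my_dataset.zip"

try:
    # HEAD request checks accessibility without downloading the data.
    with urlopen(Request(data_url, method="HEAD"), timeout=10) as resp:
        print(f"Reachable: HTTP {resp.status}")
except HTTPError as e:
    print(f"Not publicly accessible (HTTP {e.code}); check the bucket policy")
except URLError as e:
    print(f"Could not reach URL: {e.reason}")
```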

A sample dataset path is provided by default on model selection; you can use it to explore ENAP Studio functionalities.

Support for private S3 buckets and other data sources is in progress and will be available soon.

If you need your dataset annotated for use in ENAP Studio, please mention it as feedback in ENAP Studio or write to us at connect@edgeneural.ai with the subject “Require dataset annotation”.

Once the selections are completed, click ‘Start Training’ to begin model training. You can also run parallel training sessions. 

2. In Progress

Displays the details of all model trainings in progress. 

You can click the info button of a listed model for detailed training progress.

The following details can be viewed in the progress section:

  1. Training progress details such as training start date, total epochs, epochs completed, and training status.
  2. Logs of the training process.
  3. Option to abort the training if required.
  4. Visual representation of the iterations as they happen.
  5. Graphical view of significant parameters such as accuracy, loss function, and dataset distribution. These graphs vary according to the model type selected.

Accuracy graph – Depicts the training and validation accuracy, with a data point for every epoch run.

For classification, it depicts the training and validation accuracy along with mAP for every epoch.

For detection, it depicts the mAP and ______ for every epoch.

Loss function graph – Depicts the loss function for every epoch.

Dataset graph – Depicts all the classes present in the dataset. For example, if the defined classes are cat, dog, and horse, the graph shows the number of images of each class in the training and test datasets.
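As an illustration of what the dataset graph summarizes, the sketch below counts images per class for a hypothetical folder-per-class layout. The layout and file extension are assumptions for the example, not the required ENAP Studio dataset structure (see the annotation guidelines above):

```python
# Illustrative: tally images per class, assuming a hypothetical
# dataset/<split>/<class>/*.jpg layout.
from collections import Counter
from pathlib import Path

def class_counts(split_dir: str) -> Counter:
    counts = Counter()
    for class_dir in Path(split_dir).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts

print("train:", class_counts("dataset/train"))  # e.g. Counter({'cat': 120, 'dog': 95, 'horse': 80})
print("test:", class_counts("dataset/test"))
```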

Note – Graphs are not available for yolov3 and yolov3-tiny during training; the statistics become visible once training completes.

3. Reports

It lists all the models trained by a user. 

It lists the following details:

  1. Model details – model type, model family, and model name.
  2. Version – If a model is retrained using ENAP Studio, its versions are maintained, facilitated by the integrated MLOps.
  3. Status of model training – Shows the training status, which will be one of: Completed, In progress, Aborted.
  4. Artifact – You can download the trained model artifact. It is downloaded to your computer as a zip folder (with all files).
  5. Info – Graphically view data such as the loss function and accuracy, as shown in the progress module, along with a summary of the model details.

Retraining models is essential for real-world applications, but repeating the entire process and keeping track of the resulting models is tedious. ENAP Studio overcomes this challenge with integrated MLOps: you can easily retrain a model and keep track of the versions produced.

January 13, 2022