End to End Machine Learning with Training on-premises & inference in AWS using transfer learning (Part 2 of 2)

In part 1 of this blog series, we introduced the concepts of transfer learning and the components of the proof of concept. In part 2, we look at how we used Amazon SageMaker to apply transfer learning with domain-specific data, which substantially improved accuracy.

Amazon SageMaker:

Amazon SageMaker is the end-to-end AI/ML platform in AWS, providing tooling and compute for all phases of data processing and machine learning. In this solution, SageMaker is used for transfer learning and for inference at endpoints.

Figure 7: Amazon SageMaker an end to end AI/ML training platform

Amazon SageMaker provides tooling for each phase of machine learning: data preparation, build, training and deployment. In this solution we use Amazon SageMaker to tune and deploy a transfer learning-based model. Because the heavy lifting in training has already been done on-premises with GPUs, we can use cheaper CPU-based instances in AWS for the transfer learning.

The Solution:

Figure 8: Pretrained model transferred to AWS S3 bucket

The on-premises data center has GPUs and other high-performance machine learning resources. The general model is trained on-premises using the large image dataset. The model is then transferred to Amazon SageMaker for transfer learning.
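The transfer step can be sketched roughly as follows. This is a minimal illustration, not the code used in the proof of concept: it assumes the trained model is a directory of files that gets packaged as a gzip-compressed tar archive and uploaded with the boto3 S3 client's `upload_file` call. The helper names `package_model` and `upload_model`, and all paths, bucket names and keys, are hypothetical.

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, archive_path: str) -> str:
    """Bundle every file in model_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for item in sorted(Path(model_dir).iterdir()):
            # Store entries at the archive root, without the local directory prefix.
            archive.add(str(item), arcname=item.name)
    return archive_path


def upload_model(s3_client, archive_path: str, bucket: str, key: str) -> str:
    """Upload the archive with the given S3 client and return its S3 URI."""
    # upload_file(Filename, Bucket, Key) is a standard boto3 S3 client method.
    s3_client.upload_file(archive_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example usage (needs boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   archive = package_model("pretrained_model", "model.tar.gz")
#   uri = upload_model(boto3.client("s3"), archive,
#                      "example-transfer-learning-bucket",
#                      "pretrained/model.tar.gz")
```

Passing the client in as an argument keeps the helpers easy to exercise without AWS access, for example with a stub object that records the call.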

Figure 9: Code used for the transfer learning

The code shown takes the original model and retrains it with a domain-specific dataset. It carries forward the weights from the original model and uses them as the starting point for the new model. We remove the last layer and start training with the new data. The second-to-last line shows the S3 bucket location of the trained model and the location of the new data.
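To make the idea concrete without depending on any particular framework, here is a toy sketch in plain Python. It is not the code from the figure. The "pretrained" weights, the tiny domain dataset and every function name are made up for illustration, and as a simplification the carried-over layer is frozen and only the new last layer is trained, whereas a real run may also update the earlier layers.

```python
import math

# Hypothetical "pretrained" model: a feature layer plus the original
# classification head. All numbers are made up for illustration.
PRETRAINED = {
    "base": [[1.0, -1.0], [0.5, 0.5]],               # weights carried forward
    "head": [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]],  # original last layer (dropped)
}

# Hypothetical small domain dataset: label 1 when the first value is larger.
DOMAIN_X = [(2.0, 0.0), (3.0, 1.0), (1.5, 0.5), (0.0, 2.0), (1.0, 3.0), (0.5, 1.5)]
DOMAIN_Y = [1, 1, 1, 0, 0, 0]


def extract_features(base, x):
    """Run the frozen base layer: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base]


def predict_proba(head, bias, features):
    """Sigmoid output of the new single-unit head."""
    logit = sum(w * f for w, f in zip(head, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def accuracy(base, head, bias, samples, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(
        (predict_proba(head, bias, extract_features(base, x)) > 0.5) == bool(y)
        for x, y in zip(samples, labels)
    )
    return hits / len(labels)


def fine_tune(pretrained, samples, labels, lr=0.5, epochs=2000):
    """Drop the old head, attach a fresh one, and train only the new head."""
    base = pretrained["base"]              # reused as-is, never modified
    head, bias = [0.0] * len(base), 0.0    # new last layer starts from zero
    feats = [extract_features(base, x) for x in samples]
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(head), 0.0
        for f, y in zip(feats, labels):
            err = predict_proba(head, bias, f) - y   # gradient of log loss w.r.t. logit
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # Full-batch gradient descent step on the new head only.
        head = [w - lr * g / n for w, g in zip(head, grad_w)]
        bias -= lr * grad_b / n
    return head, bias


new_head, new_bias = fine_tune(PRETRAINED, DOMAIN_X, DOMAIN_Y)
print("domain accuracy:",
      accuracy(PRETRAINED["base"], new_head, new_bias, DOMAIN_X, DOMAIN_Y))
```

The point of the sketch is the division of labor: the expensive feature layer comes from the earlier training run, so only a small head has to be fitted to the domain data.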

Figure 10: Accuracy of model with no transfer learning

A test model is first trained on the domain-specific data alone, without transfer learning, and its accuracy is evaluated. Full training on the domain data alone yields very poor accuracy. A model trained with only the data available for the domain is unusable for inference.

Figure 11: Training logs showing accuracy over different steps

The output of the transfer learning training run in AWS is shown. It displays the validation accuracy at each step, with the best accuracy value highlighted at the bottom. CPU instances are used for this training. Because this is transfer learning with only a limited amount of data, CPU-based processing works well.

Figure 12: Model accuracy with transfer learning

Applying transfer learning to the pretrained model drastically increases accuracy. With this level of accuracy, the model can now be deployed in production for inference.

Transfer Learned model for Inference:

Amazon SageMaker endpoints make it possible to take the trained model and use it for inference. Some of the capabilities and the flow are shown in the figure below.

The left-hand side shows the on-premises training, where a general model is created.
On the right-hand side, a well-tuned model is built with transfer learning, exported and deployed to SageMaker endpoints using the workflow shown.

Figure 13: Transfer learning end to end with inference on Amazon SageMaker
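Once a model is deployed, a client can call the endpoint along these lines. This is a hedged sketch rather than the proof of concept's client code: it uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client, and it assumes the endpoint accepts raw image bytes with the `application/x-image` content type and answers with a JSON list of per-class probabilities. The helper name `classify_image`, the endpoint name and the file name are hypothetical.

```python
import json


def classify_image(runtime_client, endpoint_name, image_bytes,
                   content_type="application/x-image"):
    """Send one image to a SageMaker endpoint and return (class_index, score)."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=image_bytes,
    )
    # Assumption: the model replies with a JSON list of per-class probabilities.
    scores = json.loads(response["Body"].read())
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Example usage (needs boto3, AWS credentials and a deployed endpoint;
# the endpoint and file names are hypothetical):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   with open("sample.jpg", "rb") as fh:
#       print(classify_image(runtime, "example-transfer-endpoint", fh.read()))
```

If the deployed model uses a different request or response format, only the content type and the parsing line need to change.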

Key Takeaways:

By leveraging high-performance GPU compute on-premises and transferring the learning to the cloud, we used the Amazon SageMaker platform to retrain the model successfully with domain-specific data. Here are the key takeaways:

The VMware platform can be leveraged for GPU-based machine learning on-premises.
Complex models can be trained on-premises with GPU-enabled VMware infrastructure.
Trained models can be transferred and reused effectively in the cloud.
Amazon SageMaker provides the capability to import pre-trained models for transfer learning.
Transfer learning models retrained with more specific data have excellent accuracy.
Amazon SageMaker endpoints can be used for inference in the cloud.
