“Every once in a while, a new technology, an old problem, and a big idea turn into an innovation.” – Dean Kamen

Looking around the AI landscape, we can see an exciting amount of technical innovation in every sector. Applied carefully, this innovation lets retailers get more value from technology they may already own. For fashion retailers in particular, it can greatly improve their ability to personalise the shopping experience for their customers.

So what is that technology stack?

Well, CCTV cameras are in almost every fashion retail store, and that is enough for us to get started. To begin with, existing recordings can be used to learn more about the store and its customers. This preliminary work sets a strong foundation for building new machine learning models that give stores insights they can easily act upon. The solution we use to facilitate this is the AWS Media Analysis Solution.

Before we jump into some of the amazing use cases we worked on, let's take a closer look at the AWS Media Analysis Solution’s architecture:

AWS Media Analysis Solution

Ref: https://amzn.to/2H8mI5N

As AWS describes it, the solution enables customers to quickly and easily extract key details from their media files. It uses Amazon Rekognition for facial recognition and other AWS ML services for speech-to-text and natural language processing (NLP). For our use cases, we are interested in the capabilities of Amazon Rekognition.

Returning to the store’s CCTV footage: the video data is stored in Amazon S3. As in the architecture above, we use AWS Lambda functions to call Amazon Rekognition and extract the required key details from the video.
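As a rough illustration of the S3 to Lambda to Rekognition step, the sketch below assumes the Lambda function is triggered by an S3 `ObjectCreated` event notification and starts an asynchronous Amazon Rekognition Video face-detection job with boto3's `start_face_detection`. It is not the Media Analysis Solution's own code, and the bucket and key names used when trying it out are made up.

```python
import urllib.parse


def build_face_detection_request(event):
    """Turn an S3 ObjectCreated event into StartFaceDetection arguments."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "Video": {"S3Object": {"Bucket": bucket, "Name": key}},
        # 'ALL' requests the full attribute set, including AgeRange and Gender.
        "FaceAttributes": "ALL",
    }


def lambda_handler(event, context):
    # Imported here so the request builder above can be tested without AWS.
    import boto3

    rekognition = boto3.client("rekognition")
    response = rekognition.start_face_detection(**build_face_detection_request(event))
    # Results are fetched later with get_face_detection(JobId=...).
    return {"JobId": response["JobId"]}
```

In a real deployment you would typically also pass a `NotificationChannel` (an SNS topic ARN and an IAM role ARN) so Rekognition can announce when the job finishes, and then page through the `get_face_detection` results.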

Now we are ready to make use of our custom models to enable those amazing use cases!

FYI: before feeding data into our models, we pass it through our data sampling service, which runs as a Lambda function.
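Our sampling service isn't described in detail here, so the following is only a hypothetical stand-in showing one common approach: keep the most confident face detection in each fixed time window, which removes near-duplicate frames before they reach the models. It assumes the detections have already been grouped by individual and follow the shape of the `Faces` list returned by Rekognition's `get_face_detection`; the function name and the one-second default window are illustrative choices.

```python
from typing import Dict, List


def sample_detections(faces: List[Dict], interval_ms: int = 1000) -> List[Dict]:
    """Keep the most confident detection in each interval_ms window.

    Each item is assumed to look like an entry of Rekognition's 'Faces' list:
    a 'Timestamp' in milliseconds and a 'Face' dict with a 'Confidence' score.
    """
    best: Dict[int, Dict] = {}
    for item in faces:
        window = item["Timestamp"] // interval_ms
        current = best.get(window)
        if current is None or item["Face"]["Confidence"] > current["Face"]["Confidence"]:
            best[window] = item
    # Return the surviving detections in chronological order.
    return [best[w] for w in sorted(best)]
```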

Age and gender estimation: Amazon Rekognition predicts an age range and a gender for each frame in which an individual appears, and these predictions can vary from frame to frame. Based on these multiple predictions, this model estimates a single age and gender for the individual that is consistent throughout the video. Check out the sample below:
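One simple way to reduce per-frame predictions to a single value is sketched below. This is an illustrative heuristic, not our production model: it takes the median of the per-frame `AgeRange` midpoints and a confidence-weighted vote over the per-frame `Gender` values. The input is assumed to be a list of Rekognition `FaceDetail` dictionaries for one individual.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def consolidate_age_gender(face_details: List[Dict]) -> Tuple[float, str]:
    """Reduce per-frame Rekognition FaceDetail dicts to one (age, gender) pair."""
    if not face_details:
        raise ValueError("need at least one face detection")
    # The median of range midpoints is robust to an occasional outlier frame.
    midpoints = [(f["AgeRange"]["Low"] + f["AgeRange"]["High"]) / 2 for f in face_details]
    age = median(midpoints)
    # Confidence-weighted vote, so uncertain frames count for less.
    votes: Dict[str, float] = defaultdict(float)
    for f in face_details:
        votes[f["Gender"]["Value"]] += f["Gender"]["Confidence"]
    gender = max(votes, key=votes.get)
    return age, gender
```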

Clothing detection: We have trained a custom model to detect clothing items such as shorts, shirts and t-shirts, and then further classify them into high-level classes, i.e. Indian or Western. We trained the model on an AWS Deep Learning AMI; inference runs on a Lambda function, with the model weights stored in S3. Check out the sample results below:
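Because Lambda execution environments are reused between invocations, a common pattern for serving a model from S3 is to download the weights into `/tmp` once per environment and reuse them on warm invocations. The sketch below shows only that loading step, since the model framework isn't specified; the bucket, key, environment variable and function names are hypothetical.

```python
import os
from typing import Callable, Optional

# Hypothetical locations; Lambda functions can only write under /tmp.
WEIGHTS_BUCKET = os.environ.get("WEIGHTS_BUCKET", "my-model-bucket")
WEIGHTS_KEY = os.environ.get("WEIGHTS_KEY", "clothing/weights.bin")
LOCAL_PATH = "/tmp/clothing_weights.bin"

_weights_path: Optional[str] = None  # survives between warm invocations


def s3_download(bucket: str, key: str, path: str) -> None:
    import boto3

    boto3.client("s3").download_file(bucket, key, path)


def get_weights_path(download: Callable[[str, str, str], None] = s3_download) -> str:
    """Fetch the weights on a cold start and reuse them while the environment is warm."""
    global _weights_path
    if _weights_path is None:
        # Skip the download if /tmp already holds a copy.
        if not os.path.exists(LOCAL_PATH):
            download(WEIGHTS_BUCKET, WEIGHTS_KEY, LOCAL_PATH)
        _weights_path = LOCAL_PATH
    return _weights_path
```

The `download` parameter lets the function be exercised without AWS credentials; in Lambda, the default `s3_download` uses boto3's `download_file`. Lambda's `/tmp` defaults to 512 MB, so very large weight files need the ephemeral storage setting raised.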


Waist size estimation: Using the sampled data, this model estimates the approximate waist size of an individual in a given video frame. The output is then used to estimate the customer’s clothing size for both tops and bottoms. Check out the sample results below:
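To turn an estimated waist measurement into a clothing size, the estimate can be looked up against a size chart. The chart below is entirely made up for illustration (real charts vary by brand, region and garment), and the sketch assumes the waist estimate has already been converted to centimetres.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical size chart: (inclusive upper waist bound in cm, size label).
BOTTOMS_CHART: List[Tuple[float, str]] = [
    (71.0, "XS"),
    (76.0, "S"),
    (81.0, "M"),
    (86.0, "L"),
    (91.0, "XL"),
]


def waist_to_size(waist_cm: float, chart: List[Tuple[float, str]] = BOTTOMS_CHART) -> str:
    """Map an estimated waist measurement to the smallest size whose bound covers it."""
    bounds = [upper for upper, _ in chart]
    i = bisect_left(bounds, waist_cm)
    # Anything above the largest bound falls into an overflow size.
    return chart[i][1] if i < len(chart) else "XXL"
```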

Shoe detection: We have trained a custom model to detect shoes on each detected person and to provide key labels along with shoe detection landmarks. The key labels are shoe brand, size, colour and direction (relative to the divider set at the entry door). As with clothing detection, we trained the model on an AWS Deep Learning AMI; inference runs on a Lambda function, with the model weights stored in S3. Check out the sample results below:
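The direction label can be derived geometrically. A minimal sketch, assuming shoe positions are tracked as (x, y) points across frames and the entry-door divider is the line through two points `a` and `b`: the sign of a 2D cross product tells which side of the divider a point is on, and a sign change between frames marks a crossing. Which side counts as "inside" is an assumed convention, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def side_of_line(a: Point, b: Point, p: Point) -> float:
    """Positive if p is left of the directed line a->b, negative if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_direction(track: List[Point], a: Point, b: Point) -> Optional[str]:
    """Return 'entering' or 'leaving' for the first crossing of the divider, else None.

    Assumed convention: the store interior lies to the left of a->b. In image
    coordinates (y pointing down) left and right are mirrored, so order a and b
    to match your camera.
    """
    last_side = 0.0
    for p in track:
        s = side_of_line(a, b, p)
        if s == 0:
            continue  # exactly on the divider; wait for a definite side
        if last_side < 0 < s:
            return "entering"
        if last_side > 0 > s:
            return "leaving"
        last_side = s
    return None
```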

The high-level use cases above give retailers detailed information about their customers. This rich information can be used to build a dynamic profile for each customer in the store, along with insights into their in-store behaviour, e.g. walk patterns, which items they touch, the areas where they spend most of their time, and much more. With this kind of information, stores can move particular products to more prominent positions and stock products better suited to their customer base, all of which can drive sales conversions and revenue.
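Time spent per area is one of the simplest behavioural insights to compute once a customer's positions are tracked. The sketch below assumes positions have already been mapped to floor coordinates and that store areas are non-overlapping rectangles; the zone names and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A zone is an axis-aligned rectangle: (x_min, y_min, x_max, y_max).
Zone = Tuple[float, float, float, float]


def dwell_times(track: List[Tuple[float, float, float]], zones: Dict[str, Zone]) -> Dict[str, float]:
    """Sum the seconds a customer spends in each zone.

    `track` is a chronological list of (timestamp_s, x, y) samples. Each interval
    between consecutive samples is credited to the zone containing its start point.
    """
    totals: Dict[str, float] = defaultdict(float)
    for (t0, x, y), (t1, _, _) in zip(track, track[1:]):
        for name, (x_min, y_min, x_max, y_max) in zones.items():
            if x_min <= x <= x_max and y_min <= y <= y_max:
                totals[name] += t1 - t0
                break  # zones are assumed not to overlap
    return dict(totals)
```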

For more information about how we used the AWS Media Analysis Solution to drive retail sales, please contact us at [email protected]