Web ML For Selfie and Face Effects — Part 1 [Selfie Segmentation]

Klurdy Studios · Published in Heartbeat · Mar 10, 2022

One of the biggest phenomena to come out of the smartphone era is the selfie, a concept so pervasive that Oxford Dictionaries added the word to the dictionary. This phenomenon led to the growth of social media apps like Snapchat, which innovated on selfie creation with filters. As time went by, video conferencing apps like Zoom built on front-camera video streaming by allowing users to hide or change their backgrounds.

Some of these innovative features are powered by machine learning, and we are going to look at how to implement them in the browser using MediaPipe’s solutions. In this series, we cover Selfie Segmentation in this part and Face Tracking in the last part. To quote MediaPipe’s official website:

“MediaPipe Selfie Segmentation segments the prominent humans in the scene. It can run in real-time on both smartphones and laptops. The intended use cases include selfie effects and video conferencing, where the person is close (< 2m) to the camera.”

Source: Google AI Blog: Background Features in Google Meet, Powered by Web ML (googleblog.com)

This ML solution is powered by MobileNetV3 and has two models:

  • A general model with an input tensor shaped 256 x 256 x 3 that outputs a 256 x 256 x 1 segmentation mask. This model is used in Google’s ML Kit.
  • A landscape model whose input tensor is 144 x 256 x 3; it has fewer FLOPs and therefore runs faster. A variant of this model is used in Google Meet.

Note: MediaPipe’s solution handles resizing of the images, so you can focus on building your experience.

Prerequisites

You need to have Node.js installed on your machine; you can follow the documentation to get started. Also, install the Angular CLI by following the official docs. At the time of writing, the versions used are Node v14.18.0, NPM v6.14.15, and Angular CLI v12.2.7.

Project Setup

In the root folder where you want to create your project, run the following command to create a new Angular application named selfie-segmentation.

ng new selfie-segmentation --routing --style=scss

Next, create a service called ml that will be used to initialize the models and run inference.

ng g s services/ml

Make sure you configure the app module appropriately by registering the service as a provider.
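Below is one possible way to register the service explicitly. Note that a service generated with the Angular CLI is already provided in the root injector by default, so this is only needed if you prefer an explicit provider; the file and class names assume the default CLI layout.

// src/app/app.module.ts (a minimal sketch; assumes the default CLI layout)
import { NgModule } from '@angular/core';
import { BrowserModule } from '@angular/platform-browser';

import { AppRoutingModule } from './app-routing.module';
import { AppComponent } from './app.component';
import { MlService } from './services/ml.service';

@NgModule({
  declarations: [AppComponent],
  imports: [BrowserModule, AppRoutingModule],
  providers: [MlService], // explicit provider for the ml service
  bootstrap: [AppComponent],
})
export class AppModule {}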

Install the dependencies needed for the web application: Bootstrap for the UI styling and formatting, and MediaPipe’s Selfie Segmentation models.

ng add @ng-bootstrap/ng-bootstrap && npm i -S @mediapipe/selfie_segmentation

Finally, configure the Angular build system to ensure that the files required to run the ML models are served from the local development server and/or the hosting server. For local development, update the scripts section in the package.json file.

"scripts": {
  ...
  "start": "npm run copy:all && ng serve",
  "copy:all": "npm run copy:selfie_segmentation",
  "copy:selfie_segmentation": "cp -R node_modules/@mediapipe/selfie_segmentation src/assets"
}

For production builds, update the build target in the angular.json file as shown below:

"assets”: [

{
“glob”: “**/*”,
“input”: “./node_modules/@mediapipe/selfie_segmentation”,
“output”: “./assets/selfie_segmentation”
}
],

Implementing UI

We’re using Bootstrap for styling and laying out the required HTML elements. There are two main elements needed to run the application: a video HTML tag for capturing and displaying our video stream, and a canvas HTML tag for displaying the segmentation mask. Decide on the dimensions of the input images, e.g. 1280px by 720px, and hard-code these values as the size of the canvas. Open the app component HTML file and add the following HTML markup.
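A minimal sketch of what that markup could look like is shown below; the template reference variables (#video, #canvas) and Bootstrap utility classes are illustrative, and the 1280 x 720 dimensions should match the camera configuration used later.

<!-- src/app/app.component.html: a minimal sketch; element refs and classes are illustrative -->
<div class="container text-center my-3">
  <!-- The video element captures the webcam stream; it can stay hidden -->
  <video #video class="d-none" width="1280" height="720" autoplay muted playsinline></video>
  <!-- The canvas displays the segmentation mask -->
  <canvas #canvas width="1280" height="720" class="border"></canvas>
</div>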


Implementing ML

Open the ml service created earlier and import MediaPipe’s selfie segmentation solution into the project.

import { SelfieSegmentation } from "@mediapipe/selfie_segmentation";

In the class definition for the service, we are going to create three functions. The first one will be for initializing the ML models using the locally served files and configuring the selfie input options. For more information on configuring the selfie segmentation models, refer to the documentation.
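A minimal sketch of such an initialization function is shown below. It assumes the model files were copied to assets/selfie_segmentation by the build configuration above; the property and method names (selfieSegmentation, handleResults) are illustrative.

// Inside the ml service: initialize the model from the locally served files
initModel(): void {
  this.selfieSegmentation = new SelfieSegmentation({
    // Resolve model/WASM files from the assets folder configured earlier
    locateFile: (file) => `assets/selfie_segmentation/${file}`,
  });
  // 0 = general (256x256) model, 1 = landscape (144x256) model
  this.selfieSegmentation.setOptions({ modelSelection: 1 });
  // Forward inference results to the observable subject (see below)
  this.selfieSegmentation.onResults((results) => this.handleResults(results));
}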

The second function will be to initialize the web camera using MediaPipe’s camera utilities and send the video frames to the initialized model for inference. Make sure the configured width and height of the camera are the same as for the video and canvas tags created in the previous section. You can be creative here and draw backgrounds behind the segmentation mask, as the canvas element allows for drawing layers one on top of the other. See this canvas documentation for more details.
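Here is a sketch of that function, assuming the camera utilities package is installed separately (npm i -S @mediapipe/camera_utils) and imported into the service; the 1280 x 720 dimensions mirror the markup above.

// Inside the ml service: start the webcam and feed frames to the model
// Requires: import { Camera } from '@mediapipe/camera_utils';
startCamera(videoElement: HTMLVideoElement): void {
  this.camera = new Camera(videoElement, {
    onFrame: async () => {
      // Send the current frame to the model for inference
      await this.selfieSegmentation.send({ image: videoElement });
    },
    width: 1280,
    height: 720,
  });
  this.camera.start();
}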

The final function will be for pushing the results to an observable subject that components can subscribe to, using the changing data to draw the segmentation mask onto an HTML canvas. For this tutorial, we are drawing the segmentation mask alone.
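A sketch of that function is shown below, using an RxJS Subject so components can subscribe to new results; the member names are illustrative, and the Results type is assumed to be re-exported by the selfie segmentation package.

// Inside the ml service: publish each inference result to subscribers
// Requires: import { Subject } from 'rxjs';
//           import { Results } from '@mediapipe/selfie_segmentation';
results$ = new Subject<Results>();

private handleResults(results: Results): void {
  // results.segmentationMask holds the mask; results.image holds the input frame
  this.results$.next(results);
}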

Integrate the user interface with the ML service by opening the app component TypeScript file and adding the following logic (a sketch follows the list):

  • Import, configure and initialize the ML service.
  • Start the camera for inference and post-process results with the video and canvas elements created.
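The sketch below shows one way to wire this up, assuming the service exposes the initModel, startCamera, and results$ members from the earlier sketches and the template defines the #video and #canvas references.

// src/app/app.component.ts (a minimal sketch; names follow the earlier sketches)
import { AfterViewInit, Component, ElementRef, ViewChild } from '@angular/core';
import { MlService } from './services/ml.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.scss'],
})
export class AppComponent implements AfterViewInit {
  @ViewChild('video') video!: ElementRef<HTMLVideoElement>;
  @ViewChild('canvas') canvas!: ElementRef<HTMLCanvasElement>;

  constructor(private ml: MlService) {}

  ngAfterViewInit(): void {
    const canvasEl = this.canvas.nativeElement;
    const ctx = canvasEl.getContext('2d')!;

    // Initialize the model, then start streaming frames from the webcam
    this.ml.initModel();
    this.ml.startCamera(this.video.nativeElement);

    // Draw only the segmentation mask for each new result
    this.ml.results$.subscribe((results) => {
      ctx.clearRect(0, 0, canvasEl.width, canvasEl.height);
      ctx.drawImage(results.segmentationMask, 0, 0, canvasEl.width, canvasEl.height);
    });
  }
}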

Conclusion

With MediaPipe’s machine learning solutions, it is easy to implement selfie segmentation in modern web applications using the Angular framework. It is important to note that the browser freezes when the first frame is passed to the model for inference; this is due to the processing needed to initialize the model’s graph.

You can mitigate this by implementing web workers, taking that work off the main thread used to render the user interface. Follow Angular’s documentation for adding web workers to your project. In the final part of this tutorial, we will add face tracking capabilities that can be used to create additional effects.

