MoveNet by TensorFlow: a machine learning model that analyzes human body posture

May 31, 2021

Ronny Votel and Na Li of Google Research announced on the TensorFlow blog the release of a new pose detection model called MoveNet. The model is compatible with TensorFlow.js, the JavaScript port of TensorFlow. It analyzes an image and returns accurate data on most of the poses a person can adopt. This information can be used in medical or sports settings, or simply to improve everyday comfort.

An application to provide remote care

MoveNet was created in collaboration with IncludeHealth, a company specializing in digital health. The app lets a user analyze the position of 17 key points of the human body. This could make it possible to offer remote care to patients who cannot visit a physiotherapist in person, for example.

The application guides patients through a series of movements called “routines”, to be carried out almost daily. These are developed digitally and prescribed by physiotherapists to test a person’s balance as well as the strength and range of their movements. Meanwhile, MoveNet analyzes all the key points to check whether the patient is performing the movement correctly.
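
As an illustration of how detected key points could be turned into a movement check, the sketch below computes the angle at a joint from three key points. The function name `jointAngleDegrees`, the `{ x, y }` point shape and the sample coordinates are assumptions made for this example; the announcement does not describe how IncludeHealth actually evaluates a routine.

```javascript
// Hypothetical helper: angle in degrees at joint `b`, formed by points a-b-c.
// Each point is assumed to be an object of the form { x, y }.
function jointAngleDegrees(a, b, c) {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const norm = Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y);
  if (norm === 0) return NaN; // two points coincide, so the angle is undefined
  // Clamp to [-1, 1] to guard against floating-point drift before acos.
  const cosine = Math.min(1, Math.max(-1, dot / norm));
  return (Math.acos(cosine) * 180) / Math.PI;
}

// Made-up coordinates for a shoulder, elbow and wrist.
const shoulder = { x: 0, y: 0 };
const elbow = { x: 0, y: 1 };
const wrist = { x: 1, y: 1 };
console.log(jointAngleDegrees(shoulder, elbow, wrist));
```

A routine could then compare such an angle against a range chosen by the physiotherapist, although any threshold would be specific to the exercise.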

Eventually, the tool could be used by hospitals, the military or insurance companies to enable people who need special care to perform their exercises correctly, potentially remotely. Ryan Eder, founder and CEO of IncludeHealth, describes this possibility:

“The MoveNet model combines speed with accuracy, both of which are necessary to provide prescriptive care. While other models are either fast or accurate, MoveNet has that unique balance that will enable the next generation of care delivery. The Google team has been a fantastic collaborator in this quest.”

The architecture and operation of MoveNet

The model, offered on TensorFlow Hub, comes in two variants depending on use cases:

Lightning, for latency-critical applications.
Thunder, for applications requiring greater accuracy.

Regardless of the variant, the model runs fast, exceeding 30 frames per second on most smartphones and computers. MoveNet is an estimation model composed of a MobileNetV2 feature extractor with an attached feature pyramid network (FPN) and a set of four prediction heads.

Each head is responsible for one step in the sequence of operations that lets the model determine all 17 key points of a pose:

Person center heat map: predicts a person’s geometric center. The other prediction heads build on this result.
Key point regression field: starting from that center, the model predicts a first estimate of all of the person’s key points in the current frame.
Key point heat map: predicts the location of all key points; combined with the regression field, this keeps only the key points of the person in the foreground.
2D offset field: local 2D offset predictions refine the selected key points to give the final result.
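
The four steps above can be sketched on small 2D grids. This is a simplified illustration, not Google’s implementation: the data layouts (`regression[y][x][k]` as a `[dy, dx]` displacement from the center pixel, `offsets[k][y][x]` as a `[dy, dx]` refinement), the inverse-distance weight `1 / (1 + distance)` and the function names are all assumptions made for this example.

```javascript
// Return [row, col] of the largest value in a 2D array, after applying
// an optional per-pixel weight function.
function argmax2d(grid, weight = () => 1) {
  let best = -Infinity;
  let bestPos = [0, 0];
  for (let y = 0; y < grid.length; y++) {
    for (let x = 0; x < grid[y].length; x++) {
      const value = grid[y][x] * weight(y, x);
      if (value > best) {
        best = value;
        bestPos = [y, x];
      }
    }
  }
  return bestPos;
}

// Hypothetical decoder combining the outputs of the four heads.
function decodePose({ centerHeatmap, regression, keypointHeatmaps, offsets }) {
  // Step 1: the peak of the center heat map locates the person.
  const [cy, cx] = argmax2d(centerHeatmap);

  return keypointHeatmaps.map((heatmap, k) => {
    // Step 2: coarse key point = center + regressed displacement.
    const [ry, rx] = regression[cy][cx][k];
    const coarseY = cy + ry;
    const coarseX = cx + rx;

    // Step 3: weight the key point heat map by inverse distance to the
    // coarse estimate, so peaks belonging to other people are suppressed.
    const [py, px] = argmax2d(
      heatmap,
      (y, x) => 1 / (1 + Math.hypot(y - coarseY, x - coarseX))
    );

    // Step 4: refine the selected pixel with its local 2D offset.
    const [oy, ox] = offsets[k][py][px];
    return { y: py + oy, x: px + ox };
  });
}

// Made-up 3 x 3 example with a single key point and a distracting peak.
const grid3 = (makeCell) =>
  Array.from({ length: 3 }, () => Array.from({ length: 3 }, makeCell));
const example = {
  centerHeatmap: [[0.1, 0, 0], [0, 0.9, 0], [0, 0, 0.2]],
  regression: grid3(() => [[-1, 0]]),
  keypointHeatmaps: [[[0, 0.6, 0], [0, 0, 0], [0, 0, 0.7]]],
  offsets: [grid3(() => [0.25, -0.5])],
};
console.log(decodePose(example)); // one key point at y = 0.25, x = 0.5
```

In this example the heat map peak of 0.7 at the bottom right is stronger, but it lies far from the regressed estimate, so the closer peak of 0.6 is selected before the offset is applied.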

MoveNet: a model designed to run in the browser

MoveNet was trained on two datasets. The first, COCO, is better suited to natural, everyday postures, while the second, called Active, is Google’s own and is suited to dance or fitness applications. This second dataset was built by labeling the key points of people in yoga, fitness and dance videos available on YouTube, using the same 17 key points as COCO.

To ensure fast execution with TensorFlow.js, all of the model’s outputs were packed into a single output tensor. In addition, the number of convolution filters in each prediction head was greatly reduced to speed up the model. The model takes 192 x 192 inputs (Lightning) or 256 x 256 inputs (Thunder). Combined with a high-speed camera, the model can even apply key point smoothing.
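
To show what reading a single output tensor can look like, here is a minimal sketch. It assumes the layout documented for the single-pose models, a tensor of shape [1, 1, 17, 3] whose rows hold (y, x, score) normalized to the range 0 to 1, already flattened into 51 numbers. The helper name `parseKeypoints` and the default `minScore` threshold of 0.3 are assumptions made for this example.

```javascript
// The 17 COCO key point names, in the order assumed for the output tensor.
const KEYPOINT_NAMES = [
  "nose", "left_eye", "right_eye", "left_ear", "right_ear",
  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
  "left_wrist", "right_wrist", "left_hip", "right_hip",
  "left_knee", "right_knee", "left_ankle", "right_ankle",
];

// Hypothetical helper: convert the flattened tensor data (17 * 3 numbers)
// into named key points expressed in pixel coordinates.
function parseKeypoints(flat, imageWidth, imageHeight, minScore = 0.3) {
  const expected = KEYPOINT_NAMES.length * 3;
  if (flat.length !== expected) {
    throw new Error(`expected ${expected} values, got ${flat.length}`);
  }
  return KEYPOINT_NAMES.map((name, i) => {
    const y = flat[3 * i];         // normalized row
    const x = flat[3 * i + 1];     // normalized column
    const score = flat[3 * i + 2]; // confidence
    return {
      name,
      x: x * imageWidth,
      y: y * imageHeight,
      score,
      visible: score >= minScore, // arbitrary threshold, tune per application
    };
  });
}

// Made-up data: only the nose is detected, in a 192 x 192 Lightning input.
const sample = new Float32Array(51);
sample.set([0.5, 0.25, 0.75], 0);
console.log(parseKeypoints(sample, 192, 192)[0]);
```

With TensorFlow.js, the flat array would typically come from calling `dataSync()` or `data()` on the output tensor.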

A non-linear filter is also used to suppress high-frequency noise and outliers that might be caused by fast movements, while maintaining a high bandwidth. In the near future, the authors of MoveNet hope it will be able to track several people at once so that it can be used with groups.
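
The announcement does not name the filter. One well-known non-linear filter with these properties is the One Euro filter, which smooths strongly when a signal moves slowly and relaxes its smoothing when it moves fast. The sketch below applies that technique to a single coordinate; treating it as the filter used by MoveNet is an assumption, as are the class name and the default parameters.

```javascript
// One Euro filter for one scalar signal (for example, the x of a key point).
// Timestamps are assumed to be in seconds.
class OneEuroFilter {
  constructor({ minCutoff = 1.0, beta = 0.0, dCutoff = 1.0 } = {}) {
    this.minCutoff = minCutoff; // baseline cutoff frequency in Hz
    this.beta = beta;           // how quickly the cutoff grows with speed
    this.dCutoff = dCutoff;     // cutoff used to smooth the speed estimate
    this.prev = null;           // last state: { t, x, dx }
  }

  // Smoothing factor of a first-order low-pass filter.
  static alpha(cutoff, dt) {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  filter(x, t) {
    if (this.prev === null) {
      this.prev = { t, x, dx: 0 };
      return x; // first sample passes through unchanged
    }
    const dt = t - this.prev.t;
    if (dt <= 0) return this.prev.x; // ignore out-of-order samples

    // Smoothed speed, measured against the previous filtered value.
    const rawDx = (x - this.prev.x) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dx = aD * rawDx + (1 - aD) * this.prev.dx;

    // Faster movement raises the cutoff, which reduces lag.
    const cutoff = this.minCutoff + this.beta * Math.abs(dx);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const filtered = a * x + (1 - a) * this.prev.x;

    this.prev = { t, x: filtered, dx };
    return filtered;
  }
}

// Made-up samples at 30 frames per second with one sudden jump.
const smoother = new OneEuroFilter({ minCutoff: 1.0, beta: 0.01 });
[100, 100, 100, 140, 100].forEach((value, frame) => {
  console.log(smoother.filter(value, frame / 30));
});
```

In practice one filter instance would be kept per coordinate of each key point, with `minCutoff` and `beta` tuned to trade jitter against lag.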

Because it is compatible with TensorFlow.js and can run directly in the browser, the model is likely to spread widely and find many use cases. A demo is available online to try it out.

Integrating it into a web page takes only a few lines of JavaScript.
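
A minimal sketch, assuming the `@tensorflow-models/pose-detection` package and a TensorFlow.js backend are already loaded on the page, might look like the following. The wrapper functions `moveNetConfig` and `createMoveNetDetector`, and the choice to pass the library in as a parameter, are made up for this example; the calls on `poseDetection` itself are from that package.

```javascript
// Hypothetical helper: build the detector configuration for one of the
// two variants described above (Lightning by default, Thunder on request).
function moveNetConfig(poseDetection, useThunder = false) {
  const { SINGLEPOSE_LIGHTNING, SINGLEPOSE_THUNDER } =
    poseDetection.movenet.modelType;
  return { modelType: useThunder ? SINGLEPOSE_THUNDER : SINGLEPOSE_LIGHTNING };
}

// Hypothetical helper: create a MoveNet detector.
// `poseDetection` is the @tensorflow-models/pose-detection module.
async function createMoveNetDetector(poseDetection, useThunder = false) {
  return poseDetection.createDetector(
    poseDetection.SupportedModels.MoveNet,
    moveNetConfig(poseDetection, useThunder)
  );
}

// Usage in a page, inside an async function:
//   const detector = await createMoveNetDetector(poseDetection);
//   const video = document.querySelector("video");
//   const poses = await detector.estimatePoses(video);
//   // poses[0].keypoints is an array of { x, y, score, name } objects
```

Creating the detector once and reusing it for every frame avoids reloading the model on each call.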

Translated from MoveNet par TensorFlow : le modèle de machine learning analysant la posture du corps humain