
ML Drift: Facilitating Local Inference

A team of researchers from Google and Meta has developed ML Drift, a solution for running AI models efficiently on the device itself, despite the challenges posed by the diversity of GPU architectures. Through innovations such as tensor virtualization, ML Drift significantly improves performance and offers broad compatibility across mobile and desktop platforms.

Stephane Nachez · 2 min

Most artificial intelligence models run inference (that is, they are "executed") on servers. Local inference, meaning inference performed directly on the device, would accelerate the spread of artificial intelligence, notably by reducing the load on servers and improving privacy.

However, deploying generative AI models across different types of GPUs is difficult: GPU architectures range from proprietary solutions to open platforms, and each type of GPU has its own characteristics and limitations.

With the risk of hardware dependency growing, optimizing performance on heterogeneous platforms becomes essential to ensure that generative models run smoothly and efficiently.

To address these challenges, a team of researchers from Google and Meta, including Jiuqiang Tang, Raman Sarokin, and Ekaterina Ignasheva, developed ML Drift, a solution intended for inference on a variety of platforms. The team's expertise lies in optimizing GPU inference engines so that generative AI workloads execute efficiently. ML Drift stands out for its ability to overcome the technical obstacles of developing against multiple GPU APIs, which ensures broad compatibility across mobile and desktop platforms.

Methodological Approach and Technical Innovations

ML Drift introduces several technical innovations, including tensor virtualization and optimized memory management. Tensor virtualization allows the decoupling of logical indices from the physical indices of the GPU, offering increased flexibility in memory layout and kernel optimization. Additionally, memory management and optimization strategies reduce memory footprint and improve performance.
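
To make the idea of decoupling logical from physical indices more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not ML Drift's actual implementation: the names `VirtualTensor`, `linear_hwc`, `sliced_c4`, and `scale` are hypothetical, and the two layouts (a plain row-major buffer and a layout that packs channels in groups of four, as a 4-component texel would) were chosen for the example. The point is that the "kernel" only ever uses logical (h, w, c) coordinates, so the physical layout can change without touching it.

```python
from typing import Callable, Tuple

Shape = Tuple[int, int, int]   # logical (height, width, channels)
Coord = Tuple[int, int, int]   # logical (h, w, c)


def linear_hwc(shape: Shape, coord: Coord) -> int:
    """Hypothetical layout 1: row-major buffer, channels contiguous."""
    _, width, channels = shape
    h, w, c = coord
    return (h * width + w) * channels + c


def sliced_c4(shape: Shape, coord: Coord) -> int:
    """Hypothetical layout 2: channels packed in groups of 4 (texel-like)."""
    height, width, _ = shape
    h, w, c = coord
    return (((c // 4) * height + h) * width + w) * 4 + c % 4


class VirtualTensor:
    """Logical HWC tensor whose physical storage layout is pluggable."""

    def __init__(self, shape: Shape, layout: Callable[[Shape, Coord], int]):
        height, width, channels = shape
        if layout is sliced_c4:
            # Channels are padded up to a multiple of 4 in this layout.
            size = height * width * 4 * ((channels + 3) // 4)
        else:
            size = height * width * channels
        self.shape = shape
        self.layout = layout
        self.storage = [0.0] * size

    def read(self, h: int, w: int, c: int) -> float:
        return self.storage[self.layout(self.shape, (h, w, c))]

    def write(self, h: int, w: int, c: int, value: float) -> None:
        self.storage[self.layout(self.shape, (h, w, c))] = value


def scale(src: VirtualTensor, dst: VirtualTensor, factor: float) -> None:
    """A 'kernel' written purely against logical indices."""
    height, width, channels = src.shape
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                dst.write(h, w, c, src.read(h, w, c) * factor)


# Usage: the same kernel runs unchanged on both physical layouts.
shape = (2, 3, 6)
for layout in (linear_hwc, sliced_c4):
    src, dst = VirtualTensor(shape, layout), VirtualTensor(shape, layout)
    src.write(1, 2, 5, 21.0)
    scale(src, dst, 2.0)
    print(layout.__name__, len(dst.storage), dst.read(1, 2, 5))
```

In this sketch the padded layout uses more storage (48 elements instead of 36 for a 2×3×6 tensor) in exchange for 4-wide alignment; a real engine would choose the mapping for each GPU and kernel.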

Results and Future Perspectives

Performance evaluations of ML Drift show significant improvements over existing open-source solutions, including support for models with 10 to 100 times more parameters. These results pave the way for future improvements, notably the integration of advanced quantization techniques and the exploration of specialized instructions for ML workloads. The team also plans to extend ML Drift to newer diffusion models and transformer-based architectures, and to explore efficient interoperability with heterogeneous processors.

 

Publication reference: arXiv:2505.00232v1

 


ActuIA editorial team — news, data and analysis on artificial intelligence for decision-makers.

Actors mentioned
Jiuqiang Tang
Raman Sarokin
Ekaterina Ignasheva
Grant Jensen
Lin Chen
Juhyun Lee
Andrei Kulik
Matthias Grundmann