After presenting a deep learning method that lets AI models forget certain information, along with its findings in computer vision and supervised learning, Facebook AI recently unveiled TextStyleBrush, a research project that can replace text in an image with other text while preserving the style and appearance of the original font. The authors believe the tool could have many applications and lay the groundwork for future innovations such as photorealistic language translation in augmented reality.
The challenge of retaining an original writing style with a self-supervised AI model
Facebook AI announced the design of a self-supervised AI model that can reproduce a chosen handwriting style from a single example word written in that style. The tool can thus capture the style of an original word and generalize it, applying it to any desired text.
But the tool becomes more impressive when it comes to substituting text, whether handwritten or typographic, within images:
Nothing here that Photoshop doesn’t already allow, but the breakthrough lies in the immediacy, which opens the door to new uses such as real-time automatic translation, as the Google Lens smartphone application already offers. The difference is in the realism of the substitution. The possible applications in augmented reality or on-the-fly personalization of visual content are numerous. For the Facebook teams, this new model, based on self-supervised learning, stands out for its flexibility.
The authors point out the difficulty of the task:
“While most AI systems can do this for well-defined and specialized tasks, building an AI system that is flexible enough to understand the nuances of text in real-world scenes and handwriting is a much more difficult AI challenge.”
The model works with both handwritten and typographic scripts, is able to analyze different stylistic subtleties, and takes into account transformations and deformations such as rotations and twists. However, the authors note that there is still room for improvement, especially with regard to metallic surfaces and reflections.
[Image: Architecture of the model]
The generator architecture is based on the StyleGAN2 model, but there are two important notions to take into account:
- StyleGAN2 is an unconditional model: it generates images by sampling a random latent vector, whereas here the output must be controlled by two distinct sources supplied by the user, the desired content and the target text style.
- The representation of text styles involves a combination of global information, e.g., color palette and spatial transformation, with detailed, small-scale information, such as minute variations in individual calligraphy.
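The two points above can be illustrated with a minimal toy sketch: a conditional generator whose output is a deterministic function of a content code and a style code, instead of a randomly sampled latent as in unconditional StyleGAN2. All function names and shapes here are illustrative assumptions, not the actual TextStyleBrush implementation, which learns these encoders end to end.

```python
import numpy as np

def encode_style(style_image, dim=8):
    """Toy style encoder: compress one example word image into a global
    style vector (a stand-in for the learned style representation)."""
    flat = style_image.reshape(-1)
    # Fixed random projection onto a compact style code.
    basis = np.random.default_rng(42).normal(size=(dim, flat.size))
    return basis @ flat

def encode_content(text, dim=8):
    """Toy content encoder: fold the characters of the target text
    into a fixed-size content code."""
    code = np.zeros(dim)
    for i, ch in enumerate(text):
        code[i % dim] += ord(ch)
    return code / max(len(text), 1)

def generate(content_code, style_code):
    """Conditional 'generator': unlike unconditional sampling, the output
    depends only on the two user-supplied codes (here, a simple outer
    product standing in for image synthesis)."""
    return np.outer(style_code, content_code)

rng = np.random.default_rng(0)
style_img = rng.normal(size=(16, 16))          # the single style example
out_a = generate(encode_content("hello"), encode_style(style_img))
out_b = generate(encode_content("world"), encode_style(style_img))
# Same style, different content: the two outputs differ.
```

The point of the sketch is the interface, not the synthesis: conditioning replaces random sampling, and the global style code is shared across every word rendered in that style.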
This tool opens new perspectives for photorealistic language translation models and for personalized advertising, but it could also be used to generate fake news. The authors say they are aware of this risk and explain that one objective of this work is precisely to stay one step ahead of the creators of fake news.