Generative video background synthesis

ActAnywhere automates generative video background synthesis, producing video backgrounds that match the motion and appearance of the foreground subject. This significantly reduces the manual effort traditionally required in the movie industry and visual effects community. ActAnywhere builds on large-scale video diffusion models, tailored specifically for generating realistic foreground-background interactions while adhering to the artist’s creative vision.

The model takes a sequence of foreground subject segmentations and an image depicting the desired scene, and produces a coherent video that respects the condition frame. Trained on a large-scale dataset of human-scene interaction videos, ActAnywhere outperforms existing baselines in extensive evaluations, and it generalizes to a wide range of out-of-distribution samples, including non-human subjects.
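To make these inputs concrete, the sketch below shows one way the segmentation sequence and condition frame might be assembled from raw data; the array shapes and the masking convention are illustrative assumptions, not the authors’ released pipeline.

```python
import numpy as np

# Preparing ActAnywhere-style inputs: the foreground "segmentation
# sequence" is the video with everything but the subject masked out.
# All shapes here are illustrative assumptions.
frames = np.random.rand(16, 256, 256, 3)           # raw video, F x H x W x 3
masks = (np.random.rand(16, 256, 256, 1) > 0.5)    # per-frame subject masks

fg_sequence = frames * masks                       # foreground subject only
condition_frame = np.random.rand(256, 256, 3)      # image of the desired scene

# The model consumes (fg_sequence, masks, condition_frame) and produces
# a coherent video whose background follows condition_frame.
```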

ActAnywhere’s 3D U-Net denoiser takes the foreground segmentation sequence and the corresponding masks as input, and is conditioned on a frame describing the background. During training, this condition frame is randomly sampled from the same training video; at test time, it can be a composited frame placing the subject against a novel background, or a background-only image.
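As a rough illustration of this conditioning scheme, the toy denoiser below concatenates the video latents, foreground latents, and masks channel-wise and injects an embedding of the condition frame. The layer sizes, the single convolution block, and the additive injection are simplifying assumptions for readability, not the paper’s actual 3D U-Net.

```python
import torch
import torch.nn as nn

class TinyConditionedDenoiser(nn.Module):
    """Toy stand-in for ActAnywhere's 3D U-Net: denoises video latents
    given the foreground segmentation latents, masks, and an embedding
    of the condition frame. Dimensions and layers are assumptions."""
    def __init__(self, latent_ch=4, fg_ch=4, mask_ch=1, cond_dim=64):
        super().__init__()
        in_ch = latent_ch + fg_ch + mask_ch              # channel-wise concat
        self.conv = nn.Conv3d(in_ch, latent_ch, kernel_size=3, padding=1)
        self.cond_proj = nn.Linear(cond_dim, latent_ch)  # condition injection

    def forward(self, noisy_latents, fg_latents, masks, cond_embed):
        # Video tensors are (B, C, F, H, W); cond_embed is (B, cond_dim).
        x = torch.cat([noisy_latents, fg_latents, masks], dim=1)
        out = self.conv(x)
        # Broadcast the condition-frame embedding over frames and pixels.
        return out + self.cond_proj(cond_embed)[:, :, None, None, None]

B, F, H, W = 2, 8, 32, 32
model = TinyConditionedDenoiser()
noisy_latents = torch.randn(B, 4, F, H, W)
fg_latents = torch.randn(B, 4, F, H, W)   # encoded segmentation sequence
masks = torch.rand(B, 1, F, H, W)

# Training-time conditioning: the condition frame is sampled at random
# from the same video; a random vector stands in for its image embedding.
t = torch.randint(0, F, (1,)).item()
cond_embed = torch.randn(B, 64)           # stand-in embedding of frame t
pred = model(noisy_latents, fg_latents, masks, cond_embed)
print(pred.shape)  # torch.Size([2, 4, 8, 32, 32])
```

At test time, the same call applies unchanged; only the source of `cond_embed` differs, coming from a composited frame or a background-only image rather than a frame of the training video.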

Paper: ActAnywhere: Subject-Aware Video Background Generation
Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang
Date: April 30, 2024
Link: https://arxiv.org/abs/2401.10822