A New Artificial Intelligence (AI) Research From The University of Maryland Proposes A Shape-Aware Text-Driven Layered Video Editing Tool


Video editing, the process of manipulating and rearranging video clips to meet desired objectives, has been revolutionized by the integration of artificial intelligence (AI) in computer science. AI-powered video editing tools allow for faster and more efficient post-production processes. With the help of deep learning algorithms, AI can now automatically perform tasks such as color correction, object tracking, and even content creation. By analyzing patterns in the video data, AI can suggest edits and transitions that would enhance the overall look and feel of the final product. Additionally, AI-based tools can assist in organizing and categorizing large video libraries, making it easier for editors to find the footage they need. The use of AI in video editing has the potential to significantly reduce the time and effort required to produce high-quality video content while also enabling new creative possibilities.
The use of GANs in text-guided image synthesis and manipulation has seen significant advancements in recent years. Text-to-image generation models such as DALL-E and recent methods using pre-trained CLIP embeddings have demonstrated success. Diffusion models, such as Stable Diffusion, have also shown success in text-guided image generation and editing, leading to various creative applications. However, video editing requires more than spatial fidelity: it also demands temporal consistency.
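For reference, the snippet below is a minimal sketch (not taken from the paper) of text-guided image generation with Stable Diffusion through the Hugging Face diffusers library; the model ID, prompt, and output filename are illustrative placeholders.

```python
# Minimal sketch: text-guided image generation with Stable Diffusion via diffusers.
# The model ID, prompt, and filename are illustrative, not from the paper.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of a swan swimming on a lake").images[0]
image.save("generated_keyframe.png")
```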
The work presented in this article extends the semantic image editing capabilities of the state-of-the-art text-to-image model Stable Diffusion to consistent video editing.
The pipeline for the proposed architecture is depicted below.
Given an input video and a text prompt, the proposed shape-aware video editing method produces a consistent video with appearance and shape changes while preserving the motion in the input video. To obtain temporal consistency, the approach uses a pre-trained NLA (Neural Layered Atlas) to decompose the input video into background (BG) and foreground (FG) unified atlases with associated per-frame UV mapping. After the video has been decomposed, a single keyframe in the video is manipulated using a text-to-image diffusion model (Stable Diffusion). The model uses this edited keyframe to estimate the dense semantic correspondence between the input and edited keyframes, which allows for performing shape deformation. This step is critical, as it produces the shape deformation field applied to the target image to maintain temporal consistency. This shape deformation serves as the basis for per-frame deformation, since the UV mapping and atlas are used to associate the edits with each frame. Finally, a pre-trained diffusion model is used to ensure the output video is seamless, filling in any unseen pixels.
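To make the atlas-based propagation concrete, here is a minimal, hedged sketch (not the authors' code) of how an edit made once on a unified atlas can be mapped back into every frame through per-frame UV coordinates, using PyTorch's grid_sample; all tensors are random stand-ins.

```python
# Minimal sketch of atlas-to-frame mapping (not the authors' code).
# An edit applied once to the atlas propagates to every frame through the
# per-frame UV maps, which is what gives the result temporal consistency.
import torch
import torch.nn.functional as F

T, H, W = 8, 256, 256                  # number of frames, frame size
atlas = torch.rand(1, 3, 512, 512)     # edited foreground atlas (RGB), stand-in data

# Per-frame UV maps in [-1, 1], shape (T, H, W, 2): each frame pixel's
# lookup location in the atlas.
uv = torch.rand(T, H, W, 2) * 2 - 1

# Sample the edited atlas with each frame's UV map to reconstruct the frames.
frames = torch.cat([
    F.grid_sample(atlas, uv[t:t + 1], mode="bilinear", align_corners=True)
    for t in range(T)
])                                     # (T, 3, H, W) edited frames
```

In the actual method, the UV maps come from the pre-trained NLA decomposition and the atlas edit comes from the deformed, Stable Diffusion-edited keyframe; the sketch only illustrates why a single atlas edit stays consistent across frames.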
According to the authors, the proposed approach results in a reliable video editing tool that provides the desired appearance and consistent shape editing. The figure below offers a comparison between the proposed framework and state-of-the-art approaches.
This was the summary of a novel AI tool for accurate and consistent shape-aware text-driven video editing.
If you are interested or want to learn more about this framework, you can find links to the paper and the project page below.
Check out the Paper and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 14k ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.