NVIDIA Unveils Cutting-Edge Visual Generative AI Research at CVPR 2024

NVIDIA Research will present more than fifty papers at the Computer Vision and Pattern Recognition conference (CVPR), which takes place in Seattle from June 17-21, 2024. The papers highlight significant advances in visual generative AI. According to the NVIDIA blog, the research has potential applications in creative industries, autonomous vehicle development, healthcare and robotics.

Generative AI: Diverse Applications

Two of the papers deserve special mention: one on the training dynamics of diffusion models and one on high-definition maps for autonomous vehicles. Both are finalists for the CVPR Best Paper Awards. NVIDIA also won the End-to-End Driving at Scale track of the CVPR Autonomous Grand Challenge with a comprehensive self-driving system that outperformed more than 450 entries from around the world and earned the CVPR Innovation Award.

NVIDIA has developed an easily customizable text-to-image model, a model for estimating object poses, a technique for editing neural radiance fields (NeRFs), and a visual language model capable of understanding memes. These innovations are designed to empower creators, accelerate robot training, and help healthcare professionals process radiology reports.

Jan Kautz, vice president of learning and perception research at NVIDIA, said that generative AI represents an important technological advance. At CVPR, he said, NVIDIA Research is sharing how it is pushing the limits of what is possible, from powerful image-generation models that can supercharge professional creators to autonomous driving software that could enable next-generation driverless cars.

JeDi: Custom Image Creation Made Simple

JeDi is one of the papers that stands out. It proposes a technique that lets users customize the output of a diffusion model within seconds by supplying a few reference images, and it outperforms existing methods that rely on fine-tuning. Developed in collaboration with Johns Hopkins University and the Toyota Technological Institute at Chicago, it could be useful for creators who need depictions of a specific character or product.

FoundationPose & NeRFDeformer

Another research highlight is FoundationPose, a foundation model for estimating and tracking object poses. Given either a 3D representation of an object or a small set of reference images, it can track that object in 3D across a video, even under difficult conditions. The model could enhance industrial applications as well as augmented reality.

NeRFDeformer, developed in collaboration with the University of Illinois Urbana-Champaign, simplifies the transformation of an existing NeRF using a single RGB-D image, allowing faster updates to 3D scenes captured from 2D images.

VILA: Advancing Visual Language Models

NVIDIA, in collaboration with the Massachusetts Institute of Technology (MIT), introduced VILA, a family of visual language models that outperforms previous models at answering questions about images. VILA's pretraining improves world knowledge, in-context learning, and reasoning over multiple images, making it suitable for a variety of applications.

Generative AI for Autonomous Driving, Smart Cities and Intelligent Transportation

NVIDIA contributed 12 papers to CVPR that focus on autonomous vehicle research. NVIDIA also provided the largest indoor synthetic dataset ever to the AI City Challenge. The data will help develop smart city solutions as well as industrial automation. It was created using NVIDIA Omniverse, a platform that allows developers to build applications and workflows based on Universal Scene Description (OpenUSD).

With hundreds of scientists and engineers around the world, NVIDIA Research continues to push boundaries in AI, computer vision, self-driving cars, and robotics. NVIDIA's blog has more information about the work presented at CVPR 2024.
