NVIDIA announces platform for creating AI avatars

NVIDIA today announced the launch of NVIDIA Omniverse Avatar, a technology platform for creating interactive artificial intelligence avatars.

Omniverse Avatar connects the company's technologies in speech AI, computer vision, natural language understanding, recommendation engines and simulation. Avatars created on the platform are interactive characters with ray-traced 3D graphics that can see, speak, converse on a wide range of subjects, and understand naturally spoken intent.

Omniverse Avatar opens the door to easily customisable AI assistants for virtually any industry. These assistants could help with the billions of customer service interactions that take place every day (restaurant orders, banking transactions, booking personal appointments and reservations, and more), creating greater business opportunities and improving customer satisfaction.

"The dawn of intelligent virtual assistants has come," NVIDIA founder and CEO Jensen Huang stated. "Omniverse Avatar blends the core graphics, simulation, and artificial intelligence technologies from NVIDIA to build some of the most complex real-time applications ever created. Collaborative robots and virtual assistants have tremendous and far-reaching applications."

Omniverse Avatar is a component of NVIDIA Omniverse™, a platform for virtual world simulation and collaboration in 3D workflows. Huang discussed numerous examples of Omniverse Avatar in his keynote talk at NVIDIA GTC, including Project Tokkio for customer assistance, NVIDIA DRIVE Concierge for always-on, intelligent services in automobiles, and Project Maxine for video conferencing.

In the first public demonstration of Project Tokkio, Huang showed colleagues holding a real-time conversation with an avatar modelled on a toy replica of himself, discussing topics such as cellular biology and climate science.

In a second Project Tokkio demo, he showed a customer service avatar in a restaurant kiosk that could see, converse with and understand two customers as they ordered veggie burgers, fries and drinks. The demonstrations were powered by NVIDIA AI software and Megatron 530B, the world's largest customisable language model.

In a demonstration of the DRIVE Concierge AI platform, a digital assistant on the centre dashboard screen helps a driver choose the best driving mode to reach his destination on time, and then follows up on his request to set a reminder for when the car's range drops below 100 miles.

Separately, Huang demonstrated Project Maxine's ability to add state-of-the-art video and audio features to virtual collaboration and content creation applications. In the demo, an English-speaking woman on a video call from a noisy cafe can be heard clearly, with the background noise removed. As she speaks, her words are transcribed and translated in real time into German, French and Spanish, delivered in her own voice and intonation.

Key Elements of Omniverse Avatar

Omniverse Avatar combines AI for speech, computer vision, natural language understanding, recommendation engines, facial animation and graphics, using the following technologies:

Its speech recognition is based on NVIDIA Riva, a software development kit for multilingual speech recognition. Riva's text-to-speech capabilities are also used to generate human-like spoken responses.

Its natural language understanding is based on the Megatron 530B large language model, which can recognise, understand and generate human language. Megatron 530B is a pretrained model that, with little or no additional training, can complete sentences, answer questions across a broad range of subjects, summarise long and complex stories, translate into other languages, and handle many domains it was not specifically trained for.

Its recommendation engine is powered by NVIDIA Merlin™, a framework that enables enterprises to build deep learning recommender systems capable of handling vast volumes of data and making more intelligent recommendations.

Its perception capabilities are enabled by NVIDIA Metropolis, a computer vision framework for video analytics.

Its avatar animation is powered by NVIDIA Video2Face and Audio2Face™, AI-driven 2D and 3D facial animation and rendering technologies.

These technologies are composed into an application and processed in real time using the NVIDIA Unified Compute Framework. Because the skills are packaged as scalable, customisable microservices, NVIDIA Fleet Command™ can securely deploy, manage and orchestrate them across multiple locations.