Unlocking Interaction Capabilities Through Award-Winning GPT-VR Nexus

May 2, 2024

Authored by:

April Horency

The rise of generative artificial intelligence (AI) has spurred exploration into diverse application scenarios. Applications like ChatGPT showcase this progress, yet the fusion of generative AI and virtual reality (VR) promises to extend user interaction beyond the conventional realms of text, vision, and voice. Professor of Electrical and Computer Engineering Tian Lan, along with Jiangong Chen and Bin Li from Pennsylvania State University, explored this nexus by developing a ChatGPT-powered immersive VR experience, earning them a Best Demo Honorable Mention Award at the 2024 IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR)!

IEEE VR has been the premier international event for presenting finished and ongoing research in the broad areas of virtual, augmented, and mixed reality since 1993. Every year, scientists, engineers, designers, and artists working with VR technology come together at this conference to present innovative research, experience new advances in VR and 3D interfaces, and catch up with the community.

Two pressing problems face the VR community in realizing the vision of combined AI and VR technologies:

The absence of tools both for accurate VR context comprehension and for translating AI’s responses into VR scene creation and animated interactions.
The “hallucination” problem that leads to misaligned AI responses in a VR context.

To address these challenges and bridge the gap between AI and VR, the research team demonstrated the GPT-VR Nexus, a novel framework for creating a VR experience driven by an underlying generative AI engine. In particular, they employed a two-step prompt strategy and robust post-processing procedures, eliminating the need to fine-tune the complex AI model.

Lan further explained the novelty of their demo, saying, “It enables an immersive experience, such as automated generation of 3D scenes and interaction with VR objects, from user audio inputs/commands.”

Using the team’s technology, users can create scenes and animated interactions directly from ChatGPT’s responses. The two-step strategy to process VR contextual data works by first categorizing the user inputs and then querying relevant data for precise prompts. The additional processing layer is used for response validation and adjustment to ensure the most relevant response is produced. To address the hallucination issue, the post-processing techniques guarantee the objects are generated in order without colliding with one another.
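The article does not publish the team's implementation, so the following Python sketch is only an illustration of the two ideas described above, not the GPT-VR Nexus code. It assumes a hypothetical keyword classifier standing in for the model's first categorization call, a made-up scene context table, and a simple 2D bounding-box check as the post-processing step that places objects one at a time without overlap. Every name in it (SCENE_CONTEXT, build_prompt, place_in_order, and so on) is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical VR context, keyed by input category (not from the article).
SCENE_CONTEXT = {
    "create": "Room is 10 x 10 units. Existing objects: table.",
    "animate": "Animatable objects: door, lamp.",
}


def keyword_classify(user_input: str) -> str:
    """Stand-in for step 1; a real system might ask the language model instead."""
    return "animate" if "open" in user_input.lower() else "create"


def build_prompt(user_input: str, classify: Callable[[str], str]) -> str:
    """Step 1: categorize the input. Step 2: attach only the matching context."""
    category = classify(user_input)
    context = SCENE_CONTEXT.get(category, "")
    return f"Category: {category}\nContext: {context}\nRequest: {user_input}"


@dataclass
class Box:
    """Axis-aligned floor footprint of an object (assumed representation)."""
    name: str
    x: float
    y: float
    w: float
    d: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two footprints intersect; touching edges do not count."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def place_in_order(proposed: list[Box], step: float = 0.5,
                   max_tries: int = 100) -> list[Box]:
    """Post-process model output: place objects one by one, nudging any
    object along x until it no longer collides with those already placed."""
    placed: list[Box] = []
    for box in proposed:
        candidate = Box(box.name, box.x, box.y, box.w, box.d)
        tries = 0
        while any(overlaps(candidate, p) for p in placed):
            if tries >= max_tries:
                raise ValueError(f"could not place {box.name}")
            candidate.x += step
            tries += 1
        placed.append(candidate)
    return placed
```

In this sketch, a request such as "Open the door" would be categorized first, and only the animation-related context would be added to the prompt. Positions suggested by the model would then pass through place_in_order before anything is rendered, so an overlapping suggestion is adjusted rather than trusted as-is.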

Overall, the GPT-VR Nexus’s experimental results show the VR environment responding to a diverse range of user audio inputs in just a few seconds. This innovation not only tackles pressing challenges within the VR community but also makes creating immersive experiences more efficient. Through this project, Lan is helping to unlock the unprecedented interaction capabilities afforded by merging AI and VR technologies.

“ChatGPT is having a transformative impact on many sections. Our work takes an exciting step to make it the ‘brain’ for virtual reality and mixed reality systems. It will provide a truly immersive user experience, and support a wide range of tasks from automated interactions to content generation,” Lan stated.