Virtual Reality and Light Field Immersive Video Technologies for Real-World Applications
2: Light Field Video Engineering, Laboratory of Image Synthesis and Analysis (LISA), Université Libre de Bruxelles, Belgium
Virtual reality (VR) refers to technologies that use headsets to generate realistic images, sounds and other sensations that replicate a real-world environment or create an imaginary setting; VR also simulates a user's physical presence in this environment. In virtual reality, six degrees of freedom allow users not only to look around, but also to move around the virtual world and look from above, below or behind objects. To offer a true VR experience, the hardware must provide six degrees of freedom, using both orientation tracking (rotational) and positional tracking (translational). This book is addressed to video experts who want to understand the basics of 3D representations and multi-camera video processing in order to target new immersive media applications. Unlike single-camera video coding, future VR technologies must address new challenges beyond compression alone, including pre- and post-processing (depth acquisition and 3D rendering). The book is inspired by the MPEG-I (immersive media) and JPEG Pleno (plenoptic media) standardization activities, and offers a glimpse of their underlying technologies.
Inspec keywords: rendering (computer graphics); image capture; stereo image processing; virtual reality; computer graphics
Other keywords: virtual reality; stereo image processing; cameras; solid modelling; video signal processing; light field immersive video technologies; computer graphics; rendering (computer graphics); augmented reality; image capture; ray tracing
Subjects: Education and training; General electrical engineering topics; Virtual reality; Graphics techniques; Computer vision and image processing techniques; General and management topics; User interfaces; Optical, image and video signal processing
- Book DOI: 10.1049/PBPC021E
- Chapter DOI: 10.1049/PBPC021E
- ISBN: 9781785615788
- e-ISBN: 9781785615795
- Page count: 391
- Format: PDF
Front Matter
p. (1)
1 Immersive video introduction
pp. 1–7 (7)
This book has been written to demystify virtual reality and immersive video technology. There certainly exist many books about three-dimensional (3D) graphics on the one hand, and 2D video on the other, covering open-source libraries such as OpenGL and OpenCV, but immersive video is something in between, sometimes referred to as 2.5D video.
2 Virtual reality
pp. 9–21 (13)
The story of virtual reality (VR) starts back in the late 1830s/early 1840s with stereoscopic photography, followed by autostereoscopic photography, invented by Lippmann in 1908, which allows one to see in 3D without wearing glasses and is, in a sense, the predecessor of what we now call holography. Many devices have since been developed for stereoscopy, even finding their way into consumer photography, cinema theatres and VR goggles, albeit with some concerns regarding visual fatigue and cybersickness. Further developments, specifically those related to NASA's Apollo space program, led to the first 6DoF free-navigation VR system. Finally, at the time of writing (mid-2021), MPEG - the worldwide committee that has been standardizing compression technology for more than 30 years, and without which digital TV would not exist - is finalizing a new standard for immersive video, called MPEG Immersive Video (MIV) [22]. It allows the user in his/her living room to watch the scene from any chosen viewpoint, not only a viewpoint from which the scene has been captured.
3 3D gaming and VR
pp. 23–40 (18)
3D gaming and virtual reality (VR) use very specific 3D graphics technology to project 3D images onto the user's eyes. OpenGL (Open Graphics Library) is one of the most famous of these technologies. Historically, OpenGL was developed in the 1990s, then taken over by the Khronos Group. It has all the necessary APIs and drivers to work smoothly with all graphics processing units (GPUs), which nowadays reach outstanding performance and support real-time 3D applications, from video gaming to VR. This chapter also covers OpenGL's texture level-of-detail mechanism (mipmapping), which creates projections that are much more regular and smoother.
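As a hedged illustration of the texture level-of-detail idea (our own sketch, not code from the book), the following C++/OpenGL fragment shows how mipmaps are typically requested; it assumes an active OpenGL 3+ context with a function loader, and that width, height and pixels describe an already-loaded RGBA image.

    // Sketch: requesting a mipmapped texture in OpenGL (assumes an
    // active GL 3+ context; `pixels` holds width x height RGBA bytes).
    GLuint createMipmappedTexture(int width, int height, const void* pixels) {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        glGenerateMipmap(GL_TEXTURE_2D);           // build the level-of-detail pyramid
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                        GL_LINEAR_MIPMAP_LINEAR);  // trilinear filtering across levels
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        return tex;
    }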
4 Camera and projection models
pp. 41–56 (16)
In this chapter we mathematically describe how a 3D scene is projected to a camera view. This corresponds to the core functionality of OpenGL and should thus be well understood. The relationship between OpenGL and the pinhole camera model will also be studied, showing that the model is only valid in the hyperfocal regime.
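For readers who want a concrete anchor before diving in, the core of the pinhole model can be sketched in a few lines (our own illustration, not the book's code); the focal length f and principal point (cx, cy) are assumed, hypothetical intrinsics.

    #include <cstdio>

    // Minimal pinhole camera sketch: a 3D point (X, Y, Z) in camera
    // coordinates projects to pixel (u, v) = (f*X/Z + cx, f*Y/Z + cy).
    struct Pixel { double u, v; };

    Pixel project(double X, double Y, double Z,
                  double f, double cx, double cy) {
        return { f * X / Z + cx, f * Y / Z + cy };
    }

    int main() {
        // Hypothetical intrinsics: f = 500 px, 640x480 image centre.
        Pixel p = project(0.2, -0.1, 2.0, 500.0, 320.0, 240.0);
        std::printf("u = %.1f, v = %.1f\n", p.u, p.v);  // u = 370.0, v = 215.0
        return 0;
    }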
5 Light equations
pp. 57–76 (20)
Light equations are really a cornerstone of OpenGL; they give 'life' to the images that are rendered. Without them, one would end up with very dark and/or unrealistic-looking images. We therefore devote a couple of sections to them, presenting the mathematical equations that must be programmed to bring things 'alive'. These equations will then run on each vertex and/or fragment of the scene, putting some load on the rendering pipeline. Even worse, in search of photo-realistic rendering results, one may have to use raytracing techniques that iterate multiple times on each pixel to find the colour that best approaches reality, easily overloading the rendering pipeline. We hope the interested reader will thus become aware of the level of complexity involved in OpenGL, and how this may impact virtual reality applications. Based on these insights, we will gradually move towards other image-based techniques that enable both photo-realistic quality and reasonable processing cost.
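As a taste of what such light equations look like in code, here is a minimal Lambertian (diffuse) shading term, a sketch of our own; the ambient and specular terms treated in the chapter extend this same per-vertex/per-fragment pattern.

    #include <algorithm>
    #include <cmath>

    struct Vec3 { double x, y, z; };

    double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    Vec3 normalize(Vec3 v) {
        double len = std::sqrt(dot(v, v));
        return { v.x / len, v.y / len, v.z / len };
    }

    // Lambertian diffuse term: intensity = max(0, N.L) * lightIntensity,
    // evaluated for every vertex or fragment by the shader.
    double diffuse(Vec3 normal, Vec3 toLight, double lightIntensity) {
        double nDotL = dot(normalize(normal), normalize(toLight));
        return std::max(0.0, nDotL) * lightIntensity;
    }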
6 Kinematics
pp. 77–102 (26)
This chapter covers all aspects of kinematics in 3D, including rigid-body animations (rotations and translations), as well as deformable-object simulations and collision detection. Simple animations can be programmed within the vertex shaders, but more complex animations with deformable objects need dedicated calculations in the physics engine, which may use compute shaders. In the case of geometry simplifications, cf. the levels of detail in Chapter 3, dedicated geometry shaders can handle the vertex connectivity constraints. This chapter provides only the core elements involved in kinematics, since this is a vast domain, impossible to cover in a single chapter. We therefore restrict ourselves to a general overview with links to seminal work that the interested reader can consult by him/herself.
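To make the rigid-body case concrete, a minimal sketch of our own (not the book's code) of the rotate-then-translate pattern a vertex shader applies to every vertex of an animated rigid object:

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // Rigid-body motion sketch: rotate a point about the z-axis by
    // `theta` radians, then translate by t.
    Vec3 rotateZThenTranslate(Vec3 p, double theta, Vec3 t) {
        double c = std::cos(theta), s = std::sin(theta);
        return { c * p.x - s * p.y + t.x,
                 s * p.x + c * p.y + t.y,
                 p.z + t.z };
    }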
7 Raytracing
pp. 103–116 (14)
In this chapter, we have seen the main aspects of the OpenGL pipeline, from rendering with light equations to animation and collision detection, the latter not strictly being a default part of OpenGL, but nevertheless used by many 3D editing software tools, like Blender.
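Since the chapter's subject is raytracing, a minimal sketch of its central primitive may help fix ideas: the ray-sphere intersection test below (our own illustration, not the book's code) solves the quadratic |o + t*d - c|^2 = r^2 for the ray parameter t.

    #include <cmath>

    struct Vec3 { double x, y, z; };
    double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

    // Does the ray origin + t*dir (t >= 0) hit a sphere of radius r
    // centred at c?  On success, tHit is the nearest front intersection.
    bool hitSphere(Vec3 origin, Vec3 dir, Vec3 c, double r, double& tHit) {
        Vec3 oc = sub(origin, c);
        double a = dot(dir, dir);
        double b = 2.0 * dot(oc, dir);
        double k = dot(oc, oc) - r * r;
        double disc = b * b - 4 * a * k;
        if (disc < 0) return false;                // ray misses the sphere
        tHit = (-b - std::sqrt(disc)) / (2 * a);   // nearest root
        return tHit >= 0;
    }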
8 2D transforms for VR with natural content
pp. 117–134 (18)
In the study of OpenGL, we have considered the projection of 3D points (and objects) onto the 2D screen. When objects are explicitly modelled in 3D, like synthetic content created in Blender, all 3D information is available to perform a 2D projection that will display the corresponding image. It often happens, however, that only 2D projections of the real world are available, while image transformations that mimic a 3D effect are required. For this purpose, we first review the so-called homography, a 2D transformation that gives useful 3D perspective illusions. We will take a shortcut to explain the most important aspects intuitively, without rigorously developing all mathematical derivations; we will stress only those that will be used in the remainder of the textbook. Let us therefore start with the affine transform.
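As a small grounding example (ours, not the book's), applying a 3x3 homography H to a 2D point amounts to lifting it to homogeneous coordinates, multiplying by H, and dividing by the last coordinate; that division is what produces the perspective illusion, and the affine transform is the special case where the last row of H is [0 0 1].

    struct Pt { double x, y; };

    // Apply homography H to point p: (x, y, 1) -> (X, Y, W) -> (X/W, Y/W).
    Pt applyHomography(const double H[3][3], Pt p) {
        double X = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
        double Y = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
        double W = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
        return { X / W, Y / W };  // perspective divide
    }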
9 3DoF VR with natural content
pp. 135–148 (14)
In previous chapters, we have taken the first step towards 3DoF VR with natural images, introducing the concept of panoramic stitching from the image processing point of view. In the present chapter we will rather focus on what this implies from the camera positioning point of view.
10 VR goggles
pp. 149–166 (18)
In this chapter, we explained how to make stereoscopic images that will be interpreted by the visual cortex as a 3D scene.
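At its core, stereoscopic rendering means rendering the scene twice from two horizontally offset eye positions; the sketch below (our own illustration, not the book's code) derives them from the head pose using the interpupillary distance (IPD, on average about 63 mm).

    struct Vec3 { double x, y, z; };

    // Shift half the IPD along the head's right-pointing axis to get the
    // two eye positions; each eye then gets its own view matrix and the
    // scene is rendered once per eye.
    void eyePositions(Vec3 head, Vec3 right, double ipd,
                      Vec3& leftEye, Vec3& rightEye) {
        double h = ipd / 2.0;
        leftEye  = { head.x - h * right.x, head.y - h * right.y, head.z - h * right.z };
        rightEye = { head.x + h * right.x, head.y + h * right.y, head.z + h * right.z };
    }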
11 6DoF navigation
pp. 167–204 (38)
A 360° panoramic view created by stitching various camera views together provides a 3 degrees of freedom (3DoF) experience, that is, one can look in all directions, for example when watching live concerts on one's TV screen from home. However, the price to pay is that the viewer always stands exactly in the centre of the panoramic view. This lack of navigation freedom through the scene is a serious impediment to an immersive VR experience, because it gives the user the impression that all his/her surroundings - even static objects - are moving with him/her whenever there is a translation from the skybox or cubemap centre. The only way to overcome the resulting cybersickness is to create positional awareness, with the content rendered in the VR goggles moving in the opposite direction of the user's translational movements. These 6 degrees of freedom (6DoF) capabilities require that the content be described not only by textures projected onto a surrounding sphere or cube, but also by well-captured object shapes (geometry) and positions. This represents a serious challenge for real content. There are two main solutions to this challenge: point clouds and depth image-based rendering (DIBR), which show many commonalities, yet also different challenges that will be explained step by step in this chapter.
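To make the geometry requirement concrete, here is a hedged sketch of our own of the step both representations share: turning a pixel with known depth back into a 3D point, i.e. the inverse of the pinhole projection of Chapter 4 (f, cx, cy are assumed intrinsics).

    struct Vec3 { double x, y, z; };

    // Back-project pixel (u, v) with depth Z into a 3D point; applying
    // this to every pixel of a depth map yields a point cloud.
    Vec3 unproject(double u, double v, double Z,
                   double f, double cx, double cy) {
        return { (u - cx) * Z / f, (v - cy) * Z / f, Z };
    }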
12 Towards 6DoF with image-based rendering
pp. 205–283 (79)
In the previous chapter, we have seen how to perform 6DoF VR navigation by representing the scene in a point cloud format. In the present chapter, we will instead use an image-based approach, interpolating the images from various camera views into a virtual view that can be presented to the user. Though this process is based on images only, the inclusion of depth images transforms the data into an implicit point cloud (or even mesh) format, very well handled by the 3D graphics pipeline of the graphics processing unit (GPU), cf. Chapter 3.
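The per-pixel core of such depth image-based view synthesis can be sketched as follows (our own simplified illustration, not the book's algorithm): back-project a source pixel to 3D using its depth, move it into the virtual camera's frame, and re-project. The rotation R, translation t and intrinsics (f, cx, cy) are illustrative parameters.

    struct Vec3 { double x, y, z; };
    struct Pixel { double u, v; };

    // Warp one source pixel (u, v) with depth Z into the virtual view.
    Pixel warpPixel(double u, double v, double Z,
                    double f, double cx, double cy,
                    const double R[3][3], Vec3 t) {
        Vec3 p = { (u - cx) * Z / f, (v - cy) * Z / f, Z };           // unproject
        Vec3 q = { R[0][0]*p.x + R[0][1]*p.y + R[0][2]*p.z + t.x,     // rigid move
                   R[1][0]*p.x + R[1][1]*p.y + R[1][2]*p.z + t.y,
                   R[2][0]*p.x + R[2][1]*p.y + R[2][2]*p.z + t.z };
        return { f * q.x / q.z + cx, f * q.y / q.z + cy };            // reproject
    }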
13 Multi-camera acquisition systems
pp. 285–300 (16)
Inspired by human vision (stereo vision), 3D imaging has found its way into different fields. To capture dynamic 3D content, the data acquisition system must be extended from a single camera to a stereo camera or even a multi-camera system, which can acquire a large amount of 3D scene information with cameras placed all around the scene. An alternative is the so-called plenoptic camera, the acquisition counterpart of integral photography displays. In essence, a sheet of microlenses transforms the image of the scene into a multitude of tiny images that are captured by the camera sensor. Post-processing software can then fuse all these tiny images into high-resolution images with parallax and all kinds of special effects, like refocusing on any portion of the image without having to reshoot the scene multiple times. All this is made possible with so-called plenoptic imaging, which we will briefly survey in this chapter.
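The refocusing trick mentioned above can be sketched in a few lines (a deliberately simplified shift-and-add illustration of our own, not the book's algorithm): the sub-aperture views behind the microlenses are shifted in proportion to their viewpoint offset and averaged, so objects at the chosen depth align and stay sharp while everything else blurs.

    #include <vector>

    // 1D shift-and-add refocusing sketch: views[k][x] is the k-th
    // sub-aperture view; shifting each view by k * slope before
    // averaging focuses on the depth that `slope` encodes.
    std::vector<double> refocus(const std::vector<std::vector<double>>& views,
                                int slope) {
        const int n = (int)views.size();
        const int w = (int)views[0].size();
        std::vector<double> out(w, 0.0);
        for (int x = 0; x < w; ++x) {
            double sum = 0.0;
            int count = 0;
            for (int k = 0; k < n; ++k) {
                int xs = x + k * slope;  // per-view shift
                if (xs >= 0 && xs < w) { sum += views[k][xs]; ++count; }
            }
            out[x] = count ? sum / count : 0.0;
        }
        return out;
    }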
14 3D light field displays
pp. 301–329 (29)
There are various ways to display captured three-dimensional (3D) information, depending on the display device. These devices can be separated into two groups. The first group consists of display devices used in 2D visual systems, such as a conventional 2D display or television. The other group includes special display devices designed for 3D visual systems. Among many excellent works, a number of 3D immersive displays and systems are introduced.
15 Visual media compression
pp. 331–366 (36)
The transmission or storage of 3D content is generally similar to that of 2D content in a video-based system, meaning that 3D data can be transmitted through a common communication channel, such as the Internet or a wireless channel. However, the 3D information of a scene is usually much larger than the 2D information of the same scene; compression is thus compulsory. One should be aware that compression is not just a process of binarization: it is the process of removing redundancies that can be recovered at the user side, and even of discarding some information that has little impact on visual perception. In the following, we review different compression technologies based on different representations of the 3D data.
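As a toy illustration of "removing redundancies that can be recovered at the user side" (our own example, far simpler than the video codecs reviewed in the chapter), a run-length encoder collapses repeated values into (value, count) pairs that the decoder expands back exactly:

    #include <cstdint>
    #include <utility>
    #include <vector>

    // Toy lossless compressor: runs of identical bytes become
    // (value, count) pairs; decoding recovers the input exactly.
    std::vector<std::pair<uint8_t, size_t>>
    rleEncode(const std::vector<uint8_t>& in) {
        std::vector<std::pair<uint8_t, size_t>> out;
        for (uint8_t b : in) {
            if (!out.empty() && out.back().first == b) ++out.back().second;
            else out.push_back({b, 1});
        }
        return out;
    }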
16 Conclusion and future perspectives
pp. 367–368 (2)
This chapter discusses VR/AR/XR technologies that provide 6DoF free-navigation experiences within a photo-realistic, immersive environment. The 3D graphics technology used in 3D video games simulates the physical world with light equations, kinematics and ray tracing, but all this represents too high a compute cost for simple immersive applications where one merely wants to observe the 3D scene from any viewpoint, for instance on a smartphone, without actively interacting with it.
Back Matter
p. (1)