How the MPEG-5 LCEVC Multilayer Coding Standard Can Enable High-Quality Metaverse/XR Applications

We’ve all heard the hype and excitement around the “Metaverse” and Extended Reality (XR) – but unlike previous “Next Big Things”, they’re actually here to stay in one form or another, bringing major changes to how we work, shop, learn, interact and play online.

First, we need to understand what we mean by metaverse (or meta-universe). John Riccitiello, CEO of Unity Software, recently gave a good definition: “It’s the next generation of the Internet, always real-time, mainly 3D, mainly interactive, mainly social and mainly persistent.”

In practice, it is a new type of Internet user interface for interacting with other parties and accessing data as a seamless, intuitive augmentation of the real world in front of us, inspired by the online 3D worlds already familiar to multiplayer video game users. Think of Tom Cruise in Minority Report using hand gestures to control and manipulate data displayed in front of his face; today, he would be wearing a headset or XR goggles, or looking at an auto-stereoscopic head-tracking screen.

While gamers have been enjoying 3D worlds with six degrees of freedom (6DoF) for over 20 years, there have been – and remain – technical barriers to the mass adoption of immersive XR for non-gaming applications. Most notably, it has taken a long time for headsets to reach an acceptable size and weight, while intuitive control methods need to vary by use case: gesture control is compelling, but for certain uses, such as e-commerce or productivity tools, a keyboard, tablet or haptic controller is preferable.

These XR elements continue to be refined and many solutions are already on the market. However, it’s becoming clear that one of the final hurdles to overcome is the massive amount of data and processing required to enable immersive experiences of sufficiently high quality.

Build better experiences
Quality of experience is non-negotiable: end users expect visually stunning, realistic, fluid and immersive experiences. That is no surprise, given that video gamers now take the exquisite detail of cinema-like images for granted, even in-game.

While traditional 2D user interfaces can tolerate some imperfections, the whole point of the metaverse is the illusion of “presence”, which requires impeccable audiovisual quality: high-quality 3D objects, realistic lighting, high resolutions, high frame rates and low real-time latency. If users have a bad experience with sketchy graphics and lagging images, they are unlikely to come back for more. Unsurprisingly, there is a strong correlation between the realism of an experience and its frequency of use, driving companies to create higher-quality experiences to keep users coming back again and again.

Lightweight XR devices don’t – and won’t – have enough processing power to render sufficiently realistic experiences on their own. Disparities between user devices must also be considered if the metaverse is to reach a mass audience: the display quality and processing power of a gaming PC and a standard laptop are very different, yet the quality of experience should be comparable despite the huge volumes of data required.

“Split computing” helps solve this problem by performing 3D rendering on a separate device, possibly in the cloud, while display is handled on the XR device. However, the resulting rendered graphics must then be streamed to the device as high-resolution, high-frame-rate, ultra-low-latency stereoscopic video. There are many video coding constraints associated with XR streaming, especially when wireless connectivity is involved. Fortunately, there is a standards-based solution that makes this possible within realistic limits.

The Benefits of Multi-Layer Coding
Compression and data handling suited to the quality, bandwidth, processing and latency constraints of realistic next-generation network connectivity are imperative, with three main technical challenges to overcome:

  • Proper compression and streaming of volumetric objects to the renderer, as well as between renderers, requiring new coding approaches;
  • Ultra-low-latency video encoding at 4K 72 fps and beyond, but at sub-30 Mbps bandwidth to cope with realistic sustained Wi-Fi/5G speeds (see the back-of-the-envelope sketch after this list); and
  • A strong network/CDN/wireless backbone in place between the renderer and the XR display device, with sufficient cloud rendering resources available.
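To put those targets in perspective, here is a quick back-of-the-envelope sketch in Python. Only the 4K, 72 fps and 30 Mbps figures come from the list above; the per-eye resolution, 8-bit 4:2:0 sampling and stereoscopic pair are illustrative assumptions.

```python
# Rough numbers behind the "4K at 72 fps under 30 Mbps" target above.
# Assumptions (not from the article): 3840x2160 per eye, 8-bit 4:2:0 sampling
# (12 bits per pixel), two stereoscopic views.

WIDTH, HEIGHT, FPS = 3840, 2160, 72
BITS_PER_PIXEL = 12           # assumed 8-bit 4:2:0
VIEWS = 2                     # assumed stereoscopic pair
TARGET_MBPS = 30              # figure quoted above

raw_mbps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS * VIEWS / 1e6
compression_ratio = raw_mbps / TARGET_MBPS
frame_budget_ms = 1000 / FPS  # time available per frame for encode + transmit + decode

print(f"Raw video rate:     {raw_mbps:,.0f} Mbps")
print(f"Compression needed: ~{compression_ratio:,.0f}:1 to stay under {TARGET_MBPS} Mbps")
print(f"Per-frame budget:   {frame_budget_ms:.1f} ms at {FPS} fps")
```

The point is simply that, under these assumptions, the pipeline has roughly 14 ms per frame and needs a compression ratio in the hundreds, which is why coding efficiency and coding complexity matter at the same time.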

Layered encoding – structuring data into hierarchical layers that can be accessed and/or decoded as needed – makes sense here. After a few false starts, the latest standards to make layered coding work, MPEG-5 LCEVC (Low Complexity Enhancement Video Coding) and SMPTE VC-6, finally deliver many of the benefits that layered coding promises.

LCEVC is the new ISO/MPEG hybrid multilayer enhancement coding standard. It is codec-independent: it combines a lower-resolution base layer encoded with any traditional codec (e.g. H.264, HEVC, VP9, AV1 or VVC) with a layer of residual data that reconstructs the full resolution. LCEVC’s encoding tools are particularly well suited to compressing fine detail efficiently, from both a processing and a compression standpoint, while running the traditional codec at a lower resolution makes better use of that codec’s hardware acceleration and improves its efficiency.
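The layered principle itself is easy to picture. The NumPy sketch below mimics only the structure – a base layer at half resolution plus a residual layer that restores full resolution – using simple block-average downscaling and nearest-neighbour upscaling; it is not the actual LCEVC toolset, which adds transforms, quantisation and entropy coding on the residuals.

```python
import numpy as np

def downscale(frame: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 block averaging (stand-in for the base-layer downscaler)."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale(frame: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsample (stand-in for the enhancement-layer upsampler)."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

# A synthetic full-resolution luma frame (values 0-255).
full = np.random.default_rng(0).integers(0, 256, size=(2160, 3840)).astype(np.float64)

# Encoder side: the base layer is the downscaled frame, handed to a traditional
# codec (H.264, HEVC, AV1, ...); the enhancement layer is the residual between
# the original frame and the upscaled base reconstruction.
base = downscale(full)            # would be encoded and decoded by the base codec
residual = full - upscale(base)   # would be quantised and coded by the enhancement layer

# Decoder side: decode the base, upscale it, and add the residuals back on top.
reconstructed = upscale(base) + residual
assert np.allclose(reconstructed, full)   # exact here only because nothing is quantised
```

The idea is that the residual layer carries only the fine detail, which the enhancement tools code cheaply, while the heavy lifting at the lower resolution stays on the hardware-accelerated base codec.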

Stream enhancement with LCEVC enables UHD delivery at the bandwidth typically used for HD video, providing higher-quality video with up to 40% bitrate savings and up to 3x faster transcoding. It also enables 10-bit HDR on top of 8-bit codecs such as H.264 or VP9, and offers unique latency-jitter reduction thanks to LCEVC’s inherent multi-layered structure. All of these benefits can be achieved while keeping average system latency to a minimum. LCEVC’s low-complexity design also enables higher resolutions with contained battery consumption, which helps with sustainability considerations.
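As a concrete reading of the “up to 40% bitrate savings” figure, the arithmetic is straightforward; the 16 Mbps baseline below is an assumed example value, not a number from this article.

```python
# Illustrative only: what "up to 40% bitrate savings" means for a single UHD rung.
baseline_uhd_mbps = 16.0                          # assumed bitrate for a native UHD encode
lcevc_uhd_mbps = baseline_uhd_mbps * (1 - 0.40)   # comparable quality with LCEVC enhancement
print(f"LCEVC-enhanced UHD at comparable quality: ~{lcevc_uhd_mbps:.1f} Mbps")
```

That same saving is how an LCEVC-enhanced UHD stream can land closer to the bandwidth range normally budgeted for plain HD.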

Many tech giants, including Meta, Google, Microsoft, Apple, Sony, and NVIDIA, are investing tens of billions of dollars in Metaverse technology, and industry forecasts predict XR functionality of some sort on most Internet destinations and applications within the next 5-10 years. The ability to meet the challenges of data volume will be one of the key factors in how quickly the XR meta-universe comes to fruition; the development and adoption of multi-layered coding standards will dramatically accelerate the creation of high-quality, interoperable Metaverse destinations and make them a reality for everyone.
