AV1 Encoding: Opening a new era of open source video encoding

Tencent MPS-Dev Team

Jul 5, 2024

As the ways people obtain information diversify, the multimedia field has grown rapidly, and that growth is inseparable from video coding.

Efficient video codecs are essential for transmitting and storing multimedia content. Over the past two decades, advances in video compression algorithms have produced a series of new codec standards. The best known are H.264 Advanced Video Coding (AVC), H.265 High Efficiency Video Coding (HEVC), VP9, AV1 (AOMedia Video 1), and H.266 Versatile Video Coding (VVC). This article introduces AV1.

The Origin of AV1

The origins of AV1 can be traced back to 2015, when the Alliance for Open Media (AOMedia) was established. The alliance brings together companies including Google, Mozilla, Cisco, Microsoft, and Netflix, with the aim of developing an open, royalty-free video coding technology.

AV1 is the first video compression format developed by AOMedia. Its goal is to provide compression efficiency comparable to the mainstream codecs at the time while avoiding high patent fees.

Google acquired On2 Technologies in 2010 and open-sourced its VP8 codec, which formed the basis of the WebM project. Google subsequently developed the VP9 codec, which was adopted by YouTube for video streaming in 2013. The success of VP9 prompted Google and other companies to collaborate on the next-generation codec, AV1.

The development of AV1 began in early 2016 and was finally completed in mid-2018. It not only inherits the technology of VP9, but also integrates some technologies from Cisco's Thor project and Mozilla's Daala project. AV1 is designed with hardware feasibility in mind, aiming to achieve high compression efficiency and low decoding complexity.

AV1 stands for "AOMedia Video 1". It is designed as the direct successor of VP9 and achieves about 30% better compression efficiency than VP9. The launch of AV1 marks the arrival of a new era of open source, royalty-free video coding, with broad application potential in fields such as video on demand, real-time communication, and streaming services.

VP9 vs AV1

To summarize, AV1 (AOMedia Video 1) is an open source, royalty-free video coding format developed by the Alliance for Open Media (AOMedia). Its design goal is to achieve higher compression efficiency than existing codecs while keeping decoding complexity manageable and hardware implementation practical.

AV1 Coding Technology

Coding block division

VP9 uses a 4-way partition tree that starts at the 64×64 level and goes down to the 4×4 level, with some additional restrictions for blocks below 8×8: all sub-blocks within an 8×8 block must share the same reference frame, as shown in the upper half of Figure 1, so that chroma components can be processed in units no smaller than 4×4. Partitions designated as "R" are recursive: the same partition tree is repeated at a lower scale until the lowest 4×4 level is reached.

Figure 1: AV1 vs VP9 coding block division

AV1 increases the largest coding block size to 128×128 and expands the partition tree to 10 possible partition outcomes, adding 4:1 and 1:4 rectangular coding blocks. As in VP9, only square blocks can be subdivided further. In addition, AV1 allows each coding unit to have its own inter/intra mode and reference frame selection, which gives sub-8×8 coding blocks more flexibility. To support this, it allows 2×2 inter prediction for chroma components while keeping the minimum transform size at 4×4.
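As a rough illustration of the ten partition outcomes, the sketch below lists the sub-blocks each one produces for a square N×N block. The type names follow the libaom enum names as best recalled, the function name `partition_shapes` is hypothetical, and the per-size restrictions of the real codec (not every type is available at every block size) are ignored.

```python
def partition_shapes(n):
    """Return {partition_type: [(width, height), ...]} for an n x n block.

    Illustrative only: real AV1 restricts some types at some block sizes.
    """
    h, q = n // 2, n // 4  # half and quarter of the block side
    return {
        "NONE":   [(n, n)],                  # keep the block whole
        "HORZ":   [(n, h)] * 2,              # two wide halves
        "VERT":   [(h, n)] * 2,              # two tall halves
        "SPLIT":  [(h, h)] * 4,              # four squares (the only recursive case)
        "HORZ_A": [(h, h), (h, h), (n, h)],  # two squares on top, wide half below
        "HORZ_B": [(n, h), (h, h), (h, h)],  # wide half on top, two squares below
        "VERT_A": [(h, h), (h, h), (h, n)],  # two squares on the left, tall half right
        "VERT_B": [(h, n), (h, h), (h, h)],  # tall half on the left, two squares right
        "HORZ_4": [(n, q)] * 4,              # four 4:1 strips
        "VERT_4": [(q, n)] * 4,              # four 1:4 strips
    }


if __name__ == "__main__":
    for name, blocks in partition_shapes(64).items():
        print(f"{name:7s} {blocks}")
```

Every outcome covers the full block area; only the four squares produced by SPLIT can be partitioned again.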

Intra prediction

In VP9, intra prediction uses 10 intra prediction modes (IPMs): 8 directional modes and 2 non-directional modes, namely DC and True Motion (TM). In AV1, the VP9 directional modes are called nominal modes. For each nominal mode, one of 7 angle deltas with indices from -3 to +3 is additionally signaled to refine the direction, so a total of 56 directional modes are defined (8 nominal modes × 7 deltas). The following figure defines the directional intra prediction modes of AV1.
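The count of 56 can be reproduced with a short sketch. The nominal angles and the 3-degree step used here are assumptions taken from the AV1 specification as best recalled; they are not stated in this article, and the function name is hypothetical.

```python
# Assumption: nominal prediction angles in degrees for the 8 directional modes.
NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]
# Assumption: each delta index shifts the nominal angle by 3 degrees.
ANGLE_STEP = 3


def directional_angles():
    """Return (nominal, delta_index, final_angle) for every directional mode."""
    return [
        (nominal, delta, nominal + delta * ANGLE_STEP)
        for nominal in NOMINAL_ANGLES
        for delta in range(-3, 4)  # the 7 delta indices -3..+3
    ]


if __name__ == "__main__":
    modes = directional_angles()
    print(len(modes), "directional modes")
    for nominal, delta, angle in modes[:7]:
        print(f"nominal {nominal:3d}, delta {delta:+d} -> {angle} degrees")
```

Under these assumptions no two combinations land on the same angle, so the finer granularity is not wasted on duplicates.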

Inter-frame prediction

Motion compensation is an important module in video coding. In VP9, each block can use up to two references chosen from up to three candidate reference frames; the predictor then either performs block-based translational motion compensation or, when two references are signaled, averages two such predictions. AV1 has a more powerful inter-frame coder: it greatly expands the pool of reference frames and motion vectors, removes the restriction to block-based translational prediction, and enhances composite prediction with highly adaptive weighting algorithms and sources.
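To make the difference between plain averaging and weighted composite prediction concrete, here is a minimal sketch. Both functions are illustrative; the integer weights out of 16 are a hypothetical choice, and AV1's actual weighting tools are considerably more elaborate.

```python
def average_compound(pred_a, pred_b):
    """Rounded average of two equally sized predictions (lists of pixel rows)."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]


def weighted_compound(pred_a, pred_b, weight_a, total=16):
    """Blend two predictions with integer weights weight_a and total - weight_a.

    The weights are hypothetical; a real encoder derives them from the content
    or from the temporal distance to each reference.
    """
    weight_b = total - weight_a
    return [
        [(a * weight_a + b * weight_b + total // 2) // total
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]


if __name__ == "__main__":
    p0 = [[100, 50], [80, 60]]
    p1 = [[110, 51], [90, 64]]
    print(average_compound(p0, p1))
    print(weighted_compound(p0, p1, weight_a=12))
```

With `weight_a` equal to half of `total`, the weighted blend reduces to the plain average, which is the only option when two references are signaled in VP9.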

AV1 expands the number of references per frame from 3 to 7. In addition to VP9's LAST (nearest past) frame, GOLDEN (distant past) frame, and ALTREF (temporally filtered future) frame, it adds two near-past frames (LAST2 and LAST3) and two future frames (BWDREF and ALTREF2).

The figure below shows the multi-layer structure of the golden frame group, where an adaptive number of frames share the same GOLDEN and ALTREF frames. BWDREF is a directly coded look-ahead frame with no temporal filtering applied, which makes it more suitable as a relatively short-distance backward reference. ALTREF2 acts as an intermediate filtered future reference between GOLDEN and ALTREF. Each of the new references can be used on its own in single-reference prediction or combined with another reference to form a composite mode. AV1 offers a rich set of reference frame pairs for both bidirectional and unidirectional composite prediction, so videos with varying temporal correlation characteristics can be encoded in a more adaptive and optimized way.
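The reference pairs can be enumerated with a small sketch. The bidirectional pairs follow directly from the seven references named above (one past reference with one future reference). The list of unidirectional pairs is an assumption based on the libaom source as best recalled and is not confirmed by this article.

```python
from itertools import product

# The seven AV1 references, grouped by temporal direction.
FORWARD_REFS = ["LAST", "LAST2", "LAST3", "GOLDEN"]  # past frames
BACKWARD_REFS = ["BWDREF", "ALTREF2", "ALTREF"]      # future frames

# Assumption: the explicitly signaled unidirectional pairs, as recalled from libaom.
UNIDIRECTIONAL_PAIRS = [
    ("LAST", "LAST2"),
    ("LAST", "LAST3"),
    ("LAST", "GOLDEN"),
    ("BWDREF", "ALTREF"),
]


def bidirectional_pairs():
    """Pair every past reference with every future reference."""
    return list(product(FORWARD_REFS, BACKWARD_REFS))


if __name__ == "__main__":
    pairs = bidirectional_pairs()
    print(len(pairs), "bidirectional pairs")
    print(len(UNIDIRECTIONAL_PAIRS), "unidirectional pairs (assumed)")
    for pair in pairs + UNIDIRECTIONAL_PAIRS:
        print(pair)
```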

Transform Coding

Instead of enforcing fixed transform unit sizes as in VP9, AV1 allows luma inter-coded blocks to be partitioned into transform units of multiple sizes, expressed as a recursive partition that goes down at most two levels. To match AV1's extended coding block partitioning, square, 2:1/1:2, and 4:1/1:4 transform sizes from 4×4 to 64×64 are supported. For chroma blocks, only the largest possible transform unit is allowed. In addition, AV1 defines a richer set of transform kernels for both intra and inter blocks.
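The two-level recursive transform partition can be sketched as follows. This is a simplified model that handles square blocks only and splits each unit into four quadrants; the `should_split` callback is a hypothetical stand-in for the encoder's rate-distortion decision, and the rectangular transform sizes are left out.

```python
def split_transform(x, y, size, should_split, depth=0, max_depth=2, min_size=4):
    """Return a list of (x, y, size) transform units for a square luma block.

    should_split(x, y, size, depth) is a hypothetical decision callback.
    Recursion stops after max_depth levels or at the minimum transform size.
    """
    if depth < max_depth and size > min_size and should_split(x, y, size, depth):
        half = size // 2
        units = []
        for dy in (0, half):
            for dx in (0, half):
                units += split_transform(
                    x + dx, y + dy, half, should_split,
                    depth + 1, max_depth, min_size,
                )
        return units
    return [(x, y, size)]


if __name__ == "__main__":
    # Split only the top-left corner at each level.
    corner_only = lambda x, y, size, depth: x == 0 and y == 0
    for unit in split_transform(0, 0, 64, corner_only):
        print(unit)
```

Because the depth is capped at two, a 64×64 block can reach transform units no smaller than 16×16 in this model, however the callback decides.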

Entropy Coding

Entropy coding is a key component of AV1. It compresses the transform coefficients, which usually account for a large part of the bitstream, sometimes more than 50%, along with the other syntax elements. AV1 uses multi-symbol arithmetic coding, which provides high throughput and fast probability model adaptation.

Multi-symbol entropy coding: VP9 uses a tree-based, non-adaptive Boolean (binary) arithmetic coder to encode all syntax elements. AV1 instead uses a multi-symbol arithmetic coder whose probabilities adapt after every coded symbol. This design tracks the probabilities of less common symbols in an alphabet more accurately, and because a single multi-symbol operation replaces several binary decisions, it also improves throughput.
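Per-symbol adaptation can be illustrated with a cumulative distribution function (CDF) that is nudged toward each coded symbol. The sketch below is a simplification: the 15-bit precision and the fixed adaptation rate are assumptions, the function names are hypothetical, and the arithmetic coder itself is not shown.

```python
CDF_TOP = 1 << 15  # assumption: probabilities held with 15-bit precision


def uniform_cdf(n):
    """Start with all n symbols equally likely: cdf[i] ~ P(symbol <= i)."""
    return [CDF_TOP * (i + 1) // n for i in range(n)]


def adapt_cdf(cdf, symbol, rate=5):
    """Move probability mass toward the symbol that was just coded.

    rate is a hypothetical fixed adaptation speed; larger means slower.
    """
    out = list(cdf)
    for i in range(len(out) - 1):  # the last entry always stays at CDF_TOP
        if i >= symbol:
            out[i] += (CDF_TOP - out[i]) >> rate  # raise P(symbol <= i)
        else:
            out[i] -= out[i] >> rate              # lower P(symbol <= i)
    return out


def probabilities(cdf):
    """Convert a CDF back to per-symbol probabilities."""
    return [(c - p) / CDF_TOP for c, p in zip(cdf, [0] + cdf[:-1])]


if __name__ == "__main__":
    cdf = uniform_cdf(4)
    for _ in range(20):
        cdf = adapt_cdf(cdf, symbol=2)
    print(probabilities(cdf))
```

Each update touches the whole alphabet at once, so a symbol that keeps appearing quickly gains probability and becomes cheaper to code.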

In addition, entropy coding also includes context adaptation, layered coding, and level map coefficient coding.

Loop Filtering Tools and Post-Processing

AV1 allows multiple loop filtering tools to be applied in sequence to a decoded frame. The first stage is a deblocking filter, which is largely the same as the one in VP9 with minor changes: the longest filter is reduced from 15 taps in VP9 to 13 taps. There is also more flexibility to signal separate filter levels horizontally and vertically for luma and for each chroma plane, and to change levels from superblock to superblock. The other filtering tools are the Constrained Directional Enhancement Filter (CDEF), the loop restoration filter, and frame super-resolution.

Tiling and multithreading

AV1 supports independent tiles consisting of multiple superblocks, and tiles can be encoded and decoded in any order. Tiles can be uniform (i.e. tiles have the same size) or non-uniform (i.e. tiles can have different sizes), as defined by encoding parameters. Independent tile support provides encoding flexibility so that the encoder and decoder can process tiles in parallel, resulting in faster speed.
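A uniform tile layout can be sketched by dividing the frame's superblock columns into equal groups. The rounding rule below mirrors the uniform spacing rule of the AV1 specification as best recalled; the function and parameter names are hypothetical, and limits on maximum tile size are ignored.

```python
def uniform_tile_columns(frame_width, sb_size=64, tile_cols_log2=2):
    """Return (start_sb, end_sb) column ranges for uniformly spaced tiles.

    frame_width is in pixels, sb_size is 64 or 128, and the requested number
    of tile columns is 2 ** tile_cols_log2 (fewer may result for narrow frames).
    """
    sb_cols = (frame_width + sb_size - 1) // sb_size  # round up to whole superblocks
    tile_width_sb = (sb_cols + (1 << tile_cols_log2) - 1) >> tile_cols_log2
    return [
        (start, min(start + tile_width_sb, sb_cols))
        for start in range(0, sb_cols, tile_width_sb)
    ]


if __name__ == "__main__":
    for start, end in uniform_tile_columns(1920):
        print(f"tile covers superblock columns {start}..{end - 1}")
```

For a 1920-pixel-wide frame with 64×64 superblocks this yields four tile columns, the last one slightly narrower than the others; each range can then be handed to a separate worker.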

In the libaom codebase, both the encoder and decoder implement multithreading (MT) at the superblock row level, tile level, and frame level. When tiles are enabled, tile-based MT can significantly improve speed. When tiles are not used, or only a few are used, row-based MT lets threads encode and decode individual superblock rows, which further improves speed.
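In practice, tile and row multithreading are usually switched on through encoder options. The sketch below only builds an FFmpeg command line for the `libaom-av1` encoder and does not run it; it assumes an FFmpeg build with libaom enabled, and the file names and default values are hypothetical.

```python
import shlex


def build_av1_encode_cmd(src, dst, crf=30, cpu_used=4, tiles="2x2",
                         row_mt=True, threads=8):
    """Build an FFmpeg argument list for a libaom-av1 encode (not executed)."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libaom-av1",
        "-crf", str(crf), "-b:v", "0",      # constant-quality mode
        "-cpu-used", str(cpu_used),         # speed/quality trade-off
        "-tiles", tiles,                    # tile columns x tile rows
        "-row-mt", "1" if row_mt else "0",  # row-based multithreading
        "-threads", str(threads),
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(shlex.join(build_av1_encode_cmd("input.mp4", "output.mkv")))
```

Passing the list to `subprocess.run` would start the encode; more tiles and row-based MT generally trade a small amount of compression efficiency for speed.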

AV1 vs Other Codecs

	H.264/AVC	H.265/HEVC	VP9	AV1
Notable Features	Low latency, suitable for real-time applications	High compression efficiency, balanced time and quality	Royalty-free, suitable for online viewing	Royalty-free, high compression efficiency, comparable to H.265
Compression Efficiency	Good	Better than H.264, but more complex	High, comparable to H.265	Better than H.264, and equal to or better than H.265
Encoding/Decoding Complexity	Low	High	Medium	High
Hardware Support	Mature	Widespread, but less than H.264	Gradually increasing	Emerging and increasing
Open Source/Royalty Free	Not open source, may involve royalties	Not open source, may involve royalties	Open source, royalty-free	Open source, royalty-free
Suggested Application Areas	Live capture and streaming	Scenarios that require a balance between time and quality	Online viewing, YouTube usage	Streaming, video on demand (VOD), etc.
Future Development	Mature technology, widely used	Gradually popularized, high efficiency	Stable application, continuous optimization	Rapid development, expected to be widely used

Application of AV1

AV1 is already used in streaming services, real-time communication, cloud transcoding, 8K ultra-high-definition video, and other scenarios:

Streaming services: AV1 is used to provide high-quality video streaming. For example, platforms such as YouTube and Netflix have begun to support AV1-encoded videos.

Real-time communication (RTC): AV1's coding tools handle both screen content and camera content well, which makes it a good fit for RTC applications such as video conferencing and online education.

Cloud transcoding services: Cloud service providers such as AWS, Google Cloud, and Tencent MPS already support AV1 encoding for video transcoding services.

8K ultra-high-definition video: As 8K video content gradually increases, AV1 is used to encode these high-resolution videos, providing better compression efficiency and image quality.

Open source community and software development: Since AV1 is open source, many open source projects and communities are developing and optimizing AV1 encoders and decoders, such as libaom and dav1d.

Conclusion

By supporting AV1 codec technology, Tencent MPS provides users with an efficient and cost-effective video processing solution that not only optimizes video storage and transmission efficiency, but also ensures wide compatibility and high-quality playback experience of video content on different devices and platforms.

You are welcome to Contact Us for more information.