Consumers are getting their hands on AOMedia Video 1, or AV1. Netflix made headlines early in 2020 when it announced the service would be streaming AV1 on Android. Google later made the AV1 codec part of its Duo video chat app. MediaTek made it possible to stream YouTube videos using the AV1 codec on the Dimensity 1000 5G SoC. So, what's the big deal? Why is it so popular? And why is it important? What follows is an overview of AV1 and the future of video streaming over the next five years.
In its original form, AOMedia Video 1 (AV1) was designed for video transmissions over the internet. It is an open, royalty-free video coding format, developed by the Alliance for Open Media (AOMedia) as the successor to VP9. AOMedia is a consortium of semiconductor firms, video-on-demand providers, video content producers, software developers, and browser vendors. The AV1 bitstream specification includes a reference video codec. In 2018, Facebook conducted tests simulating real-world conditions and found that the AV1 reference encoder achieved 34% higher compression than libvpx-vp9, 46.2% higher than x264 High profile, and 50% higher than x264 Main profile. Like VP9, AV1 has a royalty-free licensing model that does not hinder its adoption in open-source projects.
You may remember this chapter from AVIF.
Suppose an algorithm's inventor licenses the technique to third parties. One of the business options is to charge a small fee, a royalty, for every device that ships with it. Even when an algorithm isn't a digital download, it is valuable to product makers who want to incorporate it into smartphones, tablets, laptops, TVs, etc. It seems reasonable.
Nevertheless, the system is open to abuse. The history of royalty-based businesses has been filled with unfriendly renegotiations, patent trolls, mammoth lawsuits, and unexpected wins and losses. From the moment a product is conceived, royalty fees are already looming. The opposing approach is to look for and develop technology that is free from royalty payments and unencumbered by patents, just as you would not want to charge a product maker for building a gadget that uses electricity.
AV1 is an attempt to accomplish that. Current video streaming technologies are not all royalty-free. Many streaming services and formats, such as DVDs, satellite and broadcast TV, and 8K TV, are cluttered with royalty claims and patents for MPEG-2 video, H.264/AVC, and H.265/HEVC. Fees may sometimes be waived, but often they are not. The Panasonic H.264 patent portfolio alone has over 1,000 patents, and the Samsung H.265 portfolio has over 4,000. AV1 codecs are designed to be royalty-free, and many big brands support the format; taking on the combined clout of Google, Adobe, Microsoft, Facebook, Netflix, Amazon, and Cisco in a legal battle would be a daunting prospect. Despite this, trolls like Sisvel are still rattling their chains.
Just as AVIF compresses better than WebP, AV1 compresses better than H.265.
Beyond being royalty-free and open-source friendly, AV1 must also offer advantages over existing technologies. It is claimed that AV1 provides 30% better compression than H.265. Thus, 4K UHD video uses less data with the same quality.
Any video codec must balance two essential metrics: bitrate (i.e., file size) and quality. Higher bitrates produce larger encoded files, and larger files require more data to be streamed. Quality changes with the bitrate: less data reduces the accuracy and fidelity of the source material, while more data gives a better chance of faithfully representing the original.
H.264/H.265 and AV1 are lossy video codecs: the encoded version differs, pixel by pixel, from the original. The goal is to encode the video so that these losses are undetectable to the human eye. Several techniques accomplish this; the main ones are incremental frame changes, quantization, and motion vectors, rather than squeezing 30 full frames into every second of a 30fps video. If two people throw a ball, the ball and the people change between frames, while everything else stays fairly static. The encoder then has less data to worry about: just the difference. Whenever the scene changes, or at forced intervals, a full frame (a keyframe) is included, and the encoder tracks the differences from that last full frame.
The photos you take on your smartphone are likely saved in JPEG format (a .jpg file), a lossy compression format whose core technique is quantization. The basic idea is that an 8×8 pixel segment of a photo can be modeled as a series of shaded patterns (one set per color channel) layered on top of each other. The patterns are generated using a Discrete Cosine Transform (DCT). An 8×8 block can be approximated using 64 patterns by deciding how much of each pattern is needed to get close to the original. It turns out that maybe only 20% of the patterns are necessary to reproduce the original block convincingly, so roughly 12 pattern weights can be stored rather than 64. Going from 64 values to 12 for each color channel is quite a saving.
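The quantization idea can be demonstrated with a small, self-contained Python sketch: take an 8×8 block, transform it with a 2-D DCT, divide each coefficient by a quantization step, and round. For a smooth block, most coefficients round to zero. This is the generic JPEG-style scheme, not AV1's exact transform, and the step size of 20 is an arbitrary choice for the demo:

```python
import math

N = 8  # JPEG-style blocks are 8×8 pixels

def alpha(k):
    # Orthonormal scaling factor for the DCT basis functions
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    """2-D DCT-II of an N×N block: pixels -> pattern weights."""
    return [[alpha(u) * alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT: pattern weights -> pixels."""
    return [[sum(alpha(u) * alpha(v) * coeffs[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

STEP = 20  # quantization step: larger = smaller file, lower quality

# A smooth gradient block, values 0..63 - typical "easy" photo content.
block = [[8 * x + y for y in range(N)] for x in range(N)]
quantized = [[round(c / STEP) for c in row] for row in dct2(block)]
nonzero = sum(1 for row in quantized for c in row if c != 0)

# Reconstruct from the few surviving weights; the error stays small.
restored = idct2([[c * STEP for c in row] for row in quantized])
error = max(abs(block[x][y] - restored[x][y])
            for x in range(N) for y in range(N))
```

For this gradient block only 4 of the 64 quantized weights are nonzero, yet the reconstruction is off by only a few levels per pixel, which is exactly the trade quantization makes.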
The number and size of the shaded patterns vary, as do the transforms used to generate them, the weighting given to each pattern, and the amount of rounding. JPEG uses one set of rules, H.264 another, AV1 another, and so on, but the idea is the same in all cases. The result is that each frame in the video is a lossy representation of the original frame, smaller and compressed compared to the original.
Motion tracking is the third technique. In our scene of two people tossing a ball, the ball travels around the frame. Often a block of pixels has simply moved, so rather than sending the same data again, it is cheaper to note that the block containing the ball has shifted a bit. Finding motion vectors and plotting their tracks takes time when encoding, but not when decoding.
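A brute-force version of motion search can be sketched in Python. Real encoders use far smarter search patterns and sub-pixel precision; this toy full search simply finds the offset in the reference frame that minimizes the sum of absolute differences (SAD). All names here are made up for illustration:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(ref, cur, top, left, size=4, radius=2):
    """Full search: best (cost, dy, dx) such that the reference block at
    (top+dy, left+dx) matches the current block at (top, left)."""
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if 0 <= t and 0 <= l and t + size <= len(ref) and l + size <= len(ref[0]):
                cost = sad(get_block(ref, t, l, size), target)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best

# A 4×4 bright "ball" on a dark background, shifted right by one pixel
# between the reference frame and the current frame.
W = H = 8
ref = [[0] * W for _ in range(H)]
cur = [[0] * W for _ in range(H)]
for y in range(2, 6):
    for x in range(2, 6):
        ref[y][x] = 200
        cur[y][x + 1] = 200

cost, dy, dx = motion_search(ref, cur, top=2, left=3)
```

The search finds a perfect match one pixel to the left in the reference frame, so instead of re-sending 16 pixel values, the encoder only has to send the vector (0, -1).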
The key to video encoding is keeping the bitrate low and the quality high. As video encoding technology developed over the years, the aim was to reduce the bitrate while maintaining quality. Meanwhile, the resolutions of consumer displays have also increased: video streaming services have advanced from yesterday's 480p (NTSC) DVD and 1080p Blu-ray to 4K and 8K.
A higher screen resolution means more pixels to represent, so more data is needed for each frame. Quoted compression ratios are rough estimates, since they imply a constant bitrate. Many codecs can encode video at a variable bitrate driven by a quality target: the bitrate changes moment by moment, depending on the complexity and clutter of the scene, and the quality setting determines the overall bitrate. There are different ways to measure quality.
One is the peak signal-to-noise ratio (PSNR); there are other statistics, and there is also perceived quality. The 30% better compression claim comes from tests in which a panel of, say, 20 people watches the same clips and rates which encoder produces higher quality: an AV1 stream is judged to be of the same quality while using a lower bitrate (by 30%). Such subjective opinions are hard to verify and equally hard to dispute. AV1 is also designed to be scalable to any modern device at any bandwidth, with a low computational footprint, optimized for the internet and for hardware. Video delivered by AV1 aims to be consistent, real-time, and of the highest quality, and the format can be used for commercial or non-commercial content, including user-generated content.
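PSNR, the objective metric just mentioned, is simple to compute: it compares the mean squared error between original and distorted pixels against the maximum possible pixel value. A minimal sketch in Python:

```python
import math

def psnr(original, distorted, peak=255):
    """Peak signal-to-noise ratio in decibels for 8-bit pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")  # identical signals: no noise at all
    return 10 * math.log10(peak ** 2 / mse)

orig = [100, 120, 140, 160]
noisy = [101, 119, 142, 158]  # small per-pixel errors
score = psnr(orig, noisy)  # roughly 44 dB; higher means closer to the original
```

Typical compressed video lands somewhere around 30-50 dB, but PSNR correlates only loosely with what viewers actually perceive, which is why metrics like VMAF and human viewer panels exist alongside it.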
In September 2015, the Alliance for Open Media announced its formation. The high cost and uncertainty surrounding HEVC, MPEG's designated successor to AVC, prompted the Alliance to create AV1, and its seven founding members - Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix - said the format's initial focus would be delivering high-quality web video. HEVC Advance's initial licensing offer, announced 42 days prior on July 21, 2015, was a substantial increase over that of AVC; HEVC also made the licensing process more complex and more costly. Two patent pools were formed after the HEVC standard was completed, with a third on the horizon, unlike previous MPEG standards, where the technology could be licensed from a single entity, MPEG-LA. In addition, other patent holders refused to license through either pool, increasing the uncertainty surrounding HEVC's licensing.
The lack of certainty around licensing is believed to be one of the main reasons for the creation of AV1; Microsoft's Ian LeGrow named it as such and promoted open-source, royalty-free technology. Patent licensing costs were another reason: adding an H.264 implementation to Firefox would have made the browser impossible to distribute free of charge, since MPEG-LA licensing fees would have to be paid. Free Software Foundation Europe has argued that FRAND patent licensing practices are incompatible with free software licenses, making such standards impossible to implement in free software. Many components of the AV1 project originated from previous research by Alliance members, several of whom had already developed experimental technology platforms:
- Xiph (Mozilla) published Daala in 2010.
- Google announced its VP9 evolution project VP10 on September 12, 2014.
- Cisco published Thor on August 11, 2015.
AV1 incorporates new techniques, several of which were developed for these experimental formats.
The first version of the AV1 codebase was released on April 7, 2016. Even with a soft feature freeze in place at the end of October 2017, several significant features continued to be developed. The bitstream format had been planned to freeze in January 2018, but the freeze was delayed by unresolved critical bugs, further changes to transformations, syntax, and motion vector prediction, and the completion of legal analysis. The Alliance released a software encoder and decoder on March 28, 2018, and the validated 1.0.0 specification followed on June 25, 2018. After the bitstream freeze, AOM member Bitmovin's Martin Smole said the most significant remaining challenge was making the reference encoder computationally efficient: encoders were not yet targeted for production use, and speed optimizations had not been prioritized, so early versions of AV1 were orders of magnitude slower than existing HEVC encoders. The development effort then shifted to optimizing the reference encoder.
The reference encoder's speed was reported to have improved dramatically by March 2019. On January 21, 2021, the MIME type of AV1 was defined as video/AV1; it applies only to AV1 carried over the Real-time Transport Protocol. In April 2021, Roku removed the YouTube TV app from the Roku platform after a contract expired. Several Roku streaming devices do not support the AV1 codec, and according to Roku, consumers would be forced to pay more for devices that support the royalty-free AV1 codec. YouTube and Roku have since signed a multi-year deal to keep their apps on the Roku streaming platform.
Several companies - Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix - announced their Alliance for Open Media participation on September 1, 2015. Three open-source codec projects were consolidated at the time: Cisco's Thor, Google's VP10, and Mozilla's Daala. As stated in the initial press release, the goal was to develop a "next-generation video format," which is:
- Interoperable and open
- Optimized for the web
- Scalable to any modern device at any bandwidth
- Designed with a low computational footprint and optimized for hardware
- Capable of consistent, highest-quality, real-time video delivery
- Flexible for both commercial and non-commercial content, including user-generated content.
Under the AV1 patent license, licensees receive a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except to the extent expressly stated in the license) patent license to use, make, sell, offer for sale, import, and distribute any implementation. Licensees are not required to disclose any of their source code. On April 5, 2016, it was announced that IP provider ARM, chipmaker AMD, and semiconductor developer NVIDIA had joined the Alliance to help ensure that the codec is hardware friendly and to facilitate and accelerate AV1 hardware support. The Alliance members enjoy leading positions in the following markets:
- Codec development - Cisco (Thor), Google (VPX), Mozilla (Daala)
- Desktop and mobile browsers - Google (Chrome), Mozilla (Firefox), Microsoft (Edge)
- Content - Amazon (Prime), Google (YouTube), Netflix
- Hardware co-processing - AMD (CPUs, graphics), ARM (SoCs, other chips), Intel (CPUs), NVIDIA (SoC, GPUs)
- Mobile - Google (Android), Microsoft (Windows Phone)
- OTT - Amazon (Amazon Fire TV), Google (Chromecast, Android TV)
These positions should enable the Alliance to ensure the fastest possible integration of AV1 into the members' products and services and to influence industry acceptance as a whole. For example, once Netflix and YouTube start deploying AV1 content to browser-based viewers, this should push the manufacturers of Smart TVs, set-top boxes, and competing OTT devices to support the format, which will be simplified by the availability of AV1-compatible hardware from other Alliance members.
The AV1 format aims to be an advanced, royalty-free video format for the web. Matt Frost, head of strategy and partnerships in Google's Chrome Media team, said the Alliance for Open Media has the same mission as WebM. One of the most common concerns in standards development, particularly for royalty-free multimedia formats, is the danger of accidentally infringing patents that the format's creators and users were unaware of. The same concern has been raised regarding AV1, VP8, VP9, Theora, and IVC. While the issue is not unique to royalty-free formats, it uniquely threatens their royalty-free status. To maintain that status, the development process requires each feature to be examined by two separate parties to determine that it does not infringe on patents of competing companies.
Owners of patents covering techniques similar to patent-protected ones are invited to join the Alliance when there is no alternative, even if they already belong to another patent pool; Apple, Cisco, Google, and Microsoft, for example, are also members of MPEG-LA's patent pool for H.264. AV1 licensees also benefit from a legal defense fund to support them if they are sued for alleged patent infringement. Under AV1's patent rules, technology contributors license their AV1-connected patents to anyone, anywhere, anytime, provided the user doesn't engage in patent litigation; anyone who does loses the right to the patents of all patent holders. This treatment of intellectual property rights (IPR) during development, and its absolute priority, is incompatible with MPEG formats like AVC and HEVC, which, in line with the ITU-T's definition of an open standard, were developed under a policy of IPR non-involvement by their standardization organizations. However, MPEG's chairman argues that this practice must change.
To guard against future IPR threats, EVC will have a royalty-free subset and switchable features in its bitstream. Parts of the industry have long been committed to creating royalty-free web standards: the 2007 HTML5 video proposal called for Theora support to be mandatory. After AVC surpassed Theora, Google renewed the royalty-free competition with the WebM project, and the Alliance for Open Media is a continuation of those efforts. Mozilla, which distributes free software, would find AVC challenging to support: a per-copy royalty is not sustainable in the absence of per-copy revenue streams (see FRAND § Excluding costless distribution). As with HEVC, no exception has been allowed for free software (see HEVC § Provision for costless software). The performance goals include "a step up from VP9 and HEVC" in efficiency for a minimal increase in complexity; NETVC aims for a 25% improvement in efficiency over HEVC. As hardware support will not be available immediately, software decoding performance is a primary concern.
AV1 is designed for real-time applications and for higher resolutions than the typical usage scenarios of the current generation of video formats, where it is expected to achieve its greatest efficiency gains. It supports the color space from Recommendation BT.2020 of the ITU-R and up to 12 bits of precision per color component. The AV1 codec is primarily designed for lossy encoding; however, lossless compression is also supported.
AVIF is another product that came out of the AV1 work.
AV1 is well-positioned to compete directly with HEVC in most streaming-related markets, including browser-based streaming, mobile, OTT, and smart TVs, which are the principal markets served by Alliance members. Exact positioning won't be known until the codec is released, tested, and comparative quality assessments can be made. AV1's most significant competitive advantage over HEVC is that it is royalty-free, whereas HEVC is tethered to royalty payments under the MPEG LA and HEVC Advance patent pools. HEVC-encoded videos also attract content royalties from the HEVC Advance pool if they are distributed on physical media or streamed via subscription or pay-per-view.
Technicolor had initially been a member of the HEVC Advance pool but discontinued its membership in favor of licensing its IP directly to third parties. Many HEVC IP owners are not participating in any licensing pool or have not indicated whether or how they plan to license their HEVC IP. As a result of the disparate actions taken by the various HEVC rights holders, licensing HEVC is both costly and complex. If AV1 meets its objectives, the Alliance should have a clear quality advantage over HEVC, though this could be negated by higher CPU playback requirements and longer encoding times. The Alliance membership also offers a competitive advantage through its ability to implement AV1 quickly and influence others to do the same.
The most significant advantage of HEVC is that the technology already has hardware support in many key markets, including mobile (on both Apple and Android hardware), smart TVs, set-top boxes, and OTT devices, plus widespread support among mainstream encoding vendors. While HEVC is not yet available on desktop/notebook browsers, VP9, AV1's predecessor, is available in all recent versions of every browser except Apple Safari. AV1 does not appear set to compete directly with Google's VP9: major users, like Microsoft and Google, have already announced plans to support AV1, and since both formats are supported, they can fall back to VP9 for computers and devices that AV1 can't serve because of missing hardware support or other technical constraints.
AV1 combines traditional frequency-transform coding with new techniques. It extends Google's VP9 with additional coding methods that let encoders adapt better to different input types. The Alliance published an assembly and C implementation (aomenc, aomdec) under the BSD 2-Clause License. Development is a public process, and all AOM members are invited to contribute. Coding tools were added to the reference codebase as experiments, controlled by flags that enable or disable them at build time, and were reviewed by other AOM members and by specialized teams that ensured hardware compatibility and intellectual property rights compliance (TAPAS).
Once a feature gained support in the community and passed all reviews, it was enabled by default and its flag was removed. Experiment names were lowercased in the configure scripts and uppercased in the conditional compilation flags. To handle HDR and color spaces more reliably, the metadata can now be embedded in the video bitstream rather than signaled by the container.
A superblock is the largest block unit of an AV1 frame. Like macroblocks in earlier formats, superblocks are square, with a size of 128×128 or 64×64 pixels, and can be divided into smaller blocks using different partitioning schemes. Four-way splits are the only pattern whose partitions can be subdivided recursively; as a result, superblocks can be split into partitions as small as 4×4 pixels. Partitions can also take "T-shaped" patterns, a feature developed for VP10, or horizontal or vertical stripes with 4:1 and 1:4 aspect ratios.
The available partitioning patterns depend on the block size: 128×128 and 8×8 blocks cannot use 4:1 and 1:4 splits, and 8×8 blocks cannot use "T"-shaped splits. Two separate predictions can now also be combined along a smooth, oblique transition line (wedge-partitioned prediction), which separates objects more accurately than the traditional stair-step lines along the borders of square blocks. A configurable prediction dependency between tile rows (ext_tile) is available as well.
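The recursive four-way split is easy to model. The sketch below covers only the recursive quadrant split (not the T-shaped or 4:1/1:4 patterns, which cannot recurse), and the decision callback stands in for the rate-distortion choices a real encoder makes per block:

```python
def four_way_split(x, y, size, should_split):
    """Recursively partition a square block into four quadrants.

    should_split(x, y, size) is the encoder's per-block decision;
    the recursion bottoms out when it returns False."""
    if not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):       # top row, then bottom row of quadrants
        for dx in (0, half):   # left column, then right column
            blocks += four_way_split(x + dx, y + dy, half, should_split)
    return blocks

# Split a 128×128 superblock all the way down to 4×4 partitions.
leaves = four_way_split(0, 0, 128, lambda x, y, s: s > 4)
```

Fully splitting a 128×128 superblock yields (128/4)² = 1024 leaf blocks; a real encoder would stop much earlier wherever a larger block predicts well enough.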
Thanks to the higher precision (10 or 12 bits per sample) of the internal processing, predictions can be combined in more advanced ways than a uniform average (compound prediction). This includes smooth and sharp transition gradients in different directions (wedge-partitioned prediction) and implicit masks derived from the difference between the two predictors. In AV1, multiple inter predictions, or an inter and an intra prediction, can be combined in one block. For inter prediction, a frame can reference six of eight stored frames instead of three (ext_refs), while Warped Motion (warped_motion) and Global Motion (global_motion) reduce redundant motion vector information, techniques previously explored in formats like MPEG-4 ASP.
A frame may carry a set of warping parameters in the bitstream, or a block may use implicit local parameters derived from its surroundings. Switch frames (S-frames) are a new inter-frame type that can be predicted from previously decoded reference frames of a higher-resolution version of the video; they allow adaptive bitrate streaming to switch from a higher to a lower resolution without inserting a full keyframe. Intra prediction determines the pixels of a block using only information from the current frame, typically the pixels to the left of and above the prediction block.
The DC predictor averages the neighboring pixels, while directional predictors extrapolate from those pixels along a given angle. Eight main directional modes can be selected in AV1, starting at 45 degrees and increasing in 22.5-degree steps up to 203 degrees. Six offsets of 3 degrees each, three above the main angle and three below it, yield 56 angles in total (ext_intra). The "TrueMotion" predictor has been replaced with a Paeth predictor, which examines the differences between the known pixel in the upper-left corner and the left and above neighbors, then chooses the neighbor lying in the direction of the smaller gradient. A palette predictor is available for blocks with up to eight dominant colors, as found on some computer screens.
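The Paeth predictor, familiar from PNG filtering, fits in a few lines. This sketch follows the classic PNG formulation; AV1's variant differs in detail, so treat it as an illustration of the idea rather than the codec's exact rule:

```python
def paeth(left, above, upper_left):
    """Predict a pixel from its left, above, and upper-left neighbors.

    The estimate left + above - upper_left assumes a locally linear
    gradient; the prediction is whichever neighbor is closest to it."""
    estimate = left + above - upper_left
    # Ties resolve in the order left, above, upper_left (as in PNG).
    return min((left, above, upper_left), key=lambda c: abs(estimate - c))

# A vertical edge: the row above already shows the new value, so the
# predictor follows the gradient and picks "above" (110).
prediction = paeth(100, 110, 100)
```

The appeal of the scheme is that the decoder can recompute the same prediction from pixels it has already decoded, so only the (usually small) residual has to be transmitted.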
Correlations between luminosity and color information can now be exploited by predicting chroma from luma samples (cfl). Visible boundaries along the borders of inter-predicted blocks can be reduced with overlapped block motion compensation (OBMC), in which a block's prediction is extended into neighboring blocks by 2 to 32 pixels and the overlapping portions are blended.
AV1 encoders can transform the residual error remaining after prediction into the frequency domain using square or rectangular (2:1/1:2 and 4:1/1:4) DCTs (rect_tx), using asymmetric DSTs for blocks where prediction from the top and left edges makes the error there likely to be lower, or they can skip the transform entirely (identity transform). Two different one-dimensional transforms can also be combined for the horizontal and vertical dimensions (ext_tx).
Quantization matrices (aom_qm) have been added for AV1. Eight sets of quantization parameters, with individual parameters for the chroma planes, can be signaled and selected per frame, and a quantization-parameter offset can be signaled for every new superblock.
Thor's constrained low-pass filter and Daala's directional enhancement filter are combined in cdef, the constrained directional enhancement filter: a conditional replacement filter that smooths blocks roughly along the direction of the dominant edge to eliminate ringing artifacts. There is also a Wiener-filter-based loop restoration filter (loop_restoration), plus self-guided restoration filters, to correct blur artifacts resulting from block processing.
Film grain synthesis (film_grain) improves the coding of noisy signals through a parametric video coding scheme. Because film grain noise is random, this signal component is typically costly to code and prone to being damaged or lost, resulting in severe coding artifacts. This tool overcomes these problems with an analysis-and-synthesis approach, producing a visually similar synthetic texture judged by subjective visual impression rather than objective similarity.
The encoder removes the grain component from the signal, analyzes its non-random characteristics, and transmits only descriptive parameters to the decoder, which adds back a synthetic, pseudorandom noise signal shaped like the original. It is the visual equivalent of the Perceptual Noise Substitution used by the AC-3, AAC, Vorbis, and Opus audio codecs.
Some AVIF encoders can also use these filters.
In place of VP9's binary entropy coder, Daala's entropy coder (daala_ec) has been selected. Besides evading patents, non-binary arithmetic coding helps parallelize an otherwise serial process, allowing clock rates of hardware implementations to be reduced. Its greater-than-binary alphabet also makes it faster per symbol than modern binary arithmetic coding such as CABAC, though not quite as fast as Huffman coding. AV1 also gains the ability to adapt the arithmetic coder's symbol probabilities per coded symbol rather than per frame (EC_adapt).
Scalable video coding, essential for video conferencing, is not unique to AV1. By restricting and structuring frame dependencies, one or more lower-bitrate video streams can be extracted from a single higher-bitrate, higher-quality stream. Compared with adaptive bitrate streaming, some compression efficiency is sacrificed within the overall stream, but encoding is less redundant and less demanding. AV1 supports both spatial and temporal scalability; in other words, a lower-bitrate substream can be defined by resolution, framerate, or both.
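Temporal scalability can be modeled by tagging frames with layers and restricting references so that base-layer frames never depend on enhancement-layer frames. A minimal sketch, with an invented two-layer scheme and field names chosen purely for illustration:

```python
def make_frames(count):
    """Alternate base-layer (0) and enhancement-layer (1) frames.

    Base frames reference the previous base frame; enhancement frames
    reference the frame immediately before them."""
    frames = []
    for i in range(count):
        layer = i % 2
        ref = None if i == 0 else (i - 1 if layer == 1 else i - 2)
        frames.append({"index": i, "layer": layer, "ref": ref})
    return frames

def extract_base_layer(frames):
    """Drop enhancement frames; the remainder must still be decodable."""
    kept = [f for f in frames if f["layer"] == 0]
    kept_ids = {f["index"] for f in kept}
    for f in kept:
        # No kept frame may reference a dropped frame.
        assert f["ref"] is None or f["ref"] in kept_ids
    return kept

# Dropping layer 1 halves the framerate without breaking decoding.
base = extract_base_layer(make_frames(10))
```

This is exactly the dependency-structuring trade described above: the base layer is slightly less efficient than an unconstrained stream, but a relay or receiver can discard the enhancement layer without re-encoding anything.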
On the Sintel and Tears of Steel short films, Bitmovin measured objective metrics superior to HEVC when using the experimental features available at the time (of 77 in total). A comparison from the beginning of June 2016 found AV1 roughly on par with HEVC, and according to Jan Ozer of Streaming Media Magazine, AV1 is at least as good as HEVC now. In contrast, late-2016 tests by the Fraunhofer Institute for Telecommunications found AV1 65.7% less efficient than HEVC, underperforming even H.264/AVC, which those tests found 10.5% more efficient than AV1.
Ozer attributed the discrepancy to the encoding parameters endorsed by each encoder vendor and to the newer AV1 encoder having more features enabled. According to internal measurements from 2017, AV1 decoded at about half the speed of VP9 at 720p. Netflix measured AV1's efficiency against VP9 (libvpx) using PSNR and VMAF at 720p and found AV1 to be about 25% more efficient. PSNR tests by Facebook in 2018 showed the AV1 reference encoder achieving higher compression than libvpx-vp9, x264 High profile, and x264 Main profile. Tests from Moscow State University in 2017 showed that HEVC and VP9 required more bitrate than AV1 to achieve similar quality levels.
According to 2020 data from the University of Waterloo, AV1 saved 9.5% bitrate compared to HEVC and 16.4% compared to VP9 when using a mean opinion score (MOS) on 2160p (4K) video. As of September 2020, the most recent encoder comparison by Streaming Media Magazine, using moderate encoding speeds, VMAF, and a diverse set of short clips, found AV1 encoding times averaging 590× longer than AVC; HEVC averaged 4.2× and VP9 5.2× longer than AVC. In the same comparison, the open-source libaom and SVT-AV1 encoders used 15-20% less bitrate than x265 in its "veryslow" preset, or about 45% less than x264 veryslow, and its "slower" preset was as fast as x265 veryslow while saving 50% bitrate over x264 veryslow. Visionular's Aurora1 encoder was the best-in-test AV1 encoder.
There are a few differences between AOMedia's AV1 and AVC, which MPEG launched in 2003. AV1 aims to be the most widely used video format on the internet, so that high-quality video can be exchanged across the internet freely and efficiently.
The most significant advantages of AV1
With AV1, video streams can be delivered at higher quality and lower bitrates. AV1 is a license-free solution with compression nearly twice as efficient as its older competitors'. Encoding and decoding are free of charge, allowing end-users to enjoy high-quality video even with low Internet bandwidths. AV1's performance targets are ambitious.
According to AOMedia, efficiency will improve by 25 percent compared to HEVC. That comes with a boost in complexity, and with hardware support still a long way off, software decoding will be required for now. In all common web browsers, AV1 can be used along with the Opus audio format in WebM container files; the only exception is the Safari browser, which supports neither format.
Three decoder profiles are defined in AV1: Main, High, and Professional. The Main profile supports bit depths of 8 and 10 bits with 4:0:0 (greyscale) and 4:2:0 (quarter-resolution) chroma sampling. The High profile adds support for 4:4:4 chroma sampling (no subsampling). The Professional profile offers full support for 4:0:0, 4:2:0, 4:2:2 (half-resolution), and 4:4:4 chroma sampling at 8-, 10-, and 12-bit color depths.
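The difference between these sampling modes is easy to quantify: 4:2:0 halves both chroma dimensions, 4:2:2 halves only the width, and 4:4:4 keeps full resolution. A small sketch (the function is illustrative, not part of any AV1 API):

```python
def frame_bits(width, height, bit_depth, sampling):
    """Raw (uncompressed) bits per Y'CbCr frame for a given chroma mode."""
    # (width divisor, height divisor) for the two chroma planes
    divisors = {"4:0:0": None, "4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
    luma = width * height
    if divisors[sampling] is None:
        chroma = 0  # greyscale: no chroma planes at all
    else:
        dw, dh = divisors[sampling]
        chroma = 2 * (width // dw) * (height // dh)  # two chroma planes
    return (luma + chroma) * bit_depth

# A 1080p frame: 4:2:0 carries exactly half the samples of 4:4:4.
bits_420 = frame_bits(1920, 1080, 8, "4:2:0")
bits_444 = frame_bits(1920, 1080, 8, "4:4:4")
```

For an 8-bit 1080p frame this works out to about 3.1 MB raw in 4:2:0 versus 6.2 MB in 4:4:4, which is why 4:2:0 is the default for consumer delivery and full-resolution chroma is reserved for the Professional profile's use cases.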
For decoders, AV1 defines levels ranging from 2.0 to 6.3. Level implementation depends on the hardware.
Example resolutions would be 426×240@30 fps for level 2.0, 854×480@30 fps for level 3.0, 1920×1080@30 fps for level 4.0, 3840×2160@60 fps for level 5.1, 3840×2160@120 fps for level 5.2, and 7680×4320@120 fps for level 6.2. Level 7 has not been defined yet.
ISO Base Media File Format: AOMedia's ISOBMFF specification was the first to be finalized and the first to gain adoption. YouTube uses it.
MPEG Transport Stream (MPEG TS)
AOMedia defines a preliminary RTP packetization spec that carries AV1 OBUs as the RTP payload. The extension contains information about video frames and their dependencies, which is helpful for scalable video coding. Raw video data is also transmitted differently than in MPEG TS: over RTP, other streams such as audio must be carried separately.
As of late 2019, WebM, the subset of Matroska, had not yet sanctioned AV1.
IVF, a simple development container format, was inherited from VP8's first public release. Rav1e supports this format as well.
Libaom initially supported WebM before Matroska containerization was specified; that has since been changed to conform to the Matroska specification.
AV1 codec isn't ready for the masses
Just like AVIF, AV1 is slow to encode. Take a 15-second 4K clip from my smartphone: encoding it into H.264 on my PC with software alone takes around a minute, four times longer than the clip itself, and with NVIDIA hardware acceleration just 20 seconds, only a little longer than the original clip. AV1 encoding takes far longer still, so it's not ready for the masses (yet). The encoders will improve, and hardware support will follow.
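One way to reason about encoder readiness is the real-time factor: encode time divided by clip duration, where anything near 1.0 feels instantaneous to the user. A minimal sketch using the informal figures above (my own helper, not a benchmarking tool):

```python
# Real-time factor = encode time / clip duration.
# A value of 1.0 means the encode takes exactly as long as the clip;
# values far above 1.0 are impractical for everyday use.
CLIP_SECONDS = 15  # the 15-second 4K smartphone clip from the text

def realtime_factor(encode_seconds, clip_seconds=CLIP_SECONDS):
    """How many times longer than the clip itself the encode takes."""
    return encode_seconds / clip_seconds

print(realtime_factor(60))  # H.264, software only: 4x real time
print(realtime_factor(20))  # H.264 with NVIDIA hardware: ~1.3x real time
```

By this measure, an AV1 software encode running hundreds of times slower than AVC is clearly out of reach for casual use until hardware encoders arrive.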
Thanks to its lean and efficient decoders, Netflix has already begun to stream some content to Android devices in AV1. What about replacing H.264 universally? Google's claims regarding AV1 for Duo are fascinating, and they appear to imply both AV1 encoding and decoding on the client devices. When I emailed Google about this, they initially seemed willing to share some details about the AV1 codec and Duo, but then everyone on the mailing list went silent. If anyone from Google contacts me, I will let you know!
AV1 adoption will improve as more chipsets ship with hardware-based encoding/decoding support. It is important to remember that AV1 is still quite a new standard; considering that development started in 2015, we shouldn't expect mass adoption for another 2-3 years. Once ARM integrates AV1 support into its SoC designs, even low-end and mid-tier chips will support AV1 decoding within the next few years, and Google TV will begin to support AV1 as well. High-quality, low-bitrate streaming is not far off. So, what are your thoughts on AV1?
In October 2016, Netflix announced that it expected to be an early adopter of AV1. On February 5, 2020, Netflix began using AV1 to stream specific titles on Android, providing 20% better compression than VP9. On November 9, 2021, Netflix announced that it was now streaming AV1 content to televisions with AV1 decoders and to the PlayStation 4 Pro.
YouTube began streaming AV1 content in 2018, starting with its AV1 Beta Launch Playlist. According to the description, the videos are (to begin with) encoded at a high bitrate to test decoding performance, and YouTube has "ambitious goals" for rolling out AV1 shortly. As of version 2.10.13, released in early 2020, YouTube for Android TV supports playback of AV1 videos on platforms that support it.
Following its positive test results, Facebook announced it would gradually roll out AV1 as browser support arrived, starting with its most popular videos.
Vimeo's "Staff picks" were made available in AV1 in June 2019. With further improvements to the encoder, Vimeo expects to eventually provide AV1 support for all videos uploaded to Vimeo and the company's "Live" service.
iQIYI announced support for AV1 on April 30, 2020, for users on PC and Android devices, becoming the first Chinese video streaming site to add AV1.
In April 2020, YouTube released hardware-based AV1 encoding support for Android TV platforms, and in November 2020, Netflix launched a GPU-based AV1 decoder for the Xbox One.
In January 2021, Google was reported to require AV1 support for new Android TV devices (as of Android TV version 10).
Twitch plans to begin rolling out AV1 support for its most popular content in 2022 or 2023 and universal support in 2024 or 2025.
Libaom is the reference implementation, comprising an encoder (aomenc) and a decoder (aomdec). Created as a research codec, it demonstrates that every feature can be used efficiently, but at the expense of encoding speed. rav1e is an encoder written in Rust and assembly. It was problematically slow when feature freeze was reached, but speed optimizations have continued since.
In contrast to aomenc, rav1e starts from a simple (and fast) conforming encoder and improves its efficiency over time while remaining speedy.
SVT-AV1 includes an open-source encoder and decoder explicitly designed for data-center servers based on Intel Xeon processors; Netflix developed it together with Intel. dav1d is a decoder written in C99 and assembly, focused on speed and portability. It was first released in December 2018. According to its developers, as of March 2019 users can "safely use the decoder on all platforms, with excellent performance." As of May 2019, version 0.3 demonstrated performance up to 5 times faster than aomdec; version 0.5 was released in October 2019.
Dav1d replaced libaom in May 2019 as Firefox 67's default decoder. In a 2019 comparison against libgav1 and libaom, dav1d v0.5 was rated the best decoder. dav1d 0.9.0 was released on May 17, 2021, and dav1d 0.9.2 on September 3, 2021.
Cisco AV1 is a proprietary live encoder that Cisco developed for Webex teleconferencing in 2021. The encoder is optimized for latency and CPU usage, as on a "common laptop." Cisco stressed that at their point of operation (high speed and low latency), AV1's extensive toolset does not preclude low encoding complexity. Instead, they found compression-to-speed tradeoffs that compare well with HEVC, even better than before thanks to AV1's tools for screen content and scalability. High-resolution screen sharing has improved compared to their previous H.264 encoder deployment.
libgav1 is a decoder written in C++11 released by Google. Several other organizations have announced they are working on encoders, including EVE for AV1 (in beta testing), NGCodec, Socionext, Aurora, and MilliCast.
- Firefox (software decoder since version 67.0, May 2019; hardware decoder on compatible platforms since version 100.0, April 2022)
- Google Chrome (decoder since version 70, October 2018; encoder since 90, April 14, 2021)
- Opera (since version 57, 28 November 2018)
- Microsoft Edge (since the Windows 10 October 2018 Update (1809), with the AV1 Video Extension add-on)
- Vivaldi (since October 2018)
All browsers besides Microsoft Edge also have AVIF support.
- VLC media player (since version 3.0)
- mpv (since version 0.29.0)
- Xine-lib (since 1.2.10)
- PotPlayer (since version 1.7.14804, 16 October 2018)
- K-Lite Codec Pack (since version 14.4.5, 13 September 2018)
- FFmpeg (since version 4.0, 20 April 2018)
- HandBrake (since version 1.3.0, 9 November 2019; decoding support)
- Bitmovin Encoding (since version 1.50.0, 4 July 2018)
- DaVinci Resolve (since version 17.2, May 2021; decoding support)
- GStreamer (since version 1.14)
- OBS Studio (libaom and SVT-AV1 support since 27.2 Beta 1)
- MKVToolNix (adoption of final av1-in-mkv spec since version 28)
- MediaInfo (since version 18.03)
- Elecard StreamEye Studio (tools for video quality analysis)
- Google Duo (since April 2020)
- Adobe Audition (decoding support, preview video)
- Avidemux (since version 2.76, July 7, 2020; decoding support)
- VDPAU (since version 1.5, March 7, 2022; decoding support)
IBC 2018 featured AV1 products, such as Socionext's hardware-accelerated encoder. A Socionext official stated that the encoding accelerator runs on an Amazon EC2 F1 cloud instance and is ten times faster than existing software encoders. Early hardware support would come from software running on non-CPU hardware, as fixed-function hardware was not expected until 12-18 months after bitstream freeze, plus a further six months until products based on those chips hit the market. The bitstream was frozen on March 28, 2018, so chips were expected between March and August 2019, with products based on them reaching the market by the end of 2019 or the beginning of 2020.
On January 7, 2019, NGCodec announced AV1 support for NGCodec accelerated by Xilinx FPGAs.
Allegro DVT announced the AL-E210 multi-format video encoder hardware IP on April 18, 2019, the first hardware AV1 encoder to be announced on the open market.
Rockchip announced their RK3588 SoC with hardware decoding up to 4K 60 fps at 10-bit color depth.
Amphion released a video decoder on May 9, 2019, supporting AV1, up to 4K 60fps.
On May 28, 2019, Realtek announced the RTD2893, its first integrated circuit supporting AV1, up to 8K. Realtek announced the RTD1311 AV1 decoder SoC for set-top boxes on June 17, 2019.
The Amlogic roadmap of October 20, 2019, revealed three set-top box SoCs that can decode AV1 content, the S805X2, S905X4, and S908X. As of December, the S905X4 was used in the SDMC DV8919.
Chips&Media announced the WAVE510A VPU that supports AV1 at 4Kp120.
MediaTek announced the world's first smartphone SoC with an integrated AV1 decoder on November 26, 2019. Up to 4K 60fps can be decoded with the Dimensity 1000.
On January 3, 2020, LG Electronics announced that its 2020 8K TVs, based on the α9 Gen 3 processor, support AV1.
Samsung announced at CES 2020 that its new 8K QLED TVs, featuring Samsung's "Quantum Processor 8K" SoC, will decode AV1.
According to Intel, their new Intel Xe-LP GPU in Tiger Lake will have AV1 fixed-function hardware decoding for the first time.
On September 1, 2020, Nvidia announced its GeForce RTX 30 Series GPUs with AV1 fixed-function hardware decoding.
AV1 fixed-function hardware decoding launched on September 2, 2020, with Intel's Tiger Lake 11th Gen CPUs.
A patch has been merged into the amdgpu drivers for Linux that adds AV1 decoding support for GPUs with RDNA2.
Roku refreshed its Ultra Roku streaming player on September 28, 2020, adding support for AV1.
According to Intel, AV1 decoding on Linux was added to version 20.3.0 of the Intel Media Driver on September 30, 2020.
On October 10, 2020, an official Microsoft blog post confirmed that the platform supports hardware decoding for AV1 on Xe-LP(Gen12), Ampere, and RDNA2.
Samsung announced the Exynos 2100 on January 12, 2021, and the processor is reportedly AV1 decode compatible. However, Samsung did not enable AV1 support at the time. On the desktop, AV1 fixed-function hardware decoding was first implemented in Rocket Lake 11th Gen CPUs on March 16, 2021.
Google launched Tensor with BigOcean supporting AV1 fixed-function hardware on October 19, 2021.
In October 2021, Intel launched its Alder Lake 12th Gen CPUs with hardware-based AV1 decoding. On January 4, 2022, Intel introduced Alder Lake 12th Gen mobile CPUs and non-K-series desktop CPUs with AV1 fixed-function hardware decoding. On February 17, 2022, Intel announced the Arctic Sound-M GPU, featuring the industry's first AV1 hardware encoder inside a GPU. On March 30, 2022, Intel officially announced its Arc Alchemist family with AV1 fixed-function hardware decoding and encoding.
The Canadian company NETINT plans to release a hardware encoder for data centers based on an ASIC.
In October 2021, the Google Pixel 6 was released with a hardware codec that can decode AV1 at 4K resolution and 60 frames per second.
Some technology observers believe it's likely that AV1 will include some intellectual property (IP) that conflicts with existing patents. Many efforts have gone into HEVC's core technologies, like quadtree structures, tiling, hardware optimization, and scalability. At least some of these core innovations would likely be employed or enhanced in the codec the Alliance proposes, so patents relating to that functionality would remain in effect.
There is also evidence that some Alliance members anticipate legal repercussions. If AOM is a technical success and survives legal challenges, the era of royalty-based codecs will be over (as it should be). Alliance members bring additional leverage beyond the financial war chest required to fight a patent-infringement suit.
Additionally, Causevic and Fish emphasize that "many of these companies have significant pre-existing portfolios and cross-licensing agreements with huge numbers of companies across the globe." In other words, even a finding of infringement would not necessarily force AV1 off the market or turn it into a royalty-bearing codec. Which way the IP-related challenge goes will depend largely on how successful the AV1 codec becomes.
Sisvel, an organization based in Luxembourg, has formed a patent pool and sells a patent license for AV1. The pool was announced in early 2019, but the first patent list was not published until March 10, 2020. The list consists of more than 1,050 patents, whose claims are still being contested on their merits. Although Sisvel announced that it does not intend to collect content royalties, its license contains no exemption for software.
As of March 2020, the Alliance for Open Media had not replied to the patent claims list. The WebM Project reports that Google has no plans to modify its current or upcoming use of AV1 despite its knowledge of the patent pool. Third parties can still demand licensing fees from open-source, royalty-free, and free software.
The AVIF image file format uses AV1 compression to store images or image sequences in HEIF files. It competes with HEIC, which also uses ISOBMFF but relies on HEVC for compression.