urhengulas.github.io

Google Summer of Code 2020 - Second part

Initially I planned to write one blog post for each month. But because of LIFE™ I didn’t manage to finish the second one, or even start the third. 🤷
So this post covers the second part of my GSoC, which is months two and three.

In my previous post about my first month of the Google Summer of Code 2020 at Mozilla I told you about my first steps toward enabling rav1e to encode in the browser using WebAssembly.

This time we are actually encoding a video! 😲

Encoding process

(#2446, aws/dcv-color-primitives#35)

The encoding process consists of three steps:

  1. Collecting the pixel data from an HtmlImageElement or HtmlVideoElement
  2. Constructing one or more rav1e-compatible Frames from the data
  3. Encoding the data and emitting one or more Packets

I am going to describe the process for an HtmlImageElement. For an HtmlVideoElement the same thing simply happens multiple times per second.

Collect

The rough idea of how to access the pixel data is based on Alexander Fallenstedt’s blog post “Using Rust and WebAssembly to Process Pixels from a Video Feed” (link).

We create an off-screen Canvas with Canvas::new(width: u32, height: u32) -> Self that has the same size as the HtmlImageElement and then draw the image onto it using Canvas::draw_image(&self, img: &HtmlImageElement). Next we access the RGBA data with Canvas::data_rgba(&self) -> Vec<u8>, which is “a one-dimensional array in the RGBA order, with integer values between 0 and 255 (inclusive)”. The method Canvas::data_argb(&self) -> Vec<u8> additionally converts the data to ARGB.

let img: HtmlImageElement = /* ... */;
let canvas = Canvas::new(img.width(), img.height());
canvas.draw_image(&img);
let rgba_data: Vec<u8> = canvas.data_rgba();

What’s happening in the background in JavaScript-land is pretty similar: we draw the image with CanvasRenderingContext2D.drawImage() and access the pixel data with CanvasRenderingContext2D.getImageData().data.
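As a rough illustration of the RGBA-to-ARGB step mentioned above, the reordering could look like the sketch below. The function name `rgba_to_argb` is hypothetical and this is not necessarily how Canvas::data_argb is implemented; it only shows the byte shuffle on an interleaved buffer.

```rust
/// Reorders interleaved RGBA bytes into ARGB order.
/// Returns `None` if the input length is not a multiple of four.
/// (Hypothetical helper, not part of the jsapi.)
fn rgba_to_argb(rgba: &[u8]) -> Option<Vec<u8>> {
    if rgba.len() % 4 != 0 {
        return None;
    }
    let mut argb = Vec::with_capacity(rgba.len());
    for px in rgba.chunks_exact(4) {
        // px = [R, G, B, A] becomes [A, R, G, B]
        argb.extend_from_slice(&[px[3], px[0], px[1], px[2]]);
    }
    Some(argb)
}

fn main() {
    let rgba = [10, 20, 30, 255, 1, 2, 3, 4];
    let argb = rgba_to_argb(&rgba).expect("length is a multiple of four");
    assert_eq!(argb, vec![255, 10, 20, 30, 4, 1, 2, 3]);
}
```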

Construct

For the data transformation we need to convert the data to YCbCr, a family of color spaces that is more suitable for storage and transmission, since it has less redundancy than RGBA.
The VideoLAN wiki gives an overview of the different YCbCr formats.
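To make the conversion a bit more concrete, here is a minimal per-pixel sketch using the BT.709 luma coefficients. It assumes full-range 8-bit values and floating-point math; the function name is hypothetical, and a library like dcp may use a different value range and optimized integer arithmetic.

```rust
/// Converts one full-range 8-bit RGB pixel to full-range 8-bit YCbCr
/// with the BT.709 luma coefficients (illustrative sketch only).
fn rgb_to_ycbcr_bt709(r: u8, g: u8, b: u8) -> (u8, u8, u8) {
    const KR: f32 = 0.2126;
    const KB: f32 = 0.0722;
    const KG: f32 = 1.0 - KR - KB;

    let (r, g, b) = (r as f32, g as f32, b as f32);
    // Luma is a weighted sum of the three channels.
    let y = KR * r + KG * g + KB * b;
    // Chroma is the scaled difference to luma, centered on 128.
    let cb = (b - y) / (2.0 * (1.0 - KB)) + 128.0;
    let cr = (r - y) / (2.0 * (1.0 - KR)) + 128.0;

    let to_u8 = |v: f32| v.round().clamp(0.0, 255.0) as u8;
    (to_u8(y), to_u8(cb), to_u8(cr))
}

fn main() {
    // Gray tones carry no color information, so both chroma values sit at 128.
    assert_eq!(rgb_to_ycbcr_bt709(255, 255, 255), (255, 128, 128));
    assert_eq!(rgb_to_ycbcr_bt709(0, 0, 0), (0, 128, 128));
}
```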

I was on the lookout for a convenient library to achieve this and my eye was caught by dcv-color-primitives (dcp for short, crates.io) because it:

  1. aims to be “[a]ware of the underlying hardware and supplemental cpu extension sets”
  2. has 0 dependencies 😱
  3. has good documentation

The only problem was that it didn’t support wasm32 targets. I opened aws/dcv-color-primitives#35 to ask about this and quickly got an answer from fabiosky, who provided a small workaround. A couple of weeks later WebAssembly support was introduced by aws/dcv-color-primitives#36 and shipped in v0.1.14.

Using dcp we can convert the data with Canvas::data_i444(&self) -> [Vec<u8>; 3]. The resulting data have the pixel format I444 and color space Bt709.

Here you can see what is happening in the background:

// ImageFormats
let src_format = ImageFormat { pixel_format: PixelFormat::Argb, color_space: ColorSpace::Lrgb, num_planes: 1 };
let dst_format = ImageFormat { pixel_format: PixelFormat::I444, color_space: ColorSpace::Bt709, num_planes: 3 };

// Buffer
let data: Vec<u8> = canvas.data_argb();
let src_buffer = &[data.as_slice()];
let dst_buffer: &mut [&mut [u8]] = /* snip... */;

// Convert data and write it to `dst_buffer`
dcp::convert_image(img.width(), img.height(), &src_format, None, src_buffer, &dst_format, None, dst_buffer).unwrap();
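The destination buffer is snipped in the code above. As an assumption about what such a buffer could look like (not the original code), the sketch below allocates three full-resolution planes, which is what I444 needs, and borrows them as a slice of mutable slices. Both helper names are hypothetical.

```rust
/// Allocates three full-resolution planes (Y, Cb, Cr) for an I444 image.
/// (Hypothetical helper; assumes tightly packed rows without padding.)
fn alloc_i444_planes(width: usize, height: usize) -> [Vec<u8>; 3] {
    let len = width * height;
    [vec![0u8; len], vec![0u8; len], vec![0u8; len]]
}

/// Borrows the planes as mutable slices, one per plane.
fn as_plane_slices(planes: &mut [Vec<u8>; 3]) -> Vec<&mut [u8]> {
    planes.iter_mut().map(|p| p.as_mut_slice()).collect()
}

fn main() {
    let mut planes = alloc_i444_planes(4, 2);
    let mut slices = as_plane_slices(&mut planes);
    // This has the `&mut [&mut [u8]]` shape used for the destination buffer.
    let dst_buffer: &mut [&mut [u8]] = slices.as_mut_slice();
    dst_buffer[0][0] = 16;
    assert_eq!(dst_buffer.len(), 3);
    assert_eq!(planes[0][0], 16);
}
```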

Next we configure an Encoder fitting the pixel format and color space, emit a blank Frame and copy our data into it.

// Configure and create the encoder
let mut conf = EncoderConfig::default();
conf.chroma_sampling = ChromaSampling::Cs444;
conf.color_description = Some(ColorDescription { color_primaries: ColorPrimaries::BT709, transfer_characteristics: TransferCharacteristics::BT709, matrix_coefficients: MatrixCoefficients::BT709 });
let ctx: Context<u8> = Config::new().with_encoder_config(conf).new_context().unwrap();

// Copy data into frame
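// Plane dimensions are expressed as usize
let (width, height) = (img.width() as usize, img.height() as usize);
let (chroma_width, _) = conf.chroma_sampling.get_chroma_dimensions(width, height);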
let mut f = ctx.new_frame();
f.planes[0].copy_from_raw_u8(dst_buffer[0].as_slice(), width, 1);
f.planes[1].copy_from_raw_u8(dst_buffer[1].as_slice(), chroma_width, 1);
f.planes[2].copy_from_raw_u8(dst_buffer[2].as_slice(), chroma_width, 1);
// our final frame – filled and configured
let frame = Frame { f };

This whole process is abstracted away behind Canvas::create_frame(&self) -> Frame<u8>.
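The chroma width used when filling the frame depends on the chroma sampling. The sketch below uses a local stand-in enum (`Subsampling`, hypothetical, not rav1e's own type) to show the usual round-up arithmetic and why the chroma planes have the same size as the luma plane for 4:4:4.

```rust
/// Local stand-in for a chroma sampling mode (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Subsampling {
    Cs420,
    Cs422,
    Cs444,
}

impl Subsampling {
    /// Returns the (horizontal, vertical) subsampling shifts.
    fn shifts(self) -> (usize, usize) {
        match self {
            Subsampling::Cs420 => (1, 1),
            Subsampling::Cs422 => (1, 0),
            Subsampling::Cs444 => (0, 0),
        }
    }

    /// Chroma plane size for a given luma size, rounding up odd dimensions.
    fn chroma_dimensions(self, width: usize, height: usize) -> (usize, usize) {
        let (sx, sy) = self.shifts();
        ((width + sx) >> sx, (height + sy) >> sy)
    }
}

fn main() {
    // With 4:4:4 the chroma planes match the luma plane.
    assert_eq!(Subsampling::Cs444.chroma_dimensions(5, 3), (5, 3));
    // With 4:2:0 both dimensions are halved (rounded up).
    assert_eq!(Subsampling::Cs420.chroma_dimensions(5, 3), (3, 2));
}
```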

Encode

Finally, the Frame we just created can be passed to an Encoder, encoded, and emitted as a Packet.

let mut frame_encoder = FrameEncoder { ctx };
frame_encoder.send_frame(frame);
let packet = frame_encoder.receive_packet().unwrap();

Since encoding HtmlImageElements and HtmlVideoElements happens under different conditions, we have two separate structs for it: FrameEncoder and VideoEncoder. Both implement the trait Encoder, which I am actually quite proud of. Here is a simplified version:

/// Implements basic encoder functionality.
pub trait Encoder {
  /// Non-mutable access to encoder context
  fn ctx<'a>(&'a self) -> Box<dyn Deref<Target = Context<u8>> + 'a>;

  /// Mutable access to encoder context
  fn ctx_mut<'a>(&'a mut self) -> Box<dyn DerefMut<Target = Context<u8>> + 'a>;

  // Various blanket implementations for creating and sending frames, flushing and encoding
}

What’s cool about this is that any data type which gives access to an encoder context can easily pick up all the basic encoder functionality.
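To illustrate the pattern in isolation, here is a self-contained sketch with a stand-in `Context` that only queues frame ids. The default methods `send_frame` and `pending` and the type `OwnedEncoder` are hypothetical; they show how provided methods can be written purely in terms of `ctx` and `ctx_mut`.

```rust
use std::ops::{Deref, DerefMut};

/// Stand-in for the real encoder context: it only queues frame ids.
#[derive(Default)]
struct Context {
    queue: Vec<u32>,
}

trait Encoder {
    fn ctx<'a>(&'a self) -> Box<dyn Deref<Target = Context> + 'a>;
    fn ctx_mut<'a>(&'a mut self) -> Box<dyn DerefMut<Target = Context> + 'a>;

    /// Provided method: available to every implementor for free.
    fn send_frame(&mut self, id: u32) {
        self.ctx_mut().queue.push(id);
    }

    /// Provided method: number of frames waiting in the context.
    fn pending(&self) -> usize {
        let n = self.ctx().queue.len();
        n
    }
}

/// An implementor that owns its context directly.
#[derive(Default)]
struct OwnedEncoder {
    ctx: Context,
}

impl Encoder for OwnedEncoder {
    fn ctx<'a>(&'a self) -> Box<dyn Deref<Target = Context> + 'a> {
        Box::new(&self.ctx)
    }

    fn ctx_mut<'a>(&'a mut self) -> Box<dyn DerefMut<Target = Context> + 'a> {
        Box::new(&mut self.ctx)
    }
}

fn main() {
    let mut encoder = OwnedEncoder::default();
    encoder.send_frame(1);
    encoder.send_frame(2);
    assert_eq!(encoder.pending(), 2);
}
```

The boxed trait object is what lets one signature cover both a plain reference and a runtime-checked guard type.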

The implementation for the FrameEncoder is simple:

pub struct FrameEncoder {
  ctx: Context<u8>,
}

impl Encoder for FrameEncoder {
  fn ctx<'a>(&'a self) -> Box<dyn Deref<Target = Context<u8>> + 'a> {
    Box::new(&self.ctx)
  }

  fn ctx_mut<'a>(&'a mut self) -> Box<dyn DerefMut<Target = Context<u8>> + 'a> {
    Box::new(&mut self.ctx)
  }
}

The implementation for the VideoEncoder is more interesting:

pub struct VideoEncoder {
  ctx: Rc<RefCell<Context<u8>>>,
  canvas: Rc<RefCell<Canvas>>,
}

impl Encoder for VideoEncoder {
  fn ctx<'a>(&'a self) -> Box<dyn Deref<Target = Context<u8>> + 'a> {
    Box::new(self.ctx.borrow())
  }

  fn ctx_mut<'a>(&'a mut self) -> Box<dyn DerefMut<Target = Context<u8>> + 'a> {
    Box::new(self.ctx.borrow_mut())
  }
}

Since the video encoder encodes multiple frames, it holds its “workhorse canvas” in addition to the encoder context. Both of these are wrapped in an Rc<RefCell<...>>, which is needed because Rust can’t enforce the borrowing rules in JavaScript-land at compile time. The wrapper checks the borrowing rules at runtime instead.
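The runtime check can be seen in a tiny standalone sketch using only the standard library. The function names are hypothetical; the point is that a second mutable borrow is rejected while the first one is still alive, and succeeds once it has been released.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns true if a second mutable borrow is rejected while the first is alive.
fn second_mut_borrow_is_rejected() -> bool {
    let shared = Rc::new(RefCell::new(0u32));
    let alias = Rc::clone(&shared);

    let first = shared.borrow_mut();
    // `try_borrow_mut` reports the conflict instead of panicking.
    let rejected = alias.try_borrow_mut().is_err();
    drop(first);
    rejected
}

/// Returns true if borrowing works again after the first borrow ended.
fn borrow_after_release_succeeds() -> bool {
    let shared = Rc::new(RefCell::new(0u32));
    {
        let mut first = shared.borrow_mut();
        *first += 1;
    } // `first` is dropped here, releasing the borrow
    let ok = shared.try_borrow_mut().is_ok();
    ok
}

fn main() {
    assert!(second_mut_borrow_is_rejected());
    assert!(borrow_after_release_succeeds());
}
```

Calling `borrow_mut()` instead of `try_borrow_mut()` in the conflicting case would panic, which is the runtime equivalent of a compile-time borrow error.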

Other work

Below is a list of other work I did.

wasm-opt issue

(rustwasm/wasm-pack#886, #2479)

Quite randomly wasm-pack stopped working, puking an immense error message at me that said something about invalid input for wasm-opt. Obviously I tried to figure out what I did wrong this time, but it failed even on master (jsapi v0.1 at this time), which was definitely working before and also in the CI. So I thought there was a slight chance it was actually not my fault and opened rustwasm/wasm-pack#886. Quickly some more people with the same problem turned up, and another issue dealing with it, WebAssembly/binaryen#3006, popped up.

After some back and forth it turned out that the generated .wasm uses a non-MVP feature (mutable-global in particular).
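For reference, wasm-pack lets a crate configure or disable wasm-opt through Cargo.toml metadata. The fragment below is a commonly cited workaround for this class of error and is shown as an assumption; it is not necessarily the change that was made in #2479.

```toml
# Cargo.toml of the crate built with wasm-pack
[package.metadata.wasm-pack.profile.release]
# Allow the non-MVP feature when optimizing ...
wasm-opt = ["-O", "--enable-mutable-globals"]
# ... or skip the wasm-opt step entirely:
# wasm-opt = false
```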

prevent passing frames

(#2460)

I opened #2460 to figure out the problem with feeding the pixel data into the frame. Although the issue turned out to be a misconfiguration on my side, it also revealed that rav1e::api::Context<T> doesn’t validate the dimensions of the input frame (see #2461).
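The kind of check that was missing can be sketched in a few lines. This is purely illustrative: `DimensionMismatch` and `check_dimensions` are hypothetical names and do not reflect rav1e's API or how #2461 was resolved.

```rust
/// Error describing a frame whose size differs from the configured size.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
struct DimensionMismatch {
    expected: (usize, usize),
    actual: (usize, usize),
}

/// Rejects a frame whose (width, height) differs from the encoder config.
fn check_dimensions(
    expected: (usize, usize),
    actual: (usize, usize),
) -> Result<(), DimensionMismatch> {
    if expected == actual {
        Ok(())
    } else {
        Err(DimensionMismatch { expected, actual })
    }
}

fn main() {
    assert!(check_dimensions((640, 480), (640, 480)).is_ok());
    // A mismatching frame is rejected up front instead of being encoded.
    assert!(check_dimensions((640, 480), (320, 240)).is_err());
}
```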

jsapi v0.2.1

On the branch jsapi.v0.2.1@urhengulas/rav1e I started work focused on improving the encoding speed of the VideoEncoder.
The main gain comes from constructing the Frame not during, but after the data collection.
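The idea of separating the two phases can be sketched with stand-in types. Everything here is hypothetical (`RawCapture`, `SketchFrame`, and the averaging used as a placeholder for the real color conversion); it only shows that the collection phase stores bytes and the construction phase does the expensive work afterwards.

```rust
/// Raw pixels captured from one video frame (hypothetical type).
struct RawCapture {
    rgba: Vec<u8>,
}

/// Stand-in for an encoder frame (hypothetical; not rav1e's `Frame`).
struct SketchFrame {
    luma: Vec<u8>,
}

/// Phase 1: only store the bytes, no conversion happens here.
fn collect(captures: &mut Vec<RawCapture>, rgba: &[u8]) {
    captures.push(RawCapture { rgba: rgba.to_vec() });
}

/// Phase 2: run the expensive conversion once capturing has finished.
/// The channel average is a placeholder, not a real color conversion.
fn construct(captures: &[RawCapture]) -> Vec<SketchFrame> {
    captures
        .iter()
        .map(|c| SketchFrame {
            luma: c
                .rgba
                .chunks_exact(4)
                .map(|px| ((px[0] as u16 + px[1] as u16 + px[2] as u16) / 3) as u8)
                .collect(),
        })
        .collect()
}

fn main() {
    let mut captures = Vec::new();
    collect(&mut captures, &[30, 60, 90, 255]);
    collect(&mut captures, &[0, 0, 0, 255]);

    let frames = construct(&captures);
    assert_eq!(frames.len(), 2);
    assert_eq!(frames[0].luma, vec![60]);
}
```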

The speed could be further improved using WebAssembly threads, which were implemented in Firefox 79 and are already available in Chrome.

Misc

Epilog

Since my direct mentor and other rav1e contributors are affected by the Mozilla layoffs, I want to express my condolences. I am sure that experts of their skill level will quickly find something new and maybe even more exciting.

I want to thank all the people who supported me and worked with me during the Google Summer of Code. It was an amazing experience: I greatly improved my Rust knowledge, learned about the domain of video processing, and gained confidence moving around in the realms of open source.