
Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

Regarding aesthetics, I don't think AV1 synthesized grain takes into account the size of the grains in the source video, so chunky grain from an old film source, with its big silver halide crystals, will appear as fine grain in the synthesis, which looks wrong (this might be mitigated by a good film denoiser). It also doesn't model film's separate color components properly, but supposedly that doesn't matter because Netflix's video sources are often chroma subsampled to begin with: https://norkin.org/pdf/DCC_2018_AV1_film_grain.pdf

Disclaimer: I just read about this stuff casually so I could be wrong.



> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely)

That might seem like a reasonable assumption, but in practice it's not really the case. Due to nonlinear response curves, adding noise to a bright part of an image has far less visible effect than adding it to a darker part. If the image is completely blown out, the grain may not be discernible at all. So practically speaking, grain does travel with objects in a scene.

This means detail is indeed encoded in grain to an extent. If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.
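You can see this with a crude experiment (a sketch with OpenCV/numpy; the Gaussian blur stands in for a real denoiser, and the file names are placeholders):

    import cv2
    import numpy as np

    frame = cv2.imread("noisy_frame.png").astype(np.float32)

    # Crude stand-in for a real denoiser.
    denoised = cv2.GaussianBlur(frame, (5, 5), sigmaX=1.5)

    # The residual should be "just the grain", but ghost outlines of
    # the scene remain, because grain amplitude tracks the underlying
    # brightness.
    residual = frame - denoised
    cv2.imwrite("residual.png", np.clip(residual + 128.0, 0, 255))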


It sounds like the "scaling function" mentioned in the article may be intended to account for the nonlinear interaction of the noise.


> If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.

The synthesized grain is dependent on the brightness. If you were to just replace the frames with the synthesized grain described in the OP post instead of adding it, you would see something very similar.
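Something like this toy version (a numpy sketch; the actual AV1 scaling function is a piecewise-linear lookup signaled in the bitstream, not this placeholder curve):

    import numpy as np

    rng = np.random.default_rng(0)

    def add_scaled_grain(luma, strength=8.0):
        # Toy luma-dependent grain: weaker in highlights, stronger in
        # shadows/midtones, loosely mimicking the response curve.
        noise = rng.standard_normal(luma.shape)
        scale = strength * (1.0 - luma / 255.0)  # placeholder curve
        return np.clip(luma + scale * noise, 0, 255)

    bright = add_scaled_grain(np.full((4, 4), 230.0))  # grain barely visible
    dark = add_scaled_grain(np.full((4, 4), 40.0))     # grain pronounced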


> So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

The problem is that the initial noise-removal and compression passes still removed detail (that is more visible in motion than in stills) that you aren't adding back.

If you do noise-removal well you don't have to lose detail over time.

But it's much harder to do streaming-level video compression on a noisy source without losing that detail.

The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.


> The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.

Instead of wasting bits trying to compress noise, they can remove the noise first, compress, then add noise back. Now no bits are wasted on compressing noise, and those bits can be spent on detail instead. So if you compare FGS compression vs non-FGS compression at the same bitrate, the FGS compression did add some detail back.
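Schematically (a self-contained numpy toy; a real FGS encoder estimates a full grain model and signals its parameters in the bitstream, not a single sigma):

    import numpy as np

    rng = np.random.default_rng(0)

    def denoise(frame):
        # Stand-in denoiser: a 3x3 box blur.
        p = np.pad(frame, 1, mode="edge")
        return sum(p[i:i + frame.shape[0], j:j + frame.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    frame = rng.normal(128, 5, size=(64, 64))   # fake noisy source
    clean = denoise(frame)                      # 1. remove the noise
    sigma = (frame - clean).std()               # 2. model it (one parameter here)
    # 3. encode `clean` cheaply (no noise wasting bits), then
    # 4. the decoder adds statistically similar grain back:
    decoded = clean + sigma * rng.standard_normal(clean.shape)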


I imagined that at some point someone would come up with the idea “let’s remove more noise to compress things better and then add it back on the client”. Turns out, it is Netflix (I mean, who else wins so much from saving bandwidth).

Personally I rejected the idea after thinking about it for a couple of minutes, and I’m not yet sure I was wrong.

The challenge with noise is that it cannot be perfectly, automatically distinguished and removed from what could be finer detail and texture, even in a still photo, not to mention high-resolution footage. If removing noise were as simple as that, digital photography would be completely different. And once you have removed noise, you can't just add the missing detail back later; if you could, you would not have removed it in the first place (alas, no algorithm is good enough, and even the human eye can be faulty).


I'm not saying that the final result is as good as the original.

I'm saying that the final result is better than standard compression at the same bitrate.


That might be true; however, if this takes hold, I would be surprised if they chose to keep producing and shipping the high-fidelity footage with its tasty grain.

Considering that NR is generally among the very first steps in the development pipeline (as that's where it is most effective), with the rest of the dynamic-range wrangling and colour grading coming on top of it, they might consider it a "waste" to 1) process everything twice (once with this new extreme NR, once with minimal NR that leaves the original grain), 2) keep both copies around, and especially (the costliest step) 3) ship that delicious analog noise over the Internet to people who want quality.

I mean, how far do we go? It’ll take even less bandwidth to just ship prompts to a client that generates the entire thing on the fly. Imagine the compression ratios…


That argument could be made to reject any form of lossy compression.

Lossy compression enables many use cases that would otherwise be impossible. Is it annoying that streaming companies drive the bitrate overly low? Yes. However, we shouldn't blame the existence of lossy compression algorithms for that. Without lossy compression, streaming wouldn't be feasible in the first place.


> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

Sorry if I wasn't clear -- I was referring to the underlying objects moving. The codec is trying to capture those details, the same way our eye does.

But regardless of that, you absolutely cannot compare stills. Stills do not allow you to compare against the detail that is only visible over a number of frames.


People often assume noise is normal and IID, but it usually isn't. That's a fine approximation, but it isn't the same thing, which is what the parent is discussing.

Here's an example that might help you intuit why this is true.

Let's suppose you have a digital camera and walk towards a radiation source and then away. Each radioactive particle that hits the CCD causes it to oversaturate, creating visible noise in the image. The noise it introduces is random (Poisson), but your movement isn't.

Now think about how noise is introduced. There are a lot of ways, actually, but I'm sure this thought exercise will reveal how some of them cause noise across frames to be dependent. Maybe as a first thought, think about film sitting on a shelf, degrading.
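Here's the camera thought experiment in a few lines of numpy (a toy, with made-up numbers):

    import numpy as np

    rng = np.random.default_rng(0)

    # Walking toward the source and back: position is deterministic,
    # so the Poisson *rate* is correlated across frames even though
    # each individual hit is random.
    distance = np.abs(np.linspace(-1.0, 1.0, 100))   # far -> near -> far
    rate = 5.0 / (0.1 + distance) ** 2               # inverse-square falloff
    hits_per_frame = rng.poisson(rate)               # random, but the trend follows you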


I think this is geared towards film grain noise, which is independent of movement?


It's the same thing. Yes, it's not related to the movement of the camera, but I thought that would make it easier to build your intuition about silver particles being deposited onto film. You make film in batches, right?

The point is that just because things are random doesn't mean there aren't biases.

To be much more accurate, it helps to understand what randomness actually is: a measurement of uncertainty, of the unknown. This is true even for quantum processes that are truly random; that means we can't know. But just because we can't know doesn't mean it's completely unknown, right? We have different types of distributions, and different parameters within those distributions. That's what we're trying to build intuition about.


I think you've missed the point here: the noise in the originals acts as dithering, and increases the resolution of the original video. This is similar to the noise introduced intentionally in astronomy[1] and in signal processing[2].

Smoothing the noise out doesn't make use of that additional resolution, unless the smoothing happens over the time axis as well.

Perfectly replicating the noise doesn't help in this situation.

[1]: https://telescope.live/blog/improve-image-quality-dithering

[2]: https://electronics.stackexchange.com/questions/69748/using-...


Your first link doesn't seem to be about introducing noise, but removing it by averaging the value of multiple captures. The second is to mask quantizer-correlated noise in audio, which I'd compare to spatial masking of banding artifacts in video.

Noise is reduced to make the frame more compressible. This reduces the resolution of the original only because it inevitably removes some of the signal that can't be differentiated from noise. But even after noise reduction, successive frames of a still scene retain some frame-to-frame variance, unless the noise removal is too aggressive. When you play back that sequence of noise-reduced frames you still get a temporal dithering effect.


Here's[1] a more concrete source, which summarizes dithering in analog-to-digital converters as follows:

With no dither, each analog input voltage is assigned one and only one code. Thus, there is no difference in the output for voltages located on the same "step" of the ADC's "staircase" transfer curve. With dither, each analog input voltage is assigned a probability distribution for being in one of several digital codes. Now, different voltages within the same "step" of the original ADC transfer function are assigned different probability distributions. Thus, one can see how the resolution of an ADC can be improved to below an LSB.
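You can reproduce that effect in a few lines of numpy (a toy 1-LSB quantizer applied to a ramp spanning a single step):

    import numpy as np

    rng = np.random.default_rng(0)
    volts = np.linspace(100.0, 101.0, 1000)   # a ramp spanning one LSB

    # No dither: every voltage on the same "step" gets the same code.
    undithered = np.round(volts)

    # With dither, each voltage lands in nearby codes probabilistically;
    # averaging repeated conversions recovers sub-LSB resolution.
    codes = np.round(volts + rng.normal(0, 0.5, (256, volts.size)))
    dithered = codes.mean(axis=0)

    print(np.abs(undithered - volts).mean())  # ~0.25 LSB
    print(np.abs(dithered - volts).mean())    # far below one LSB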

In actual film, I presume the random inconsistencies of the individual silver halide grains are the noise source, and when watching such a film, I presume the eyes are doing the averaging through persistence of vision[2].

In either case, a key point is that you can't bring back any details by adding noise after the fact.

[1]: https://www.ti.com/lit/an/snoa232/snoa232.pdf section 3.0 - Dither

[2]: https://en.wikipedia.org/wiki/Persistence_of_vision


One thing worth noting is that this extra detail from dithering can be recovered when denoising by storing the image at higher precision. This is a lot of the reason 10-bit AV1 is so popular. It turns out that by adding extra bits of precision, you end up with an image that is easier to compress accurately, since the encoder suffers less quantization error.
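Toy numpy illustration (the same sub-LSB-ramp idea as the ADC example upthread):

    import numpy as np

    rng = np.random.default_rng(0)
    signal = np.linspace(100.0, 101.0, 1000)   # detail finer than one 8-bit step

    # Grainy 8-bit source: dithered by noise, then quantized, many frames.
    frames = np.round(signal + rng.normal(0, 0.5, (128, signal.size)))

    denoised = frames.mean(axis=0)             # temporal denoise
    stored_10bit = np.round(denoised * 4) / 4  # keep quarter-step precision
    stored_8bit = np.round(denoised)           # re-quantizing discards the ramp

    print(np.abs(stored_10bit - signal).mean())  # small
    print(np.abs(stored_8bit - signal).mean())   # back to ~0.25 steps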


The AR coefficients described in the paper are what allow basic modeling of the scale of the noise.

> In this case, L = 0 corresponds to the case of modeling Gaussian noise whereas higher values of L may correspond to film grain with larger size of grains.
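A minimal sketch of how the AR filtering controls grain size (illustrative numpy; the coefficient layout and arithmetic don't match the spec's bit-exact procedure):

    import numpy as np

    rng = np.random.default_rng(0)

    def synthesize_grain(height, width, coeffs, lag):
        # Causal 2D AR synthesis: each sample is Gaussian noise plus a
        # weighted sum of already-generated neighbors within `lag`.
        g = np.zeros((height + lag, width + 2 * lag))
        noise = rng.standard_normal(g.shape)
        for y in range(lag, g.shape[0]):
            for x in range(lag, g.shape[1] - lag):
                acc, k = noise[y, x], 0
                for dy in range(-lag, 1):
                    for dx in range(-lag, lag + 1):
                        if dy == 0 and dx >= 0:
                            break
                        acc += coeffs[k] * g[y + dy, x + dx]
                        k += 1
                g[y, x] = acc
        return g[lag:, lag:width + lag]

    # lag (L) = 0 gives plain Gaussian noise; larger lags correlate
    # neighboring samples, producing bigger "grains".
    lag = 2
    n_coeffs = 2 * lag * (lag + 1)  # matches AV1's luma coefficient count
    grain = synthesize_grain(64, 64, np.full(n_coeffs, 0.06), lag)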



