We're all thinking it, but holy cow is FFmpeg one of the craziest pieces of software I have on my computer. The number of neat things I've been able to do with it, tossing them into a bash loop and going to bed, is incredible. I'm sure a lot of us get a kick out of automating our tasks, and FFmpeg is king when it comes to that.
Curl's "one thing" is to get files from the network in every conceivable way. FFmpeg's is to convert media files between every conceivable set of formats.
If you use `-vn` and `-acodec copy` (I use both, although I'm not sure `-vn` is strictly necessary), you can demux the audio from the video in the same format it is already in. Of course, you're extracting to wav so not transcoding anyway, but copying may be faster and use less space.
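A minimal sketch of that stream copy (filenames are placeholders; the output extension has to match the codec already in the file, which you can check with ffprobe first):

# Stream-copy the audio track out of the container; no transcoding happens.
ffmpeg -i input.mp4 -vn -acodec copy output.aac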
What it does: allows streaming content to a Chromecast through VLC, with my idea of proper subtitles.
I want subtitles in the Tiresias screen font that Finnish YLE used in the 90s. (Always aligned to the left on the second row, so that the starting point is always the same. Center alignment is bad because your eyes have to re-adjust to wherever the subtitles start; left alignment keeps the first character in the same place.)
* `-hwaccel cuvid -c:v h264_cuvid`: enables hardware-accelerated decoding (H.264 only)
* `-vf`: video filter
* `hwdownload,format=nv12`: downloads the hardware-accelerated frame to system memory for the video filter (required by cuvid)
* `scale=(iw*sar)*max(1280/(iw*sar)\,720/ih):ih*max(1280/(iw*sar)\,720/ih),crop=1280:720`: scales and crops the video to 1280x720 (extremely high impact on performance!). Use cuvid's built-in crop and resize instead for better performance. The flags are assembled into a full command below.
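Assembled into one command, it would look roughly like this (input/output names are placeholders; I'm only restating the flags from the list above):

ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input.mp4 \
    -vf "hwdownload,format=nv12,scale=(iw*sar)*max(1280/(iw*sar)\,720/ih):ih*max(1280/(iw*sar)\,720/ih),crop=1280:720" \
    output.mp4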
According to Yle they only started using Tiresias in 2012. The article doesn't mention which font they were using before. I would be interested in their 90s font as well.
Interesting, I haven't really watched Finnish TV in the past ten years. However, it can't have been too different from Tiresias; it was tall as well, with a pretty big outline. You can see the 90s font here:
Edit: Comparing the 90s font to Tiresias, I too would like to have that exact font they used in the 90s. E.g. the capital "J" is terrible in Tiresias compared to that.
Extract 1 second of video every 90 seconds (if you have very long footage of a trip from a dashcam and you don't know what to do with it, that makes for a much shorter "souvenir"):
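The command itself wasn't preserved here, but one way to do it is a short loop that seeks to each 90-second mark and stream-copies one second, then concatenates the clips (the filenames and the 3600 s total duration are placeholders; the reply below is about this kind of -ss seeking):

for t in $(seq 0 90 3600); do
  ffmpeg -ss "$t" -t 1 -i dashcam.mp4 -c copy "$(printf 'clip_%05d.ts' "$t")"
done
printf "file '%s'\n" clip_*.ts > clips.txt
ffmpeg -f concat -safe 0 -i clips.txt -c copy souvenir.mp4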
It will only approximately seek to a timestamp, using GOP I-frames, if you put -ss before -i; in MPEG-2 a GOP is generally ~15 frames = 0.5 s. But a GOP can be pretty long, like 30 s, in MPEG-4.
Downmixes movie audio from surround sound to stereo, balanced for night watching (prevents the audio from being too quiet), so that they can direct-stream on my devices.
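The parent didn't share the command, but a common recipe for this kind of night-mode downmix uses the pan filter to keep the center (dialog) channel loud relative to the rest, optionally followed by dynaudnorm to even out the dynamics (the coefficients are illustrative, not necessarily the parent's):

ffmpeg -i movie.mkv -c:v copy \
    -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR,dynaudnorm" \
    -c:a aac movie-stereo.mkv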
Goddammit I can't believe how downmixing x.y to 2.0 isn't a solved problem by now and is so broken through and through.
I mean, it's never done in a way that doesn't produce quiet dialog that makes you raise the volume and loud sounds that make your eardrums bleed. Even basic audio normalisation would produce half-decent results, but no, we get crazy contrast by default and constant volume switching.
The interesting part is that most media is consumed through a stereo downmix: everyone streaming content on their laptops, through their television's speakers, or with a sound bar. Yet all the audio is recorded and mastered for surround sound systems, even though they could include a mastered stereo track without the downmixing issues like quiet dialog.
Here's one that generates App Store previews with correct sizes and metadata.
(iTunes Connect can be really picky about this sometimes.)
# 1. Record your device using QuickTime
# (File->New Movie Recording->Select your phone)
# 2. Run `$ app-preview your-recording.mov`
function app-preview() {
  echo "name: $1"
  ffmpeg -i "$1" -vf "scale=1080:1920,setsar=1" -c:a copy "out_$1"
}
Not an ffmpeg wizard here, but here are my screencasting commands. Probably the most useful knowledge in this snippet is how to use ALSA from ffmpeg: ALSA devices can be referred to as hw:0, hw:1, etc., and you find out which device to use from arecord.
# Example output file.
f=/tmp/output.mp4
# Example video resolution.
g=1920x1080
# Example capture framerate.
fr=4
# Example X11 display.
d=$DISPLAY
# Simple screencast.
#
# Try adjusting the libx264 CRF from 15 to some greater number, as long
# as there is no visible effect on video quality.
#
# If increasing the capture framerate, you may also wish to use a
# faster preset.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -i "$d" \
-c:v libx264 -crf 15 -preset veryslow "$f"
# Simple screencast without drawing the pointer/cursor.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -draw_mouse 0 -i "$d" \
-c:v libx264 -crf 15 -preset veryslow "$f"
# See what devices are available for capturing sound.
arecord -l
# Select a device.
audioCaptureDevice=hw:0
# List some permitted parameters associated with device 0.
#
# We are interested in the "FORMAT", "CHANNELS", and "RATE" parameters
# for use in the ffmpeg command.
arecord --dump-hw-params -D "$audioCaptureDevice"
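# Map the FORMAT arecord reports (e.g. S32_LE) to the matching
# ffmpeg PCM decoder name (e.g. pcm_s32le) below.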
audioSampleFormat=pcm_s32le
audioNumChannels=2
audioRate=44100
f=/tmp/output.mp3
# Record sound.
ffmpeg -thread_queue_size 8192 -f alsa -channels "$audioNumChannels" -sample_rate "$audioRate" \
-c:a "$audioSampleFormat" -ar "$audioRate" -i "$audioCaptureDevice" "$f"
f=/tmp/output.mkv
# Screencast with sound.
#
# Note that there seems to be an FFmpeg bug where the audio in the last
# 15 seconds of the video is cut off. The workaround is to record for
# 15 extra seconds, and then cut the extra video.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -i "$d" \
-thread_queue_size 8192 -f alsa -channels "$audioNumChannels" -sample_rate "$audioRate" \
-c:a "$audioSampleFormat" -ar "$audioRate" -i "$audioCaptureDevice" \
-c:a flac -c:v libx264 -crf 17 "$f"
I wrote a command-line based video editing tool as a 300-line bash script. It reads a list of segments of video (source file, start position, length) to string together (including options such as image overlay, fade, fast forward, slow motion, static image) from a text file, and converts it into a Makefile with ffmpeg commands in it, which you can then run with whatever level of parallelism you wish. It treats the video and sound separately, and creates a video and sound file for each segment, using concatenatable formats for both. Then the final few make targets are to concatenate the video into one file, the sound into another file, and then multiplex them into a single file. Used it a few times for editing my own videos. It's a bit big to share here though.
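The script itself isn't shared, but the core trick it describes can be sketched in a few lines: cut each segment into a concatenatable container, then join the pieces with the concat demuxer (filenames and times are made up):

ffmpeg -ss 60 -t 10 -i source.mp4 -c copy seg1.ts
ffmpeg -ss 300 -t 20 -i source.mp4 -c copy seg2.ts
printf "file '%s'\n" seg1.ts seg2.ts > segments.txt
ffmpeg -f concat -safe 0 -i segments.txt -c copy final.mp4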
On macOS, this uses hardware acceleration to re-encode a video at a lower bitrate. My MacBook is from 2012, so this does make a notable difference. There's also "hevc_videotoolbox" for H.265 if your machine supports it.
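The command isn't shown above, but a minimal VideoToolbox invocation looks something like this (the bitrate is purely illustrative):

ffmpeg -i input.mov -c:v h264_videotoolbox -b:v 2000k -c:a copy output.mp4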
It will also significantly tank the quality (HW encoders are horrible at quality-per-bit ratio) - libx264 at this bitrate will make a huge difference in how good the video will look.
-max_muxing_queue_size 2048 (magically fixes some errors and microscopically increases quality, a no-brainer on machines with more than token amounts of RAM)
Back in 2004, someone released an unauthorized fan dub/narration of the first Harry Potter movie called Wizard People, Dear Reader that hilariously butchers the entire plot, character names, and motivations. This narration is (loosely) synced to the movie and its scenes, and for a long time I watched it via clips uploaded to YouTube. But when Sorcerer’s Stone got a 4K release, I decided it was time to rip it and create my own canonical copy.
The original audio files of the dub were at this point still available on archive.org. The problem is that the second audio file is not meant to play directly after the first - this was back in the days of CD players, so halfway through the movie the dub instructs you to begin playing the second CD once the next scene starts. The other problem is that the second file is louder than the first.
Most sources I saw online said to insert a gap of three seconds to account for the delay, and didn’t have a solution for the difference in volume. I wanted to be more precise.
First, I found the exact start time of the scene where the second audio track begins:
ffprobe hp.mkv
...
Chapter #0:18: start 4428.882000, end 4817.521000
Metadata:
title : Chapter 19
...
Then I compared this with the duration of the first audio track:
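(The exact command wasn't preserved here, but ffprobe can report the duration like this, with a placeholder filename:)

ffprobe -v error -show_entries format=duration -of csv=p=0 track1.mp3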
The difference between these time stamps gave the actual delay of 3.582 seconds.
I then compared the maximum audio levels of the two audio tracks to determine the level to increase the first track’s volume by (there are more advanced features in FFmpeg for volume normalization, but I just wanted to remove the potential for eardrum damage when beginning Chapter 19 and keep things as similar as possible otherwise):
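(Again, the original command is missing; the volumedetect filter is the usual way to read the max_volume of each track, the filenames being placeholders:)

ffmpeg -i track1.mp3 -af volumedetect -f null -
ffmpeg -i track2.mp3 -af volumedetect -f null -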
This gave me the volume increase for the first track of 7.5 dB.
Once I had these numbers, it was time for the one-liner to adjust the first track’s volume, concatenate the two tracks with the gap of silence, and mux them with the video from the movie:
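(The one-liner itself wasn't preserved either, but with the numbers above it would be something along these lines: volume applies the 7.5 dB boost, apad inserts the 3.582 s gap, and concat joins the two tracks before muxing. Filenames are placeholders.)

ffmpeg -i hp.mkv -i track1.mp3 -i track2.mp3 -filter_complex \
    "[1:a]volume=7.5dB,apad=pad_dur=3.582[a1];[a1][2:a]concat=n=2:v=0:a=1[a]" \
    -map 0:v -map "[a]" -c:v copy -c:a aac wizard_people.mkv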
But what are the odds that obscure formats like this will be implemented (and maintained) without security holes? I vaguely remember an exploit where the GNOME file viewer would automatically preview samples of any audio file that (I think) GStreamer could recognise, and one of those formats was something like a NES audio format whose implementation had some buffer overflows in it. Are there similar concerns with ffmpeg?
> But what are the odds that obscure formats like this will be implemented (and maintained) without security holes?
Extremely high?
I wrote a stand-alone commercial DSP app which seems to still be sold ten years later. There might be security issues; there probably are, since it hasn't been recompiled in years as far as I know, but I'm certain they aren't in the DSP parts.
DSP code involves large blocks of numbers which you translate into other numbers. If you're writing a format or facility within ffmpeg, you aren't directly reading or writing files, or connecting to the Internet. Most likely, the whole API you write against is just blocks of numbers and some sort of data description.
ffmpeg as a whole might have security vulnerabilities in it, but adding new plug-ins or formats won't necessarily increase those chances.
The problem is that when you use ffmpeg, you generally delegate the task of figuring out "what the hell is this data" to it - you don't want to figure out yourself if something is MP3, MP3 in an AVI container, VP9 in MKV, ...
So when ffmpeg adds obscure formats that no one ever uses and are mostly toy implementations, the risk is always that someone at the top (say, your Electron app) feeds in user-supplied data, the 10 layers in between pass it along, until ffmpeg at the bottom goes "oh I know this one, it's a Digital Pictures SGA game"!
Now Chromium wisely disables most of this trap code in its copy of ffmpeg from even being compiled in the first place, but that's probably not the case for the ffmpeg copy on your operating system that you might use for some server-side processing task.
Running ffmpeg on untrusted input outside of a sandbox you trust to be secure is an extremely bad idea. It doesn't have a good track record for that, and making it safe to use ffmpeg on untrusted input has never been a priority for the project.
In practice things like video hosting services which use ffmpeg internally tend to disable support for most of the obscure file formats to reduce the potential attack surface.
You can't expect an absence of security holes. ffmpeg is an extremely complex piece of software written in unsafe C and assembler. It's full of security holes, that's for sure. If that's a concern for you, use additional security measures to mitigate the threat.
They are muxers, which let you put already-encoded audio/video into these file formats (though, in the likely case that they use standard codecs, ffmpeg can also handle the encoding for you).
While working with ffmpeg over the years, I always thought that FFmpeg should have a simple-to-use UI.
I have recently started working on a desktop (Electron) tool that wraps ffmpeg scripts in an easy-to-use interface.
Yes, please do it. Windows Admin Center has this: for any GUI action, you can click the PS button and see the PowerShell that is being run to execute the task. It's helpful in a number of ways: I learn more PowerShell, and I can build on it if they haven't built out a GUI way to do something yet but PowerShell supports it.
The use cases are so diverse that adding a UI for all use cases is near impossible. I also built a UI for ffmpeg, but mine was geared towards creating and applying video filters - much like Lightroom presets.
Sadly we are still incredibly lacking in encoding. It's been years since HEVC encoding was included in consumer graphics cards and embeddable devices. For one reason or another, vendors just haven't done AV1 at the hardware level.
I understand that you are probably trying to be funny, but just to be clear, Linux ISOs are not videos and therefore can't be compressed with a video codec.
Assuming that you actually meant video content, I think your question may be a bit misguided on the nuances and goals of video encoding. Video encoding can be both lossy and lossless. Lossless video encoding isn't particularly interesting in most cases, but I do believe that HEVC (H.265) will usually come out slightly smaller. However, anything to do with encoding will always vary based on the actual source content. So a partial answer to your question would probably be x265, but it depends. Based on the source, you could construct theoretical content better tuned to one codec's encoding strengths or the other's.
Where it gets interesting is in lossy encoding. With lossy encoding you seek to retain visual acuity to a certain standard while minimizing size and/or processing requirements. Both codecs do an excellent job at removing the right amount of information to effectively fool the human observer. With lossy encoding there isn't really a filesize difference, as you tune your filesize to whatever you want given your source and your desired output constraints. The big feature of AV1 is that it is open and unencumbered by patents+royalties, which will hopefully make it THE industry standard in the coming years. Its openness also makes it more likely to eventually be ubiquitous, as it should be implementable and playable on most new video platforms and hardware, and hopefully the mythical one format that just works everywhere.
For some animated videos, the difference was quite impressive to me.
AV1 100 MiB
H.265 500 MiB
H.264 1.5 GiB
At this point, the audio codecs and track count start making a difference, so this isn't really a fair comparison. And BTW, in terms of video quality in the files above, AV1 > H.265 > H.264.
In non-animated content, the difference is less impressive, but comes in at about 20% in favor of AV1 vs HEVC, in my subjective-and-not-rigorously-benchmarked opinion. But "video quality" is subjective anyway.
Since image quality is subjective it's hard to say exactly, but estimates I've seen are a 20-30% size reduction for the same quality in AV1 compared to HEVC.
Cannot offer any data on that, but I experimented with an SD TV series episode. HEVC and AV1 ended up at the same file size (a few kilobytes' difference), but encoding AV1 took over 10 times as long using rav1e.
Considering lossy encoding, you can't compare file sizes without also comparing quality, or you're just comparing default settings. I can make a tiny MPEG-1 file (which will just be pixel soup).
I'm a voracious audiobook consumer and get my books in all forms and shapes. At some point, I got tired of organizing them and decided to put them in my Calibre library, re-encoding each book first into a single file, preferably with chapter information. To my surprise, existing tools were quite cumbersome, so I put together a small Node.js CLI[1] that uses ffmpeg for encoding. It was hacked together in a few hours, but has saved me a ton of time since.
Looking at the comments in that ticket, I can understand why the ffmpeg developers don't want to help those people; they probably spent more time answering this annoying person than it would have taken to fix the issue, making them lose all motivation to work on it.
> But what to do with such a person if you can't/won't ban them from posting?
How about ignoring the personal details and focusing only on the technical substance?
Looking at that way-too-long thread, both sides seem to be to blame one way or another. From what I can see, the missing HDR support is actually an issue. Instead of letting him know they accept the bug, they are feeding him rhetorical questions.
Developers should not worry about these people. They should not take pride in making people happy, but take it from knowing that millions and millions of people will benefit from their code. No matter how much of an asshole the bug reporter is, a bug / missing feature is just that.
Then the OP started complaining that it's taking too long to fix, that the ffmpeg developers are not paying attention to his bug, etc., all while ignoring everything said to him.
That person is causing all the pointless animosity.
I do agree that it looks like they should not have engaged him on his level by pouring gas on the flames every time he went off the rails. Remaining on topic would have been the "professional" way to go, I guess, and just leaving him to rant alone. They did take the bug report, though, and never questioned that it was valid. All of that still does not excuse the behaviour of the reporter.
I think your assessment that developers should not derive any pride from happy users and are merely there to serve a greater good is a bit far from reality. Maybe they should not worry about things like this, but then again, these are also just people living their lives. This is not a corporate entity where you engage with a PR representative; these are people, doing what they like to do.
I mean, it's not about making this toxic person in particular happy, but the huge number of people this person represents. It was in response to the question of why even bother staying motivated with people like this.
This person isn't doing anything wrong. He's appealing to what he perceives as universal principles to attract attention to a technical matter of importance to him. Yes, he's being aggressive; no, there's nothing wrong with that.
If you take away the voices of such people, pretty much the only voice left speaking is the voice of corporate money, which is usually antithetical to the principles of open source development.
EDIT: Especially considering that features like Gopher and MSP were deemed worthy of attention, the fact that his point about HDR metadata being silently stripped went unaddressed for a year is concerning.
>Yes, he's being aggressive; no, there's nothing wrong with that.
Being an asshole is bad, especially when you are asking for help. Not being an asshole doesn't make you a corporate shill, nor does being an asshole make your feedback "better".
If anyone is an AV guru and wants to blog highlights, I for one would love to read it. I use ffmpeg blindly, from recipes; I'd love an industry insider's commentary on the good bits.
I hope I can ask this question here.
Is it possible to create a text file of timestamp ranges (from ts to ts), with as many lines as needed, and keep this file in the same directory as a video file? Any video player would read this file (just like subtitles) and skip the sections listed in it. I hope to distribute such a file as a form of editing without modifying the original video file.
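Not the universal sidecar file you're describing, but mpv's EDL playlists come close: a small text file that plays only the listed spans of an unmodified video. A sketch (times are made up; this plays 0:00-1:00 and 1:30-2:30 of movie.mkv, skipping the 30 seconds in between):

# mpv EDL v0
movie.mkv,0,60
movie.mkv,90,60

Save it as skip.edl next to the video and open it with mpv; other players won't pick it up, though.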
ffmpeg-python is used at my workplace so if you've ever cleaned up the docs, or submitted a patch, you have my greatest thanks! Saved my ass big time a couple weeks ago.
It needs both hardware and driver support. You can check supported formats using vdpauinfo. Nouveau maintains a list of supported formats for different hardware at https://nouveau.freedesktop.org/VideoAcceleration.html (I don't think HEVC is supported by nouveau on any hardware.)
Huge thank you to FFmpeg -- it's the core of my app's functionality: extract screenshots and show them to users in a neat gallery. Specifically, with one command-line invocation it can grab a dozen or two (user's choice) screenshots at even intervals (my choice) and stitch them together into one long horizontal JPG file (adding black bars if the aspect ratio isn't what I want). AMAZING!
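For anyone wanting to replicate that kind of contact sheet, a hedged sketch (not the app's actual command): fps grabs one frame per interval, scale shrinks each frame, and tile lays them out in a single row.

ffmpeg -i movie.mkv -vf "fps=1/60,scale=320:-1,tile=12x1" -frames:v 1 strip.jpg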