If you use `-vn` and `-acodec copy` (I use both, although I'm not sure `-vn` is strictly necessary), you can demux the audio from the video in the format it is already in. Of course, extracting to WAV only decodes the audio rather than lossily re-encoding it, but copying may be faster and use less space.
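A minimal sketch of that, assuming `ffprobe` is installed alongside `ffmpeg`. The helper names `audio_ext` and `demux_audio` and the codec-to-extension table are my own; the table is only a rough guess at sensible containers, with Matroska audio (`.mka`) as the fallback.

```shell
# Hypothetical helper: pick a container extension for an audio codec name.
audio_ext() {
  case "$1" in
    aac)    echo m4a ;;
    mp3)    echo mp3 ;;
    opus)   echo opus ;;
    vorbis) echo ogg ;;
    flac)   echo flac ;;
    *)      echo mka ;;   # Matroska audio holds most other codecs
  esac
}

# Copy the first audio stream out of a video without re-encoding it.
demux_audio() {
  src=$1
  codec=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$src")
  ffmpeg -i "$src" -vn -acodec copy "${src%.*}.$(audio_ext "$codec")"
}

# Usage (file name is a placeholder):
#   demux_audio movie.mp4
```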
What it does: allows streaming content to Chromecast through VLC, with my idea of subtitles.
For me, that means subtitles in the Tiresias screen font, as used by Finnish YLE in the 90s. (Always aligned left on the second row, so that the starting point is always the same. Center alignment is bad because you always need to re-adjust your eyes to the position where the subtitles start; left alignment puts the first character in the same place every time.)
* `-hwaccel cuvid -c:v h264_cuvid`: enables hardware-accelerated decoding (H.264 only)
* `-vf`: video filter
* `hwdownload,format=nv12`: downloads the hardware-accelerated frame to memory for the video filter (required by cuvid)
* `scale=(iw*sar)*max(1280/(iw*sar)\,720/ih):ih*max(1280/(iw*sar)\,720/ih), crop=1280:720`: scales and crops the video to 1280x720 (extremely high impact on performance!). Use the cuvid crop and resize options instead for better performance.
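To see what the `scale` expression works out to, here is a small sketch that redoes its arithmetic outside FFmpeg. The helper name `cover_scale` is hypothetical, the expression is assumed to read `(iw*sar)*max(1280/(iw*sar),720/ih)` for the width and `ih*max(...)` for the height, and rounding to the nearest integer is my own choice, not necessarily what the scale filter does.

```shell
# Hypothetical helper: print the size the scale step produces before
# crop=1280:720 trims the overflow.
# Arguments: input width, input height, sample aspect ratio (default 1).
cover_scale() {
  awk -v iw="$1" -v ih="$2" -v sar="${3:-1}" 'BEGIN {
    w  = iw * sar                 # display width after applying SAR
    fx = 1280 / w                 # factor needed to reach 1280 wide
    fy = 720 / ih                 # factor needed to reach 720 high
    f  = (fx > fy) ? fx : fy      # max(): the frame covers 1280x720 on both axes
    printf "%dx%d\n", w * f + 0.5, ih * f + 0.5
  }'
}

cover_scale 1920 800    # 2.40:1 source: scaled to 1728x720, then cropped to 1280x720
cover_scale 640 480     # 4:3 source: scaled to 1280x960, then cropped to 1280x720
```

Taking the larger of the two factors is what makes the following crop safe: the scaled frame is never smaller than 1280x720 in either dimension.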
According to Yle they only started using Tiresias in 2012. The article doesn't mention which font they were using before. I would be interested in their 90s font as well.
Interesting, I haven't really watched Finnish TV in the past ten years. However, it can't have been too different from Tiresias: it was tall as well, with a pretty big outline. You can see the 90s font here:
Edit: Comparing the 90s font to Tiresias, I too would like to have the exact font they used in the 90s. E.g. the capital "J" is terrible in Tiresias compared to that.
Extract 1 second of video every 90 seconds (if you have very long footage of a trip from a dashcam and you don't know what to do with it, that makes for a much shorter "souvenir"):
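The command is not shown here, so the following is only a sketch of one way to do it with the `select` and `aselect` filters, not necessarily what the poster used. The function names `select_expr` and `souvenir` and the file names are placeholders.

```shell
# Build a select expression that keeps frames whose timestamp t falls in
# the first $2 seconds of every $1-second window.
select_expr() {
  printf 'lt(mod(t,%s),%s)' "$1" "$2"
}

# Keep 1 second out of every 90 by default, and regenerate timestamps so
# the kept pieces play back to back.
souvenir() {
  src=$1
  dst=$2
  every=${3:-90}
  keep=${4:-1}
  expr=$(select_expr "$every" "$keep")
  ffmpeg -i "$src" \
    -vf "select='$expr',setpts=N/FRAME_RATE/TB" \
    -af "aselect='$expr',asetpts=N/SR/TB" \
    "$dst"
}

# Usage:
#   souvenir dashcam.mp4 souvenir.mp4
```

Because the filters drop frames, this has to re-encode; it cannot be combined with stream copy.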
If you put `-ss` before `-i`, it will only seek approximately to the requested time, using GOP I-frames; in MPEG-2 a GOP is generally ~15 frames = 0.5 s. But a GOP can be pretty long, like 30 s, in MPEG-4.
Downmixes movie audio from surround sound to stereo, balanced for night watching (prevents the audio from being too quiet), so that the files can direct-stream on my devices.
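The command is not shown here, so this is only a sketch of one common approach: a `pan` filter that keeps the center (dialog) channel at full level and mixes the other channels in more quietly. It assumes a 5.1 source with `FL`, `FR`, `FC`, `BL` and `BR` channels; the 0.30 weight, the helper names and the AAC output are assumptions to tune by ear, and the sum may clip on loud material.

```shell
# Build a pan filter string; $1 is the weight applied to the front and
# back left/right channels (default 0.30). The center channel stays at 1.0.
night_pan() {
  s=${1:-0.30}
  printf 'pan=stereo|FL=FC+%s*FL+%s*BL|FR=FC+%s*FR+%s*BR' "$s" "$s" "$s" "$s"
}

# Copy the video stream untouched and re-encode only the audio as stereo.
night_downmix() {
  ffmpeg -i "$1" -c:v copy -af "$(night_pan "$3")" -c:a aac "$2"
}

# Usage (file names are placeholders):
#   night_downmix movie.mkv movie-stereo.mkv
```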
Goddammit I can't believe how downmixing x.y to 2.0 isn't a solved problem by now and is so broken through and through.
I mean, it's never ever done in a way that doesn't produce quiet dialog that makes you raise the volume and loud sounds that make your eardrums bleed. Like, even basic audio normalisation would produce half-decent results, but no, we get crazy contrast by default and constant volume switching.
The interesting part is that most media is being consumed with stereo audio down-mixing. Everyone streaming content on their laptops, with their television's speakers or with a sound bar. Yet all the audio is recorded and mastered for surround sound systems, even though they could include mastered stereo audio without the down-mixing issues like quiet audio.
Here's one that generates App Store previews with the correct sizes and metadata.
(iTunes Connect can be really picky about this sometimes.)
# 1. Record your device using QuickTime
# (File->New Movie Recording->Select your phone)
# 2. Run `$ app-preview your-recording.mov`
function app-preview() {
  echo "name $1"
  # Quote "$1" so file names containing spaces survive word splitting.
  ffmpeg -i "$1" -vf scale=1080:1920,setsar=1 -c:a copy "out_$1"
}
Not an ffmpeg wizard here, but here are my screencasting commands. Probably the most useful knowledge in this snippet is how to use ALSA from ffmpeg: ALSA devices can be referred to as hw:0, hw:1, etc., and you find out which device to use from arecord.
# Example output file.
f=/tmp/output.mp4
# Example video resolution.
g=1920x1080
# Example capture framerate
fr=4
# Example X11 display
d=$DISPLAY
# Simple screencast.
#
# Try adjusting the libx264 CRF from 15 to some greater number, as long
# as there is no visible effect on video quality.
#
# If increasing the capture framerate, you may also wish to use a
# faster preset.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -i "$d" \
-c:v libx264 -crf 15 -preset veryslow "$f"
# Simple screencast without drawing the pointer/cursor.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -draw_mouse 0 -i "$d" \
-c:v libx264 -crf 15 -preset veryslow "$f"
# See what devices are available for capturing sound.
arecord -l
# Select a device.
audioCaptureDevice=hw:0
# List some permitted parameters associated with device 0.
#
# We are interested in the "FORMAT", "CHANNELS", and "RATE" parameters
# for use in the ffmpeg command.
arecord --dump-hw-params -D "$audioCaptureDevice"
audioSampleFormat=pcm_s32le
audioNumChannels=2
audioRate=44100
f=/tmp/output.mp3
# Record sound.
ffmpeg -thread_queue_size 8192 -f alsa -channels "$audioNumChannels" -sample_rate "$audioRate" \
-c:a "$audioSampleFormat" -ar "$audioRate" -i "$audioCaptureDevice" "$f"
f=/tmp/output.mkv
# Screencast with sound.
#
# Note that there seems to be an FFmpeg bug where the audio in the last
# 15 seconds of the video is cut off. The workaround is to record for
# 15 extra seconds, and then cut the extra video.
ffmpeg -probesize 50M -f x11grab -video_size "$g" -framerate "$fr" -i "$d" \
-thread_queue_size 8192 -f alsa -channels "$audioNumChannels" -sample_rate "$audioRate" \
-c:a "$audioSampleFormat" -ar "$audioRate" -i "$audioCaptureDevice" \
-c:a flac -c:v libx264 -crf 17 "$f"
I wrote a command-line based video editing tool as a 300-line bash script. It reads a list of segments of video (source file, start position, length) to string together (including options such as image overlay, fade, fast forward, slow motion, static image) from a text file, and converts it into a Makefile with ffmpeg commands in it, which you can then run with whatever level of parallelism you wish. It treats the video and sound separately, and creates a video and sound file for each segment, using concatenatable formats for both. Then the final few make targets are to concatenate the video into one file, the sound into another file, and then multiplex them into a single file. Used it a few times for editing my own videos. It's a bit big to share here though.
On macOS, this uses hardware acceleration to re-encode a video at a lower bitrate. My MacBook is from 2012, so this does make a notable difference. There's also "hevc_videotoolbox" for H.265 if your machine supports it.
It will also significantly tank the quality (HW encoders are horrible at quality-per-bit ratio) - libx264 at this bitrate will make a huge difference in how good the video will look.
-max_muxing_queue_size 2048 (magically fixes some errors and microscopically increases quality, a no-brainer on machines with more than token amounts of RAM)
Back in 2004, someone released an unauthorized fan dub/narration of the first Harry Potter movie called Wizard People, Dear Reader that hilariously butchers the entire plot, character names, and motivations. This narration is (loosely) synced to the movie and its scenes, and for a long time I watched it via clips uploaded to YouTube. But when Sorcerer’s Stone got a 4K release, I decided it was time to rip it and create my own canonical copy.
The original audio files of the dub were at this point still available on archive.org. The problem is that the second audio file is not meant to play directly after the first - this was back in the days of CD players, so halfway through the movie the dub instructs you to begin playing the second CD once the next scene starts. The other problem is that the second file is louder than the first.
Most sources I saw online said to insert a gap of three seconds to account for the delay, and didn’t have a solution for the difference in volume. I wanted to be more precise.
First, I found the exact start time of the scene where the second audio track begins:
ffprobe hp.mkv
...
Chapter #0:18: start 4428.882000, end 4817.521000
Metadata:
title : Chapter 19
...
Then I compared this with the duration of the first audio track:
The difference between these time stamps gave the actual delay of 3.582 seconds.
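The comparison can be sketched like this. The `ffprobe` call prints a container's duration in seconds; `part1.mp3` is a placeholder name for the first dub file, and the helper names are my own. The 4425.300 figure in the example is not quoted from anywhere: it is simply what a chapter start of 4428.882 and a delay of 3.582 imply.

```shell
# Print a media file's duration in seconds.
track_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Gap between the chapter start ($1) and the end of the first track ($2).
gap_seconds() {
  awk -v start="$1" -v dur="$2" 'BEGIN { printf "%.3f\n", start - dur }'
}

# With the real file:
#   gap_seconds 4428.882 "$(track_duration part1.mp3)"
gap_seconds 4428.882 4425.300
```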
I then compared the maximum audio levels of the two audio tracks to determine the level to increase the first track’s volume by (there are more advanced features in FFmpeg for volume normalization, but I just wanted to remove the potential for eardrum damage when beginning Chapter 19 and keep things as similar as possible otherwise):
This gave me the volume increase for the first track of 7.5 dB.
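One way to get such a figure, sketched under the assumption that the `volumedetect` filter was used (the text does not say). The filter reports `max_volume` on stderr; the helper names and file names below are placeholders.

```shell
# Print the max_volume figure (in dB) from volumedetect log lines on stdin.
parse_max_volume() {
  awk '/max_volume:/ { print $(NF-1) }'
}

# Measure the peak level of a file; nothing is written to disk.
max_volume() {
  ffmpeg -hide_banner -i "$1" -vn -af volumedetect -f null - 2>&1 |
    parse_max_volume
}

# Gain in dB needed to raise the quieter peak ($2) to the louder peak ($1).
db_gain() {
  awk -v loud="$1" -v quiet="$2" 'BEGIN { printf "%.1f\n", loud - quiet }'
}

# Usage:
#   db_gain "$(max_volume part2.mp3)" "$(max_volume part1.mp3)"
```

Matching peaks is a crude measure compared with loudness normalization, but it changes nothing except the gain of one track.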
Once I had these numbers, it was time for the one-liner to adjust the first track’s volume, concatenate the two tracks with the gap of silence, and mux them with the video from the movie:
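The one-liner itself is not shown, so here is a sketch of how those three steps could fit into a single `-filter_complex`, not the author's actual command. It assumes an FFmpeg build recent enough to have `apad`'s `pad_dur` option; `part1.mp3`, `part2.mp3`, `out.mkv`, the helper names and the AAC output codec are all placeholders.

```shell
# Build the filter graph: boost track 1 by $1 dB, pad it with $2 seconds
# of silence, then append track 2.
dub_filter() {
  printf '[1:a]volume=%sdB,apad=pad_dur=%s[first];[first][2:a]concat=n=2:v=0:a=1[dub]' "$1" "$2"
}

# Mux the combined dub with the movie's video stream, copied untouched.
mux_dub() {
  ffmpeg -i hp.mkv -i part1.mp3 -i part2.mp3 \
    -filter_complex "$(dub_filter 7.5 3.582)" \
    -map 0:v -map '[dub]' -c:v copy -c:a aac out.mkv
}

# Usage:
#   mux_dub
```

Padding the end of the first track, rather than delaying the second, keeps the whole gap in one place and lets `concat` handle the join.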