Sample rates in audio hardware aren't like programming constants that are the same for everybody. Over 30 minutes, a 0.05% sample rate error accumulates to roughly 0.9 s of drift over the recording. For reference, USB 2.0 allows a 0.25% frequency tolerance (and many audio devices derive their clock from it).
Cheap quartz oscillators, in computers and especially in some USB ADCs, are prone to slightly changing their rates with temperature. So the sample rates of two devices can drift relative to each other.
The clock drifts, and something needs to account for those lost or gained seconds. Even when the drift is small, phasing artifacts become pretty obvious on lengthy recordings.
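The arithmetic behind the drift figure above is simple enough to sanity-check in a few lines (a minimal sketch; the function name is illustrative, not from any library):

```python
def drift_seconds(duration_s: float, rate_error_fraction: float) -> float:
    """Accumulated drift after duration_s of recording on a clock
    that is off by rate_error_fraction (e.g. 0.0005 for 0.05%)."""
    return duration_s * rate_error_fraction

# 0.05% error over a 30-minute recording:
print(drift_seconds(30 * 60, 0.0005))  # ~0.9 s of drift

# USB 2.0's worst-case 0.25% tolerance over the same 30 minutes:
print(drift_seconds(30 * 60, 0.0025))  # ~4.5 s of drift
```

At 48 kHz, even the 0.05% case is tens of thousands of samples off by the end, which is why two unsynchronized recordings audibly phase against each other.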
There's some interesting work going on in the AES to support synchronised audio over wide area networks, either through better recovery of PTP clocks distributed through WANs or using PTP with GNSS.
Maybe actual clock differences? Not sure if that's the case here, but in audio engineering a separate master clock is often used to keep all the devices involved in sync (many pro-level audio devices have a "word clock" input for this very reason).
In RF engineering, it's typical to have all of your equipment referencing the same 10 MHz clock (or a 1 PPS signal, or IRIG-B). If I don't have a GPS receiver or a rubidium source, then I'll just pick the newest, most expensive piece of equipment with a built-in reference clock and fan it out to the rest of the equipment on the bench. Some portable spectrum analyzers have built-in GPS receivers, so even out in the field you know you have a good reference.
Huh, I've thought about this before as an outsider and concluded that by now it's a common enough task that of course there must be an algorithm for doing it automatically.
As someone who worked as an audio engineer, solving problems before they can occur saves so much time and headache. There's no reason to faff about with software or complexity-inducing algorithms when the whole problem can be fixed by toggling one switch.
Technically you could accomplish the same thing by applying a parametric eq to the master buss, but then you're no longer software agnostic.
It's like photography: sure, you can post-process photos in Photoshop, but getting everything right before taking the picture, at the hardware level, simplifies things for everyone involved.
There are plugins for different scenarios, but it turns into one of those problems where hearing and correcting issues is much easier for humans than computers. The tools available make it easier to fix problems, but it still takes a recording engineer to spot-check.
Do you have any insights you can offer on how best to do this? I have to deal with drift issues when doing signal processing on .wav files, and I've always used a marker pulse every so often.
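One way the marker-pulse approach described above could work in practice (a hedged sketch, not anyone's exact method; function names are made up for illustration): locate the pulses, compare their measured spacing against the nominal spacing to estimate the clock ratio, then resample the whole file by that ratio.

```python
def estimate_clock_ratio(pulse_sample_indices, nominal_period_s, nominal_rate_hz):
    """Ratio of actual to nominal sample rate, estimated from the
    average spacing between detected marker pulses."""
    n = len(pulse_sample_indices) - 1
    measured_spacing = (pulse_sample_indices[-1] - pulse_sample_indices[0]) / n
    nominal_spacing = nominal_period_s * nominal_rate_hz
    return measured_spacing / nominal_spacing

def resample_linear(samples, ratio):
    """Stretch/compress samples by ratio using linear interpolation.
    Crude compared to a proper polyphase resampler, but shows the idea."""
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            out.append(samples[j])
    return out

# A clock running 0.05% fast puts 1 s markers 48024 samples apart
# instead of 48000 at a nominal 48 kHz:
ratio = estimate_clock_ratio([0, 48024, 96048], 1.0, 48000)
print(ratio)  # ~1.0005
```

For real audio you'd want a proper band-limited resampler (e.g. a polyphase filter) rather than linear interpolation, and a least-squares fit over all pulse positions rather than just the endpoints, but the structure is the same.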