I'm not hardcore on auto-formatters, but I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there. These aren't projects that used git until recently (if at all). Before that they used something else, but when they transitioned they didn't preserve the history. And that's if they used any version control system. I've tried to help teams whose idea of version control was emailing someone (they termed them "QA/CM") to make a read-only backup of the source directory every few months (usually at a critical review period in the project, so a lot of code was changed between these snapshots).
That said, sure, skip them if you're worried about the history getting messed up or use them more selectively.
SVN was a thing by the mid-2000s, and history from that is easy to preserve in git. Just how old are the sourcebases in question? (Not to shoot the messenger; just like, wow.)
I maintain a C++ codebase that was originally written in 1996 and is mission-critical for my organization. It was originally maintained in Visual SourceSafe, then in TFS source control, and now git. Some parts of it were rewritten (several times) in C#, but the core is still C++.
I was very worried when we transitioned to git that history would not be preserved, and I tried to preserve it, but it proved too much of a hassle so I dropped it.
In fact that proved not to be a problem. Well, not a problem for me, since I remember all the history of the code and all the half-forgotten, half-baked features and why they are there. But if I'm gone then yes, it's going to be a problem. It's in dire need of a rewrite, but this has been postponed again and again.
On the first large C++ project I worked on, in the mid-1990s, version control basically meant preserving a bunch of archived copies of the source tree. CVS was a thing but not on Windows, and SourceSafe was creating more problems than it was solving.
I kept regular tarballs of a project that used SourceSafe right near the start of my career, and found I was more likely to be able to find an intact copy of the right thing to diff against from my tarballs.
I think after a year or so I realised that even bothering to -try- to use SourceSafe was largely silly, got permission to stop, and installed a CVS server on a dev box for my own use.
(yes I know the VCS server shouldn't really be on the dev box I could potentially trash, I didn't have another machine handy and it was still a vast improvement)
I looked at a C++ codebase from 1997 at a previous job - I don't know much about the history but comments in one of the old files tracked dates and changes to 2001.
Not sure what happened after that but in 2017 someone copy-pasted the project from TFS to git and obliterated the prior history.
Legacy systems that you inherit don't have people coming with them very often. That's part of the context of this. You often don't have people to trace it back to or at least not the people who actually wrote it (maybe someone who worked with them before they got laid off a decade ago), and reformatting the code is not going to make it any harder to get answers from people who aren't there.
I've been in situations where, even without access to the people, knowing which of them wrote something gives me a better idea of how to infer backwards what (and of course, sadly, occasionally 'if') they were thinking while writing the code.
Then again, I think most of the tells for that for me are around the sort of structure that would survive reformatting anyway.
(and, y'know, legacy stuff, everything's a bloody trade-off)
I believe you can configure `git blame` to skip a specific commit. But in my experience it doesn't matter anyway for two reasons:
1. You're going to reformat it eventually anyway. You're just delaying things. The best time to plant a tree, etc.
2. If it's an old codebase and you're trying to understand some bit of code you're almost always going to have to walk through about 5 commits to get to the original one anyway. One extra formatting commit doesn't really make any difference.
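For what it's worth, here is a minimal sketch of the blame-skipping setup, assuming Git 2.23 or newer. It builds a throwaway repo, makes one real commit and one whitespace-only reformat, then compares plain blame against blame with the reformat ignored. The file `add.c`, the demo identity, and the variable names are made up for illustration, and `.git-blame-ignore-revs` is only a common naming convention; the `blame.ignoreRevsFile` setting accepts any path. For a one-off, `git blame --ignore-rev <rev>` should do the same without any config.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the demo touches nothing real.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Author"        # placeholder identity for the demo
git config user.email "demo@example.com"

# Commit 1: the "real" change, with cramped formatting.
printf 'int add(int a,int b){return a+b;}\n' > add.c
git add add.c
git commit -q -m "Add add()"
original=$(git rev-parse HEAD)

# Commit 2: a formatting-only rewrite of the same line.
printf 'int add(int a, int b) { return a + b; }\n' > add.c
git commit -q -am "Reformat"
reformat=$(git rev-parse HEAD)

# Plain blame pins the line on the reformatting commit.
plain=$(git blame --porcelain add.c | head -n 1 | cut -d' ' -f1)

# List the reformat's full hash in an ignore file and point blame at it.
echo "$reformat" > .git-blame-ignore-revs
git config blame.ignoreRevsFile .git-blame-ignore-revs

# Blame now looks through the reformat to the commit that wrote the line.
skipped=$(git blame --porcelain add.c | head -n 1 | cut -d' ' -f1)

echo "plain blame:   $plain"
echo "ignoring revs: $skipped"
```

In a real project you would presumably commit the ignore file so everyone shares it, though each clone still has to set the config value itself.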
This is another reason why you should track important information in comments alongside the code instead of trusting the VCS to preserve it in logs/commit messages, and why you should reject weird code that's missing comments before it gets merged.
Not saying that fixes decades of cruft, because you shouldn't change files without good reason and non-whitespace formatting is not a good reason, but I'm mentioning it because I've seen people naively believe bullshit like "code is self explanatory" and "the reason is in the commit message".
Just comment your code, folks, and this becomes less of a problem.
How does reformatting trash the history? It's one extra commit.
I guess if it splits or combines lines that could cause some noise if you really want the history of a single line... But that happens all the time, and I don't see how it would really prevent understanding the history. You can always do a blame on a range of lines.
Maybe I'm missing something though, genuinely curious for a concrete example where reformatting makes it hard to understand history!
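To make the "blame on a range of lines" idea concrete, here is a rough sketch using `git blame -L` and `git log -L` on a throwaway repo. The file `notes.txt`, its contents, and the demo identity are invented for the example; only the two `-L` options are the point.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Author"        # placeholder identity for the demo
git config user.email "demo@example.com"

# Commit 1: a three-line file.
printf 'alpha\nbeta\ngamma\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse HEAD)

# Commit 2: change only line 2.
printf 'alpha\nBETA\ngamma\n' > notes.txt
git commit -q -am "Shout beta"
second=$(git rev-parse HEAD)

# Blame only lines 2-3; keep just the commit hash reported for each line.
hashes=$(git blame --line-porcelain -L 2,3 notes.txt \
  | grep -E '^[0-9a-f]{40} ' | cut -d' ' -f1)
echo "$hashes"

# Walk the history of line 2 alone, with the diff each commit made to it.
git log -L 2,2:notes.txt
```

The blame output should show line 2 belonging to the second commit and line 3 to the first, and `git log -L` follows the range back through earlier commits even when the line numbers shift.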
If you ask the IDE to show blame info next to each line, then a lot of lines will be from the big reformatting. Of course you can still dig in and retrieve the history, but it's an extra step then. Btw, it seems that at least Git has a way to make `git blame` avoid considering certain commits (`--ignore-rev`, or the `blame.ignoreRevsFile` setting). Maybe that works in IDEs too!