On the flip side, auto-formatting will trash your version history and impede ana...

Jtsummers · on Feb 29, 2024

I'm not hardcore on auto-formatters, but I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there. These aren't projects that used git until recently (if at all). Before that they used something else, but when they transitioned they didn't preserve the history. And that's if they used any version control system. I've tried to help teams whose idea of version control was emailing someone (they termed them "QA/CM") to make a read-only backup of the source directory every few months (usually at a critical review period in the project, so a lot of code was changed between these snapshots).

That said, sure, skip them if you're worried about the history getting messed up or use them more selectively.

KerrAvon · on Feb 29, 2024

SVN was a thing by the mid-2000s, and history from that is easy to preserve in git. Just how old are the sourcebases in question? (Not to shoot the messenger; just like, wow.)

edit:typo

olvy0 · on March 1, 2024

I maintain a C++ codebase that was originally written in 1996, and is mission critical for my organization. Originally maintained in Visual SourceSafe, then in TFS source control, and now git. Some parts of it were rewritten (several times) in C#, but the core is still C++.

I was very worried when we transitioned to git that history would not be preserved, and I tried to preserve it, but it proved too much hassle so I dropped it.

In fact that proved not to be a problem. Well, not a problem for me, since I remember all the history of the code and all the half-forgotten, half-baked features and why they are there. But if I'm gone then yes, it's going to be a problem. It's in dire need of a rewrite, but this has been postponed again and again.

varjag · on Feb 29, 2024

The first large C++ project I worked on in the mid-1990s was basically preserving a bunch of archived copies of the source tree. CVS was a thing but not on Windows, and SourceSafe was creating more problems than it was solving.

mst · on March 1, 2024

I kept regular tarballs of a project that used SourceSafe right near the start of my career, and found I was more likely to be able to find an intact copy of the right thing to diff against from my tarballs.

I think after a year or so I realised that even bothering to -try- to use SourceSafe was largely silly, got permission to stop, and installed a CVS server on a dev box for my own use.

(yes I know the VCS server shouldn't really be on the dev box I could potentially trash, I didn't have another machine handy and it was still a vast improvement)

Jtsummers · on Feb 29, 2024

Some of these systems dated back to the 1970s. The worst offenders were from the 1980s and 1990s though.

It's all about the team or organization and their laziness or non-laziness.

pyuser583 · on March 1, 2024

I've had issues doing decent copies from SVN to Git. They have different ideas about user identity, and how fragmented it can be.

jamesfinlayson · on March 1, 2024

I looked at a C++ codebase from 1997 at a previous job - I don't know much about the history but comments in one of the old files tracked dates and changes to 2001. Not sure what happened after that but in 2017 someone copy-pasted the project from TFS to git and obliterated the prior history.

Pfiffer · on Feb 29, 2024

I've heard a lot of stories about mid-90s codebases for sure

bear8642 · on Feb 29, 2024

>I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there.

Not sure if I agree here or not - whilst yes, the history isn't there, if it's a small enough team you'll have a good guess at who wrote it.

Definitely found I've learnt the style of colleagues, so I know who to ask just from the code outline.

Jtsummers · on Feb 29, 2024

Legacy systems that you inherit don't have people coming with them very often. That's part of the context of this. You often don't have people to trace it back to or at least not the people who actually wrote it (maybe someone who worked with them before they got laid off a decade ago), and reformatting the code is not going to make it any harder to get answers from people who aren't there.

mst · on March 1, 2024

I've been in situations where, even without access to the people, knowing which of them wrote something gives me a better idea of how to infer backwards what (and of course, sadly, occasionally 'if') they were thinking while writing the code.

Then again, I think most of the tells for that for me are around the sort of structure that would survive reformatting anyway.

(and, y'know, legacy stuff, everything's a bloody trade-off)

skrebbel · on Feb 29, 2024

You can ignore commits in git blame by listing them in an ignore-revs file (conventionally named .git-blame-ignore-revs) and pointing blame.ignoreRevsFile at it.

This is assuming Git of course, which is not a given at all for the average legacy C++ codebase.

fransje26 · on March 1, 2024

Good to know. Thanks for the tip!

lpapez · on Feb 29, 2024

You can instruct git to ignore specific commits when running blame.

See "git blame ignore revs file".

Intended use is exactly to ignore bulk changes like auto formatting.
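A minimal sketch of that setup, run in a throwaway repository so nothing real is touched. The file name `.git-blame-ignore-revs` is a common convention rather than something Git requires, and the demo file, identity, and commit messages are made up for illustration. It assumes Git 2.23 or later, where `blame.ignoreRevsFile` was introduced.

```shell
#!/bin/sh
# Sketch: hide a bulk-reformat commit from `git blame` (assumes Git 2.23+).
set -eu

# Throwaway repository so the demo does not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

# 1. An "original" commit with tab indentation.
printf 'int main(void)\n{\n\treturn 0;\n}\n' > main.c
git add main.c
git commit -q -m "Add main"
original=$(git rev-parse HEAD)

# 2. A whitespace-only "reformat" commit (the tab becomes four spaces).
printf 'int main(void)\n{\n    return 0;\n}\n' > main.c
git commit -q -am "Reformat with auto-formatter"
reformat=$(git rev-parse HEAD)

# 3. List the reformat commit by its full hash; lines starting with # are comments.
printf '# Bulk reformat\n%s\n' "$reformat" > .git-blame-ignore-revs
git add .git-blame-ignore-revs
git commit -q -m "Ignore reformat commit in blame"

# 4. Tell this clone to consult the file on every `git blame`.
git config blame.ignoreRevsFile .git-blame-ignore-revs

# The reindented line should now be attributed to the original commit.
git --no-pager blame -s main.c
```

The config setting is per clone, so each developer has to set it once; for a one-off, `git blame --ignore-revs-file .git-blame-ignore-revs main.c` does the same thing without touching config.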

westurner · on Feb 29, 2024

+1

  man git-blame
  git help blame

https://git-scm.com/docs/git-blame

IshKebab · on Feb 29, 2024

I believe you can configure `git blame` to skip a specific commit. But in my experience it doesn't matter anyway for two reasons:

1. You're going to reformat it eventually anyway. You're just delaying things. The best time to plant a tree, etc.

2. If it's an old codebase and you're trying to understand some bit of code you're almost always going to have to walk through about 5 commits to get to the original one anyway. One extra formatting commit doesn't really make any difference.

duped · on Feb 29, 2024

This is another reason why you should track important information in comments alongside the code instead of trusting VCS to preserve it in logs/commit messages, and reject weird code that's missing comments before it gets merged.

Not saying that fixes decades of cruft, because you shouldn't change files without good reason and non-whitespace formatting is not a good reason, but I'm mentioning it because I've seen people naively believe bullshit like "code is self explanatory" and "the reason is in the commit message".

Just comment your code, folks, and this becomes less of a problem.

mb7733 · on March 1, 2024

How does reformatting trash the history? It's one extra commit.

I guess if it splits or combines lines that could cause some noise if you really want the history of a single line... But that happens all the time, and I don't see how it would really prevent understanding the history. You can always do a blame on a range of lines.

Maybe I'm missing something though, genuinely curious for a concrete example where reformatting makes it hard to understand history!
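For what it's worth, a rough sketch of the range-based approach, again in a throwaway repository with a made-up file. `git blame -L` limits blame to a line range, and `git log -L` follows that range back through earlier commits, including across a reformat that split one line into several.

```shell
#!/bin/sh
# Sketch: inspect the history of a line range instead of a single line.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

# Original commit: the whole function on one line.
printf 'int add(int a, int b) { return a + b; }\n' > math.c
git add math.c
git commit -q -m "Add add()"

# "Reformat" commit: the same function split across four lines.
printf 'int add(int a, int b)\n{\n    return a + b;\n}\n' > math.c
git commit -q -am "Reformat math.c"

# Blame only lines 1-4 of the current file.
git --no-pager blame -s -L 1,4 math.c

# Follow that range back through history; the output includes the commit
# where the function was still a single line.
git --no-pager log --oneline -L 1,4:math.c
```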

samus · on March 1, 2024

If you ask the IDE to show blame info next to each line, then a lot of lines will be from the big reformatting. Of course you can still dig in and retrieve the history, but it's an extra step then. Btw, it seems that at least Git has a way to make `git blame` skip certain commits (an ignore-revs file). Maybe that works in IDEs too!

PreachSoup · on Feb 29, 2024

On a per-file level it's just one commit. It's not really a big deal.

exDM69 · on Feb 29, 2024

clang-format can be applied to new changes only, for this very reason.

Adding it will remove white space nitpicking from code review, even if it isn't perfect.
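One way to do that is the `git clang-format` helper that ships alongside clang-format. The sketch below is only a demo under assumptions: it needs both `clang-format` and `git-clang-format` on the PATH (it skips itself otherwise), it uses a made-up file in a throwaway repository, and it relies on clang-format's default style because no `.clang-format` file is present.

```shell
#!/bin/sh
# Sketch: run clang-format only on lines that differ from HEAD.
set -eu

proposed=""
if command -v clang-format >/dev/null 2>&1 &&
   command -v git-clang-format >/dev/null 2>&1; then
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Demo User"      # hypothetical identity for the demo
    git config user.email "demo@example.com"

    # Legacy file committed with its original, inconsistent formatting.
    printf 'int  legacy( int x ){return x;}\nint  other( int y ){return y;}\n' > legacy.c
    git add legacy.c
    git commit -q -m "Import legacy code"

    # A new change that touches only the second function.
    printf 'int  legacy( int x ){return x;}\nint  other( int y ){return y+1;}\n' > legacy.c

    # Preview what would be reformatted on the lines that differ from HEAD.
    # `|| true` because some versions exit non-zero when the diff is non-empty.
    proposed=$(git clang-format --diff || true)
    printf '%s\n' "$proposed"
else
    echo "clang-format / git-clang-format not found; skipping demo"
fi
```

The untouched `legacy` function should not show up as a changed line in the preview. To apply the formatting instead of previewing it, stage the change and run `git clang-format` without `--diff`, then review and stage the result.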