I'm surprised the authors didn't try the "dumb" version of the solution they went with: instead of using fancy cosine similarity to create implicit clusters, just ask the model to classify each comment along a few dimensions and then do your own filtering on that (or create your own 0-100 scoring!). Seems like you would have more control that way and actually derive some rich(er) data to fine-tune on. It seems they are already almost doing this: all the examples in the article start with "style"!

I have seen this pattern a few times actually, where you want the AI to mimic some heuristic humans use. You never want to ask it for the heuristic directly; just have it create the constituent data so you can do some simple regression or whatever on top of it and control the cutoff yourself.
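
A rough sketch of what I mean, in Python. Everything here is made up for illustration: the dimension names, the 0.0-1.0 ratings (which stand in for whatever the model returns when asked to classify a comment), and the human keep/drop labels. It is not what the article's authors did, just the "dumb" alternative: fit a tiny logistic regression on the per-dimension ratings, turn it into a 0-100 score, and pick the cutoff yourself.

```python
import math

# Hypothetical dimensions the model is asked to rate for each comment.
DIMS = ["style", "correctness", "actionable"]

# Made-up training data: (model's per-dimension ratings, human keep=1/drop=0).
labeled = [
    ({"style": 0.9, "correctness": 0.8, "actionable": 0.7}, 1),
    ({"style": 0.8, "correctness": 0.9, "actionable": 0.9}, 1),
    ({"style": 0.7, "correctness": 0.9, "actionable": 0.6}, 1),
    ({"style": 0.2, "correctness": 0.3, "actionable": 0.1}, 0),
    ({"style": 0.3, "correctness": 0.1, "actionable": 0.2}, 0),
    ({"style": 0.1, "correctness": 0.2, "actionable": 0.4}, 0),
]


def predict(weights, bias, feats):
    """Logistic model: probability that a human would keep the comment."""
    z = bias + sum(w * feats[d] for w, d in zip(weights, DIMS))
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the log loss."""
    weights = [0.0] * len(DIMS)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in data:
            err = predict(weights, bias, feats) - label
            bias -= lr * err
            weights = [w - lr * err * feats[d] for w, d in zip(weights, DIMS)]
    return weights, bias


def score(weights, bias, feats):
    """Your own 0-100 score, derived from the fitted model."""
    return round(100 * predict(weights, bias, feats))


def keep(weights, bias, feats, cutoff=50):
    """The cutoff is yours to tune; the model never sees it."""
    return score(weights, bias, feats) >= cutoff


weights, bias = fit(labeled)
scores = [score(weights, bias, feats) for feats, _ in labeled]
```

The point is that the model only supplies the per-dimension ratings. The weights come from your own labels, and moving `cutoff` up or down does not require re-prompting anything.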