"Another that I don't think has even been explored yet is what I would probably call "textual fingerprinting" meaning that the more you write in one post the more likely you can extract some identifiable style, choice of words and so on and that can be linked to other things you've written."
Forensic textual analysis predates the internet, actually, and I can say with near certainty that someone out there is working on this right now. (Probably the CIA and/or another agency involved in analyzing communications between persons of interest.)
Back in undergrad, I was involved in a project that used textual analysis to surmise authorship of "anonymous" works of fiction from Shakespeare's day. Additionally, the authorship of a lot of works attributed to Shakespeare is often called into question, and so we tried to check his works for consistency or inconsistency. (Spoiler alert: he turned up pretty clean; he -- or at least someone using his name -- is most likely the author of his works.)
Now, obviously, this project was a bit high-touch and involved a lot of attention lavished on a single body of work by a handful of presumed authors. But that was a long time ago, and I'm sure that the technology has grown more scalable and sophisticated by now.
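For anyone curious what the crudest version of this looks like in code, here is a minimal sketch. It is not necessarily the method used in the project described above, just one common stylometry idea: measure how often each text uses a handful of function words, then compare the resulting profiles with cosine similarity. The word list and the names `style_profile` and `cosine_similarity` are illustrative assumptions, and a real study would need many more features and much more text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of function words.
# Real stylometric studies use much longer lists and other features.
FUNCTION_WORDS = [
    "the", "and", "of", "to", "a", "in", "that", "it",
    "is", "was", "i", "for", "with", "as", "but", "not",
]


def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

You would call `cosine_similarity(style_profile(known_text), style_profile(disputed_text))` and treat a higher score as weak evidence of a shared author. On short posts the score is very noisy, which fits the point in the quote that longer text gives more to work with.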
"Forensic textual analysis predates the internet, actually, and I can say with near certainty that someone out there is working on this right now."
Plagiarism detection software is increasingly common in education. I've actually been looking for something similar to filter news commentary systems, to get a sense of what percentage of comments on political stories are actually astroturf.
You know, I've just recently started studying law, so I really don't have the time to work up the required scraping/webdev skills from scratch. In fact, I don't have as much time to post on HN either, and should probably be posting less :-) But I would love to collaborate on this part-time if there's anyone in the Bay Area for whom scraping and hashing would be trivial tasks. Same name at gee mail.
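To make the hashing half of that concrete, here is a rough sketch under my own assumptions, not a finished tool: break each comment into overlapping word "shingles", hash them, and compare the hash sets with Jaccard similarity to spot near-identical comments. The scraping side is left out entirely, and the function names, the shingle size `k=3`, and the choice of SHA-1 are illustrative guesses rather than anything settled.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: use the whole text, if any.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def hashed_shingles(text, k=3):
    """Hash each shingle so comments can be stored and compared compactly."""
    return {
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
        for s in shingles(text, k)
    }


def jaccard(a, b):
    """Share of hashes the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Two comments with a high `jaccard(hashed_shingles(c1), hashed_shingles(c2))` are likely copies of the same template with small edits. This only catches copy-and-paste astroturf; reworded talking points would need something closer to stylistic analysis.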