
You can actually identify clusters of websites based on the cosine similarity of their outbound links. Pretty useful for identifying content farms spanning multiple websites.

Have a lil' data explorer for this: https://explore2.marginalia.nu/

Quite a lot of dead links in the dataset, but it's still useful.
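
For anyone who wants to play with the idea, here's a rough sketch of the outbound-link similarity approach. To be clear, this is not how the explorer above is implemented (no idea what it does internally). It assumes each site is reduced to the set of domains it links to, treats that set as a binary vector, and groups sites whose pairwise cosine similarity clears a threshold. The names `cosine_similarity` and `cluster_sites`, the 0.8 threshold, and the toy data are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(links_a: set[str], links_b: set[str]) -> float:
    """Cosine similarity of two binary link vectors, given as sets of domains."""
    if not links_a or not links_b:
        return 0.0
    # For 0/1 vectors the dot product is the overlap size and each
    # norm is the square root of the set size.
    return len(links_a & links_b) / sqrt(len(links_a) * len(links_b))


def cluster_sites(outbound: dict[str, set[str]], threshold: float = 0.8) -> list[set[str]]:
    """Group sites whose outbound links are similar (threshold is an assumption)."""
    parent = {site: site for site in outbound}

    def find(site: str) -> str:
        # Union-find lookup with path halving.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    # Naive all-pairs comparison: fine for a toy example, quadratic in general.
    for a, b in combinations(outbound, 2):
        if cosine_similarity(outbound[a], outbound[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for site in outbound:
        clusters.setdefault(find(site), set()).add(site)
    # Only report groups with more than one member.
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical toy data: three sites sharing nearly identical outbound links.
outbound = {
    "farm-a.example": {"shop.example", "ads.example", "casino.example"},
    "farm-b.example": {"shop.example", "ads.example", "casino.example"},
    "farm-c.example": {"shop.example", "ads.example", "casino.example", "blog.example"},
    "indie.example": {"wikipedia.org", "archive.org"},
}

print(cluster_sites(outbound))
```

On the toy data the three "farm" sites end up in one cluster and the unrelated site is left out. The all-pairs loop won't scale to a real crawl; you'd want sparse vectors and some kind of candidate pruning for that.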




