You're right, there's no hard evidence. Google keeps its secrets well, which makes these claims difficult to prove definitively. I'm speaking from experience, but if you have a website, you can observe the phenomenon yourself:
1. Register on Google Search Console
2. Go to the Page indexing report (Indexing > Pages in the left-hand menu)
3. Look at the rows "Discovered - currently not indexed" and "Crawled - currently not indexed"
For example, my own site has a two-digit number of URLs in both categories. These are blog posts Google simply doesn't want to index for reasons unknown.
I have access to Google Search Console data for over 100 websites, and most, if not all, of them show the same issue. This includes sites (like my own) that rank well for certain keywords and receive traffic.
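If you want to tally this across many URLs instead of reading the report by eye, here is a minimal sketch in Python. It assumes you have already collected the data as (URL, status) pairs, for example from an export of the report; the `rows` shape and the function name `count_not_indexed` are my own invention for illustration, not anything Search Console provides.

```python
from collections import Counter

# The two status labels named in step 3 of the list above.
NOT_INDEXED_STATES = (
    "Discovered - currently not indexed",
    "Crawled - currently not indexed",
)


def count_not_indexed(rows):
    """Count URLs per not-indexed status.

    `rows` is a hypothetical iterable of (url, status) pairs; how you
    obtain it (manual export, script, etc.) is up to you.
    """
    counts = Counter(
        status for _url, status in rows if status in NOT_INDEXED_STATES
    )
    # Always report both statuses, even when a count is zero.
    return {status: counts.get(status, 0) for status in NOT_INDEXED_STATES}


if __name__ == "__main__":
    # Made-up sample data, purely to show the call.
    sample = [
        ("https://example.com/post-1", "Crawled - currently not indexed"),
        ("https://example.com/post-2", "Discovered - currently not indexed"),
        ("https://example.com/post-3", "Crawled - currently not indexed"),
    ]
    for status, n in count_not_indexed(sample).items():
        print(f"{status}: {n}")
```

Any status other than these two is ignored, so indexed pages in the same input don't affect the counts.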