You make a good point, but there could be more to it than that:
- Need to make sure the photographers are careful not to damage fragile pages
- Need a system of organization (syncing ten thousand default-named iPhone pics with no labels is not ideal)
- You might be ignoring important differences between the modern published books on your bookshelf and these materials (e.g. maybe the font is not the same size, maybe the text is not modern English, maybe characters are not printed consistently, maybe pages are dirty), all of which could impact the OCR-friendliness of an iPhone pic compared to something else
- There might even be valuable information in markings below the topmost visible layer, which could be revealed by scanning equipment (for example, if pages are stuck together)
And that's just off the top of my head, without real domain knowledge.
It's not about OCR or dirt. It's about taking an image. I doubt OCR would work on any of them, whether you use a $$$$$ archivist to photograph the pages or not.
As for below the topmost layer, you're right, an iPhone camera won't do it. But worrying about that comes much, much later.