Not quite TBs daily, but close. We were in B2B lead generation, and a lot of my ETL workloads involved heavy text normalization and standardization, then source layering to ultimately stitch together as complete and accurate a record as possible based on the heuristics we had available.
Providers of that type of data essentially live in a world of "dump dataset to csv[1] periodically, place csv onto the FTP account for whoever is paying us for it currently". No deltas for changed or net-new records, no per-customer formatting requests, nothing. So the entire thing had to be re-processed every single time from every single vendor and then upserted into our master data.
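The shape of that load step, as a minimal Python sketch (function and field names here are illustrative, not our actual code):

```python
# Re-read the full dump every time, normalize, and upsert into a master
# store keyed on a heuristic identity. All names below are made up.

def normalize(value: str) -> str:
    # Stand-in for the real normalization (casefolding, whitespace, etc.)
    return " ".join(value.split()).casefold()

def record_key(rec: dict) -> tuple:
    # Illustrative identity heuristic; the real one layered several signals.
    return (normalize(rec.get("company", "")), normalize(rec.get("email", "")))

def upsert_dump(master: dict, dump_rows: list[dict]) -> None:
    for raw in dump_rows:
        rec = {k: normalize(v) for k, v in raw.items()}
        current = master.setdefault(record_key(rec), {})
        # Source layering: keep existing (higher-ranked) values, fill gaps.
        for field, value in rec.items():
            if value and not current.get(field):
                current[field] = value
```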
[1] Hell, usually they didn't even provide basic technical information, like the character encoding the data was stored or exported in, or whether it uses database-style escapes (any potentially special character is escaped with a backslash) or csv-style escapes (everything is interpreted as a literal except a double quote, which is escaped with a second double quote).
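In Python terms, that escaping difference is the difference between these two reader configs (a sketch; the sample rows are made up):

```python
import csv
import io

db_style  = 'Acme \\"Intl\\",NY\n'   # backslash-escaped quotes
csv_style = '"Acme ""Intl""",NY\n'   # RFC 4180 doubled quotes

# Database-style: backslash escapes the next character, quotes aren't doubled.
rows_db = list(csv.reader(io.StringIO(db_style), escapechar="\\", doublequote=False))

# CSV-style: a quote inside a quoted field is escaped by doubling it (the default).
rows_csv = list(csv.reader(io.StringIO(csv_style)))

print(rows_db)   # [['Acme "Intl"', 'NY']]
print(rows_csv)  # [['Acme "Intl"', 'NY']]
```

Guess the dialect wrong and the same bytes parse "successfully" into garbage, which is why not documenting it is so painful.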