In practice it's more of a marketing term, and how big "big" is depends on the nature of the data and what you're doing with it.
If it fits in RAM on your laptop, it isn't big data.
If you can't process/handle it in a reasonable time on a single machine, and your methods need to explicitly worry about how to scale to handle the data volumes, it probably is "Big Data".
Problems that are embarrassingly parallel need far more data before I'd consider them big (I'd be in the >10PB camp), whereas for relational data I'd say >1TB.
I've worked on relational databases of similar size. There are two challenges. The first is that maintaining the relational model at that scale is quite tricky; tradeoffs need to be made. The second is that the systems-level management of a deployment that large requires a bit more than standard configuration management.
These days Amazon has Multi-AZ RDS, which should handle the second challenge.
The problem with databases that are 50TB or more is that you soon run into limits with the relational model. I have been reading up on different modeling techniques for converting relational models into Cassandra's column family stores.
You can't practically fit 50TB on one machine and still have reasonable performance, which means multiple machines with the data spread across them.
There are then two potential issues:
1) You're doing 1-to-1 joins across tables in a query; network latency may be an issue at high query rates
2) You're doing 1-to-many or many-to-many joins across tables in a query; the resulting combinatorial explosion of data is too much to handle (a toy calculation of this follows)
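To put rough numbers on that second issue, here's a toy calculation; the key names and per-key row counts are entirely made up, but the multiplicative blow-up per join key is the real problem.

    # Toy illustration of a many-to-many join blowing up: each join key
    # contributes rows_left * rows_right output rows, so a few heavy keys
    # dominate the result size and the data shuffled between machines.
    from collections import Counter

    # Hypothetical per-key row counts on each side of the join.
    left = Counter({"key_a": 10, "key_b": 5_000, "key_c": 200_000})
    right = Counter({"key_a": 3, "key_b": 40_000, "key_c": 1_000_000})

    output_rows = sum(left[k] * right[k] for k in left.keys() & right.keys())
    print(f"{output_rows:,} output rows")  # roughly 200 billion rows from just three keys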
You want to have your inner loops/joins as deep down in the stack as possible. If you can structure things so all the heavy lifting stays inside one rack/machine/NUMA node/processor/core, you'll be able to scale a good bit further.
Designing things not to require joins at all, denormalising the data and putting it in a column store like Cassandra, is also a good approach.
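As a toy sketch of what that denormalisation looks like, here's a Python mock-up with an invented users/orders schema; real Cassandra details (partition keys, clustering columns, CQL) are only imitated with plain dicts.

    # Normalised, relational-style: users and orders live in separate tables
    # and get joined on user_id at query time.
    users = {1: {"name": "alice"}, 2: {"name": "bob"}}
    orders = [
        {"order_id": 10, "user_id": 1, "total": 9.99},
        {"order_id": 11, "user_id": 1, "total": 4.50},
        {"order_id": 12, "user_id": 2, "total": 20.00},
    ]

    # Denormalised, column-family-style: one wide row per user, with the
    # user's attributes and their orders stored under one partition key.
    orders_by_user = {}
    for o in orders:
        row = orders_by_user.setdefault(
            o["user_id"], {"name": users[o["user_id"]]["name"], "orders": {}}
        )
        row["orders"][o["order_id"]] = {"total": o["total"]}

    # "Give me alice and her orders" is now a single-partition read, no join.
    print(orders_by_user[1])

The usual price is write amplification and designing a table per query pattern, since the wide row only answers the question it was laid out for.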
I think one defining feature is needing disk parallelism, because the workload demands table scans and maintaining indexes is impractical due to how dynamic the data is.
Another is not having pockets deep enough to solve it with intellectual property, either in the form of a proprietary parallel RDBMS (expensive) or by implementing clever stuff yourself.
Big data as a technology is about dumb-as-a-brick, cheap-as-chips brute force.
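For what it's worth, a minimal sketch of that brute-force style, assuming the data is already split into chunk files (the file pattern and the predicate are made up): no indexes, just scan everything in parallel and aggregate.

    import glob
    from multiprocessing import Pool

    def scan_chunk(path):
        # Full scan of one chunk file; the "ERROR" substring test is a
        # stand-in for whatever the actual query predicate is.
        hits = 0
        with open(path) as f:
            for line in f:
                if "ERROR" in line:
                    hits += 1
        return hits

    if __name__ == "__main__":
        chunks = glob.glob("data/part-*.log")  # hypothetical chunk layout on disk
        with Pool() as pool:                   # one worker process per core by default
            total = sum(pool.map(scan_chunk, chunks))
        print(total)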