
I think it's much more helpful to have an idea of what you want to get out of your dataset. Generating random data can be useful for stress testing systems, but without an idea of what you want, you'll have no way of knowing whether what you're exercising is useful or not.


Random data or mashups of public datasets are good for learning the mechanics of specific processing frameworks, but to understand the concepts behind processing big data, you really need a clear objective guiding the analysis.

Random data is good for the how and the with-what (to an extent), but not for the when and the why.



