Hadoop is designed as a batch processing framework, not a real-time analysis framework. If your analysis functions are idempotent with regard to future data in the time domain, simply compute summaries for each "block", down to the resolution supported for that age of data, such that your summaries fit in memory. Save ALL the data to Hadoop in case you need to replay it later. The answer to your question depends very much on whether you can summarize your data in the time domain. E.g., if computing an average, store block summaries as the average AND the number of items so that future summaries can be easily integrated. There is no one answer that will solve any possible analysis function; you'll need to optimize your system around the analysis function you want to perform, and perhaps have a few different purpose-built systems for different types of analysis.
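To make the average-plus-count idea concrete, here is a minimal Python sketch. The names `BlockSummary`, `summarize`, and `merge` are hypothetical, chosen for illustration only; nothing here is a Hadoop API. It assumes each block is small enough to summarize in one pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSummary:
    """Summary of one time block (hypothetical type): the average and the item count."""
    mean: float
    count: int

    def merge(self, other: "BlockSummary") -> "BlockSummary":
        """Combine two block summaries into the summary of both blocks together."""
        count = self.count + other.count
        if count == 0:
            # Both blocks are empty; there is nothing to average.
            return BlockSummary(0.0, 0)
        # Weight each block's average by its item count.
        mean = (self.mean * self.count + other.mean * other.count) / count
        return BlockSummary(mean, count)


def summarize(values) -> BlockSummary:
    """Summarize the raw values of a single block."""
    values = list(values)
    if not values:
        return BlockSummary(0.0, 0)
    return BlockSummary(sum(values) / len(values), len(values))


# Usage: summarize two blocks separately, then integrate them later.
morning = summarize([1, 2, 3])
afternoon = summarize([10, 20])
whole_day = morning.merge(afternoon)
```

Storing the average alone would not be enough: without the count, two block averages cannot be combined correctly when the blocks differ in size.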
Let's consider a particular problem domain: analysis of global financial data - fixed income, stocks, derivatives, etc.
Agreed, Hadoop is a batch processing framework across a chunked archive. Work has been done recently to bring Hadoop "out of the past" - http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-13.... However, even with these latest amendments, the latency can prove more than a little troublesome for trading strategies which require rapid execution.
It is for this reason that companies like Truviso and StreamBase - both born out of highbrow academic research - have built in-memory stream-processing frameworks in addition to persistent data stores.
Let's assume, for the sake of argument, that analysis of historical data is important and that Hadoop is fit for purpose, and also that a distributed in-memory processing facility is important. Which in-memory solution ought we to employ, and how ought we to relate it to the Hadoop solution we'll also be using?
You should try a few of them and see which one actually works better. Probably more importantly, you should figure out more specifically what you're trying to accomplish.
I believe you mean "invariant with regard to future data in the time domain", i.e. f(x_1,..,x_t) == f(x_1,..,x_t,x_{t+1})
"Idempotent" means that f(f(x)) == f(x), which wouldn't apply unless the output of a given analysis had to be fed back into it. Most outputs are going to be tables of counts with the input being raw text data, so that wouldn't apply in this case.
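A small Python sketch of the distinction, under stated assumptions: `normalize`, `word_counts`, and `block_mean` are hypothetical helpers invented for this illustration, and events are assumed to be `(timestamp, value)` pairs.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Idempotent: applying it twice gives the same result as applying it once."""
    return " ".join(text.lower().split())


def word_counts(text: str) -> Counter:
    """Raw text in, table of counts out. The output is not a valid input,
    so asking whether f(f(x)) == f(x) does not even make sense here."""
    return Counter(text.split())


def block_mean(events, start, end):
    """Mean of values with start <= timestamp < end, or None for an empty block.
    For a closed block this does not change when later data arrives."""
    values = [v for t, v in events if start <= t < end]
    return sum(values) / len(values) if values else None


# Idempotence: f(f(x)) == f(x)
s = "  Fixed   Income  "
assert normalize(normalize(s)) == normalize(s)

# Invariance with regard to future data: f(x_1..x_t) == f(x_1..x_t, x_{t+1})
events = [(0, 1.0), (1, 3.0)]
later = events + [(5, 100.0)]
assert block_mean(events, 0, 2) == block_mean(later, 0, 2)
```

The second property is the one that makes per-block summaries safe to compute once and keep.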