There is a stream of numbers coming. At any point of time i might need 10% random numbers. I obviously don't want to store the entire stream...
say they all are the time spent by customers in restraunt. So i want to query what is the avg time spent in the restraunt between a particular time interval and as my normal table is pretty large , i want to do it over sampled data. The problem of grabbing 10% at a time is if i always receive say less then 10 message in batch , then 10% will always be null
Subscribe to:
Post Comments (Atom)
Featured Articles
Trending news
-
Gaston Hillar provides a good tutorial on Pydoop which is a Python API for Hadoop. Pydoop Script enables you to write simple MapRedu...
-
The iconic Java based logging utility called Log4j announced end of life for Version 1.x. O riginally written by Ceki Gülcü, this logging ...
-
Databricks, one of the main commercial companies behind Apache Spark, has appointed Ali Ghodsi as CEO and Ion Stoica has moved up to t...
-
There has been a generous dose of reactions to HadoopSphere Top Big Data Influencers list for 2015 . Reactions have ranged from cheer a...
-
The hot open source streaming and batch processing system Apache Flink has made a major release milestone by announcing version 1.0 and ...
comments: