[Discussion] Support high-concurrency queries for hot data

haomarch
CarbonData meets the performance requirements of both point queries and
complex analysis at the same time. Before CarbonData, users had to use HBase
for point queries and an MPP engine or Hive on ORC for complex analysis.
With CarbonData, they can replace both HBase and Hive on ORC with a single
system, which reduces cost.
However, because CarbonData supports only low query concurrency, this benefit
is limited to low-concurrency scenarios such as maintenance and safety
checks. For online scenarios and high-concurrency offline scenarios,
CarbonData cannot help, which keeps it out of the more valuable production
environments.

Here is a suggestion:
We should support high-concurrency key-based queries (perhaps 1,000 to
10,000 TPS) for hot data (data from the most recent month), and
low-concurrency queries for cold data (data older than one month).
A design like HBase's RegionServer could be used as a reference.
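
To make the hot/cold split concrete, here is a minimal sketch of the idea. It
is not CarbonData or HBase code: the TieredStore class, its method names, and
the 30-day window are all hypothetical and chosen only for illustration.
Recent rows go into an in-memory key index that can serve point lookups
cheaply, while older rows fall back to a slower scan.

```python
from datetime import datetime, timedelta

# Assumption: "recent one month" is approximated as 30 days.
HOT_WINDOW = timedelta(days=30)


class TieredStore:
    """Hypothetical two-tier store: hot rows are indexed, cold rows are scanned."""

    def __init__(self, now):
        self.now = now
        self.hot_index = {}   # key -> row, intended for high-concurrency lookups
        self.cold_rows = []   # (key, row) pairs, stands in for a slower file scan

    def put(self, key, row, event_time):
        # Route on write: recent rows go to the hot tier, older rows to the cold tier.
        if self.now - event_time <= HOT_WINDOW:
            self.hot_index[key] = row
        else:
            self.cold_rows.append((key, row))

    def get(self, key):
        # Hot path: constant-time dictionary lookup.
        if key in self.hot_index:
            return "hot", self.hot_index[key]
        # Cold path: linear scan, acceptable only at low concurrency.
        for k, row in self.cold_rows:
            if k == key:
                return "cold", row
        return None, None


store = TieredStore(now=datetime(2020, 1, 31))
store.put("a", {"v": 1}, datetime(2020, 1, 20))   # within the last 30 days
store.put("b", {"v": 2}, datetime(2019, 10, 1))   # older than 30 days
```

A real design would also need to move rows from the hot tier to the cold tier
as they age and to shard the hot index across servers, which is where the
RegionServer comparison becomes relevant.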



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/