[Discussion] Support high-concurrency queries for hot data
CarbonData meets the performance requirements of both point queries and
complex analysis at the same time. Before CarbonData, users could only use
HBase to serve point queries, while using MPP or Hive on ORC to run complex
analysis. With CarbonData, they can use CarbonData instead of HBase and
Hive on ORC, which reduces cost.
But because CarbonData's query concurrency is low, this benefit can only be
achieved in low-concurrency scenarios, like maintenance and safety checks.
For online scenarios and high-concurrency offline scenarios, CarbonData
cannot help, which keeps it from being applied in the more valuable
production environments.
Here is a suggestion:
We should support high-concurrency key-based queries (maybe 1000~10000 TPS)
for hot data (the most recent month of data), and low-concurrency queries
for cold data (data older than one month).
A design like HBase's RegionServer could be used for reference.
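
To make the hot/cold split concrete, here is a rough sketch of how a key-based query might be routed by the age of the data it touches. This is only an illustration, not an existing CarbonData API: the 30-day window, the function names, and the two path labels ("serving-layer" and "batch-scan") are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: "hot" means data from roughly the most recent month (30 days).
HOT_WINDOW = timedelta(days=30)


def classify(event_time: datetime, now: datetime) -> str:
    """Return "hot" for data inside the recent window, otherwise "cold"."""
    return "hot" if now - event_time <= HOT_WINDOW else "cold"


def route(event_time: datetime, now: datetime) -> str:
    """Pick a (hypothetical) serving path for a key-based query."""
    if classify(event_time, now) == "hot":
        # Hypothetical high-concurrency path, e.g. a RegionServer-like service.
        return "serving-layer"
    # Hypothetical low-concurrency path, e.g. the existing scan-based query.
    return "batch-scan"
```

With a split like this, only the hot tier would need to sustain the 1000~10000 TPS target, while cold data could keep using the current query path.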