Hi all,
Currently, if the parent (main) table and the SI table do not have the same valid segments, we disable the SI table. From the next query onwards, we scan and prune only the parent table until the next load or REINDEX command is triggered (these commands bring the parent and SI table segments back in sync). As a result, queries take longer to return results while SI is disabled.

To solve this problem, we are planning to support SI at the segment level. That is, we will not disable SI when the parent and SI tables do not have the same segments. Instead, we will prune on SI for all segments that are valid in the SI table, and prune on the main/parent table for the remaining segments. At the time of pruning with the main table in TableIndex.prune, if SI exists for the corresponding filter, then all segments which are not present in the SI table will be pruned on the corresponding parent table segment.

Please let me know your thoughts and inputs on this.

Regards
Nihal kumar ojha
-- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
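For readers following the thread, the segment split described in the proposal above can be sketched roughly as below. This is only an illustration of the idea, not CarbonData code: the function names (split_segments, prune) and the two callbacks (prune_with_si, prune_with_main) are hypothetical, and segment ids are plain strings.

```python
def split_segments(main_valid, si_valid):
    """Partition the main table's valid segments by whether the SI table has them too."""
    si_valid = set(si_valid)
    via_si = [seg for seg in main_valid if seg in si_valid]
    via_main = [seg for seg in main_valid if seg not in si_valid]
    return via_si, via_main


def prune(main_valid, si_valid, prune_with_si, prune_with_main):
    """Prune each segment with SI when it is in sync, otherwise on the main table.

    prune_with_si and prune_with_main are hypothetical callbacks that take a
    segment id and return whatever the pruning step produces for that segment.
    """
    via_si, via_main = split_segments(main_valid, si_valid)
    hits = {}
    for seg in via_si:
        hits[seg] = prune_with_si(seg)
    for seg in via_main:
        hits[seg] = prune_with_main(seg)
    return hits
```

With main table segments 0 to 3 and SI segments 0 and 1, segments 0 and 1 would go through SI and segments 2 and 3 would fall back to the main table, instead of SI being disabled for the whole table.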
Hi Nihal,

My thoughts are as follows.

1. Differences between segment level and table level
   a) Push SI down into CarbonDataSourceScan/Relation and avoid rewriting the SQL plan.
   b) Different segments will have different SI, so different segments may choose different SI.
2. Data loading/compaction/update/delete/merge
   a) The main table can update its tablestatus metadata entry to success status before SI loading.
   b) If SI is disabled, there is no need to do SI loading; if SI is enabled, it can do SI loading.
3. Query
   a) Reading the data of the SI table could be on the executor side; reading the index of the SI table could be on the driver side.
   b) Performance: the system currently uses a distributed job (groupBy and join query) to collect the positionIDs of the result rows; if TableIndex.prune uses a single thread, it will have a performance issue.
   c) When the table has multiple SI tables, the table-level positionId join should be converted to a segment-level join.

Best Regards
David Cai
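As a rough illustration of points 1b and 3c above, a segment-level join over several SI tables might look like the sketch below. It assumes an AND filter, so the join is modelled as a set intersection of position ids, and it uses small integers in place of real positionIDs. The function name segment_level_join and the shape of si_hits are hypothetical, not part of CarbonData.

```python
def segment_level_join(segment_ids, si_hits):
    """Join SI results per segment instead of once for the whole table.

    si_hits maps an SI table name to {segment_id: set_of_position_ids}.
    A segment id missing from an SI table's dict means that SI is not
    valid for the segment (assumption made for this sketch).
    """
    result = {}
    for seg in segment_ids:
        # Only the SI tables that are in sync for this segment take part.
        per_si = [hits[seg] for hits in si_hits.values() if seg in hits]
        # None marks "no usable SI here, prune on the main table instead".
        result[seg] = set.intersection(*per_si) if per_si else None
    return result
```

Here each segment uses whichever SI tables are available for it, which is why different segments can end up choosing different SI.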
In reply to this post by Nihal
Hi Nihal,
Thanks for bringing this up. It's an important feature for leveraging SI at the finer-grained segment level as well. Work is already in progress on making SI prune through the data map interface, so your design should be aligned with that. It would be better to review the SI-as-data-map design first and then prepare the design for this; that will give a clear picture for review and for starting the work. Otherwise, the two designs may contradict each other.

Thanks,
Regards,
Akash R Nilugal
In reply to this post by Nihal
Hi,
Thanks for the input. Work is already in progress to support SI pruning through the data map interface (without SQL plan rewrite). That will be handled with the help of a carbon property, and we are not going to remove the current design (SI support with SQL plan rewrite). So we are first focusing on extending SI to the segment level with SQL plan rewrite.

Please go through this design document and give your inputs or suggestions:
https://docs.google.com/document/d/1q1UIrMO4KGZuBICrixrv4JsbrblATSQVuYY0IAKxWn0/edit

Regards
Nihal kumar ojha