Currently, if the parent (main) table and the SI table do not have the same
valid segments, we disable the SI table. From the next query onwards,
we scan and prune only the parent table until the next load or
REINDEX command is triggered (these commands bring the parent and SI table
segments back in sync). As a result, queries take longer to return
results while SI is disabled.
To solve this problem, we are planning to support SI at the segment level. This
means we will not disable SI when the parent and SI tables do not have the same
segments. Instead, we will prune on SI for all segments that are valid in SI,
and for the remaining segments we will prune on the main (parent) table.
When pruning the main table in TableIndex.prune, if an SI exists
for the corresponding filter, all segments that are not present in the
SI table will be pruned on the corresponding parent table segments.
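As a rough illustration of the segment split described above, here is a minimal Python sketch. It is not CarbonData code: the function name and the plain lists of segment IDs are hypothetical, and it only shows how the valid segments could be divided between SI pruning and main-table pruning.

```python
def split_segments_for_pruning(main_valid_segments, si_valid_segments):
    """Partition main-table segments by whether the SI table also has them.

    Hypothetical helper: segment IDs are plain strings, and a segment is
    treated as SI-prunable only if it is valid in both tables.
    """
    si_set = set(si_valid_segments)
    # Segments valid in both tables can be pruned through the SI.
    via_si = [s for s in main_valid_segments if s in si_set]
    # Segments missing from the SI fall back to main-table pruning.
    via_main = [s for s in main_valid_segments if s not in si_set]
    return via_si, via_main


# Example: segments 2 and 3 were loaded into the main table only.
via_si, via_main = split_segments_for_pruning(["0", "1", "2", "3"], ["0", "1"])
print(via_si)    # ['0', '1']
print(via_main)  # ['2', '3']
```

With such a split, SI stays usable for segments 0 and 1 even though the two tables are out of sync.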
Please let me know your thoughts and inputs on this.
My thoughts are as follows.
1. differences between segment level and table level
a) pushdown SI into CarbonDataSourceScan/Relation and avoid rewriting the
b) different segments will have different SI, so different segments may
choose different SI
2. data loading/compaction/update/delete/merge
a) the main table can update tablestatus metadata entry to success status
before SI loading
b) if SI is disabled, no need to do SI loading; if SI is enabled, it can
do SI loading.
a) reading the data of SI table could be on the executor side; reading the
index of SI table could be on the driver side.
b) performance: currently the system uses a distributed job (groupBy and Join
query) to collect the positionIDs of the result rows; if TableIndex.prune
uses a single thread, it will have a performance issue.
c) when the table has multiple SI tables, the table-level positionId join
should be converted to a segment-level join.
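To make point c) above more concrete, here is a small Python sketch of a segment-level positionId join. It is only an illustration under stated assumptions: the function name, the dictionary layout, and the SI names are hypothetical, the filter is assumed to be an AND of conditions, and a segment that no SI covers is marked None to mean "fall back to main-table pruning".

```python
def join_position_ids_per_segment(segments, si_results):
    """Intersect positionIds per segment instead of once per table.

    si_results is a hypothetical structure:
        {si_name: {segment_id: set_of_position_ids}}
    An SI contributes to a segment only if it has that segment.
    """
    joined = {}
    for seg in segments:
        # Collect the positionId sets of every SI that covers this segment.
        hits = [by_seg[seg] for by_seg in si_results.values() if seg in by_seg]
        # AND filter: keep only positions matched by all covering SIs.
        # None means no SI covers the segment (prune on the main table).
        joined[seg] = set.intersection(*hits) if hits else None
    return joined


si_results = {
    "si_city": {"0": {"p1", "p2"}, "1": {"p5"}},
    "si_name": {"0": {"p2", "p3"}},
}
result = join_position_ids_per_segment(["0", "1", "2"], si_results)
print(result["0"])  # {'p2'}
print(result["1"])  # {'p5'}
print(result["2"])  # None
```

In this sketch segment 1 is covered by only one SI, so the condition on the other column would still have to be evaluated when scanning the main table.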
Thanks for bringing this up. It's an important feature to leverage SI at the
small segment level also.
Work is already in progress to make SI prune through the data map interface, so
your design should be aligned with that.
It would be better to check the SI-as-data-map design first and then prepare the
design for this; that will give a clearer picture for review and for starting the
work. Otherwise the two designs may contradict each other.
As work is already going on to support SI pruning through the data map
interface (without SQL plan rewrite), this will be handled with the help of a
carbon property, and we are not going to remove the current design (SI
support with SQL plan rewrite).