[DISCUSSION] Segment file improvement for Update and delete case
Consider a scenario where one update operation has been done on a segment,
so one index file and one data file are generated. Now one more update
operation happens, which loads the segments of the old update into cache,
along with the actual indexmerge of that segment. But since we have
horizontal compaction for update and delete, the new index file generated
after horizontal compaction, along with the compacted files, will also be
loaded into cache for any subsequent query on the table. This is because,
even though the files are invalid once they are compacted, their status is
still success in
So whenever a query comes on the table, those files are loaded into cache
even though they are invalid (horizontally compacted). This is wrong, and
they persist inside the cache until we drop the cache. They will be loaded
again if we run another query. This is avoided only when we run clean files
on the table. Clean files clears the horizontally compacted files inside
the segment and updates the segment file with the valid ones. But if a
query has already run before clean files, then even after the horizontally
compacted files are deleted they are still present in cache, which may lead
to query failure.
There can be two solutions:
1. Either maintain the status of horizontally compacted files inside the
segment file, so that these files are not considered during query, and
clear cache after update operation for that query.
2. Or delete the horizontally compacted files right after the horizontal
compaction and clear the segment cache for that segment.
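To make the two options concrete, here is a minimal toy sketch in Python. It
is not CarbonData code: SegmentFile, SegmentCache, FileStatus.COMPACTED,
horizontal_compaction and the file names are all hypothetical, chosen only
to illustrate the idea. It assumes the cache loads only files whose status
in the segment file is still success, and that compaction invalidates the
cache entry for the affected segment.

```python
from dataclasses import dataclass, field
from enum import Enum


class FileStatus(Enum):
    SUCCESS = "success"
    COMPACTED = "compacted"  # hypothetical status, not an existing one


@dataclass
class SegmentFile:
    """Toy stand-in for a segment file: index file name -> status."""
    files: dict = field(default_factory=dict)

    def add(self, name):
        self.files[name] = FileStatus.SUCCESS

    def valid_files(self):
        return [n for n, s in self.files.items() if s is FileStatus.SUCCESS]


class SegmentCache:
    """Toy stand-in for the index cache, keyed by segment id."""

    def __init__(self):
        self.entries = {}

    def load(self, segment_id, segment_file):
        # Populate on a miss, using only files still marked SUCCESS.
        if segment_id not in self.entries:
            self.entries[segment_id] = set(segment_file.valid_files())
        return self.entries[segment_id]

    def invalidate(self, segment_id):
        self.entries.pop(segment_id, None)


def horizontal_compaction(segment_id, segment_file, cache,
                          old_files, merged_file, delete_old=False):
    for name in old_files:
        if delete_old:
            # Option 2: remove the compacted entries outright.
            segment_file.files.pop(name, None)
        else:
            # Option 1: keep the entries but mark them as compacted.
            segment_file.files[name] = FileStatus.COMPACTED
    segment_file.add(merged_file)
    # Both options: drop the stale cache entry for this segment.
    cache.invalidate(segment_id)


seg = SegmentFile()
cache = SegmentCache()
seg.add("update_1.carbonindex")
seg.add("update_2.carbonindex")
cache.load("0", seg)  # a query before compaction caches both files
horizontal_compaction("0", seg, cache,
                      ["update_1.carbonindex", "update_2.carbonindex"],
                      "merged.carbonindex")
print(sorted(cache.load("0", seg)))  # ['merged.carbonindex']
```

In both variants the next query sees only the merged file, so clean files
no longer has to reconcile the cache with what is left on disk.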
With a proper solution, we can even avoid the timestamp-based operations we
currently perform on IUD files during clean files.
It would be better to refactor this properly.
Any inputs, improvements, or suggestions are most welcome.