One table has 40K segments. We started creating an SI on one column; it took one
week and finished 39999 segments, but while creating the index on the last
segment the application crashed abnormally. Checking the status of the SI table
afterwards, there is no successful segment. If we start creating the SI again,
it starts from the beginning (segment id = 0), and all the SI segments that were
already loaded during the last creation have to be reloaded, which is a waste of time.
SI creation should be able to succeed partially, so that the next creation can
resume from the point where the last creation failed.
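The resume behaviour being asked for boils down to filtering the main table's segment list against what is already SUCCESS in the SI. A minimal sketch (class and method names here are illustrative, not actual CarbonData APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: on a re-run of CREATE INDEX (or REINDEX), load only
// the segments that are not already marked SUCCESS in the SI table status,
// instead of starting again from segment id = 0.
public class ResumableReindex {

  public static List<Integer> segmentsToLoad(List<Integer> mainTableSegments,
                                             Set<Integer> siSuccessSegments) {
    List<Integer> pending = new ArrayList<>();
    for (int id : mainTableSegments) {
      if (!siSuccessSegments.contains(id)) {
        pending.add(id);  // only this segment still needs an SI load
      }
    }
    return pending;
  }
}
```

With this, the 39999 segments that already succeeded would be skipped and only the failed one reloaded.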
Yes, as you mentioned, this is a major drawback in the current SI flow. The
problem exists because, when we get the set of segments to load, we start an
executor service and submit the whole segment list; after the .get() we mark
the status SUCCESS for all segments at once.
So we need to rewrite this code to load batch-wise and avoid this.
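The batch-wise idea could look roughly like the following. This is only a sketch under assumed names (`loadSegment`, `markBatchSuccess` are placeholders, not the real CarbonData load path); the point is that status is committed per batch, so a crash loses at most one batch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submit SI segment loads in fixed-size batches and
// mark each batch SUCCESS as soon as it completes, instead of marking all
// segments at once after a single bulk submit.
public class BatchSILoader {

  static final int BATCH_SIZE = 100;

  public static int loadInBatches(List<Integer> segmentIds) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    int loaded = 0;
    try {
      for (int start = 0; start < segmentIds.size(); start += BATCH_SIZE) {
        List<Integer> batch =
            segmentIds.subList(start, Math.min(start + BATCH_SIZE, segmentIds.size()));
        List<Future<Integer>> futures = new ArrayList<>();
        for (int id : batch) {
          futures.add(pool.submit(() -> loadSegment(id)));
        }
        try {
          for (Future<Integer> f : futures) {
            f.get();  // wait for the whole batch
          }
        } catch (InterruptedException | ExecutionException e) {
          // A segment failed: the batches already marked SUCCESS are kept,
          // and a later REINDEX can resume from this point.
          return loaded;
        }
        markBatchSuccess(batch);  // commit status per batch, not at the end
        loaded += batch.size();
      }
    } finally {
      pool.shutdown();
    }
    return loaded;
  }

  static int loadSegment(int id) { return id; }          // placeholder for the real load
  static void markBatchSuccess(List<Integer> batch) { }  // placeholder for the status write
}
```

The trade-off is more status-file writes (one per batch instead of one per command), but after a crash at segment 39999 almost everything would already be SUCCESS.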
Yes, it seems to be a drawback in the SI creation command. As Akash
pointed out, instead of marking the status for all the segments at
once, we can do two things:
1. Load in batches (similar to what Akash mentioned) and, in case of a
failure, just stop loading and do not fail the SI creation command, so that
the user can use the REINDEX command to repair the remaining segments, or the
repair can be triggered automatically in the next consecutive loads.
2. Provide a way to load only a user-defined number of segments into the
SI instead of loading all at once. In this case, let's say the user wants
to create an SI table on a main table with 40000 segments. He can create it
with some 500 or 1000 segments initially, then fire the REINDEX command
to load the remaining segments, or repair the remaining segments in
batches using the load command.
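Option 2 is essentially a cap on the initial load plus a well-defined remainder for REINDEX. A small sketch of that split, with hypothetical names (`initialBatch`/`remaining` are not real CarbonData APIs):

```java
import java.util.List;

// Hypothetical sketch of option 2: the user supplies a limit on how many
// segments CREATE INDEX loads up front; everything past the limit is left
// for a later REINDEX or load-time repair.
public class CappedSILoad {

  // Segments loaded by the initial CREATE INDEX run.
  public static List<Integer> initialBatch(List<Integer> segments, int userLimit) {
    return segments.subList(0, Math.min(userLimit, segments.size()));
  }

  // Segments left over for REINDEX to pick up later.
  public static List<Integer> remaining(List<Integer> segments, int userLimit) {
    return segments.subList(Math.min(userLimit, segments.size()), segments.size());
  }
}
```

So with 40000 segments and a limit of 1000, the index becomes usable after the first 1000 loads, and REINDEX works through the other 39000 in whatever batch size the user chooses.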