create SI can succeed partially


Yahui Liu
[Problem]
A table has 40K segments. Creating an SI on one column took 1 week and
finished 39,999 segments, but the application crashed abnormally while
creating the last segment. Checking the status of the SI table afterwards
shows no successful segment. Creating the SI again starts from the
beginning (segment id = 0), so all the SI segments already loaded during the
previous creation have to be reloaded, which is a waste of time.

[Expectation]
SI creation can succeed partially, so that the next creation can resume from
the point where the previous creation failed.
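
The expectation can be pictured with a minimal sketch: on a retry, only the main-table segments that have no successful SI segment are loaded. This is an illustration under assumptions, not CarbonData code. The function name and the use of string segment ids are hypothetical.

```python
def segments_still_to_load(main_segments, si_success_segments):
    """Return main-table segment ids that have no successful SI segment.

    The order of `main_segments` is preserved, so loading resumes from the
    first segment that was not finished last time.
    """
    done = set(si_success_segments)
    return [seg for seg in main_segments if seg not in done]


# Scenario from the report: 40K segments, 39,999 indexed before the crash.
main_segments = [str(i) for i in range(40000)]
si_success = [str(i) for i in range(39999)]

pending = segments_still_to_load(main_segments, si_success)
print(pending)  # ['39999']
```

With per-segment (or per-batch) status recorded, the retry in this scenario would touch one segment instead of 40,000.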



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: create SI can succeed partially

akashrn5
Hi,

Yes, as you mentioned, this is a major drawback in the current SI flow. The
problem exists because, once we get the set of segments to load, we start an
executor service and hand it the whole segment list; only after .get returns
do we mark the status of all segments as success at once.

So we need to rewrite this code to work batch-wise and avoid the problem.
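
A rough sketch of the batch-wise idea, assuming the status is committed after every batch rather than once at the end. It uses Python's `concurrent.futures` as a stand-in for the executor service; `load_segment`, `create_si_in_batches`, the `fail_on` parameter and the `status` dict (standing in for the SI table status) are all hypothetical names, not the actual CarbonData implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def load_segment(segment_id, fail_on=None):
    """Stand-in for building the SI data of one segment (hypothetical)."""
    if segment_id == fail_on:
        raise RuntimeError(f"load failed for segment {segment_id}")
    return segment_id


def create_si_in_batches(segments, status, batch_size=4, fail_on=None):
    """Load segments batch by batch, committing status after each batch."""
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(segments), batch_size):
            batch = segments[start:start + batch_size]
            futures = [pool.submit(load_segment, seg, fail_on) for seg in batch]
            # Wait for the whole batch; result() re-raises a load failure.
            for future in futures:
                future.result()
            # Commit only this batch; earlier batches are already recorded.
            for seg in batch:
                status[seg] = "SUCCESS"


status = {}
try:
    # Ten segments in batches of four; the very last segment fails.
    create_si_in_batches(list(range(10)), status, batch_size=4, fail_on=9)
except RuntimeError:
    pass

print(sorted(status))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Only the batch containing the failed segment is lost; the two batches that completed before it keep their success status.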


Regards,
Akash R




Re: create SI can succeed partially

vikramahuja1001
Hi,
Yes, it seems to be a drawback in the SI creation command. As Akash
pointed out, instead of updating the status for all the segments at
once, we can do 2 things:
1. Load in batches (similar to what Akash mentioned) and, in case of a
failure, just stop loading without failing the SI creation command, so that
the user can use the reindex command to repair the remaining segments, or
the repair can be triggered in the next consecutive loads.
2. Provide a way to load only a user-defined number of segments into the
SI instead of loading all at once. For example, say the user wants to
create an SI table over 40,000 segments. They can create the table with
only 500 or 1,000 segments initially, then fire the reindex command to load
the remaining segments, or repair the remaining segments in batches using
the load command.
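
The two options above can be sketched together as follows. This is only an illustration under assumptions: `load_si`, `max_segments`, `batch_size` and the `si_status` dict are hypothetical names, and the "repair" step is modelled by simply calling the same function again, which is not necessarily how the reindex command works internally.

```python
def load_si(main_segments, si_status, load_one, batch_size=500, max_segments=None):
    """Load missing SI segments in batches and return those still missing.

    Stops at the first failing batch instead of raising, so the command
    itself does not fail and a later repair call can pick up the rest.
    """
    pending = [s for s in main_segments if si_status.get(s) != "SUCCESS"]
    if max_segments is not None:
        pending = pending[:max_segments]  # option 2: cap the initial load
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        try:
            for seg in batch:
                load_one(seg)
        except Exception:
            break  # option 1: stop loading, keep what already succeeded
        for seg in batch:
            si_status[seg] = "SUCCESS"
    return [s for s in main_segments if si_status.get(s) != "SUCCESS"]


failures = {5}  # segment 5 fails once, then succeeds on retry


def load_one(seg):
    """Simulated per-segment load (hypothetical)."""
    if seg in failures:
        failures.discard(seg)
        raise RuntimeError(f"simulated failure on segment {seg}")


main = list(range(10))
status = {}

# Initial creation limited to 4 segments: 0..3 are loaded.
after_create = load_si(main, status, load_one, batch_size=2, max_segments=4)
# First repair: the batch [4, 5] fails on segment 5, so loading stops.
after_repair_1 = load_si(main, status, load_one, batch_size=2)
# Second repair: everything left loads successfully.
after_repair_2 = load_si(main, status, load_one, batch_size=2)

print(after_create)    # [4, 5, 6, 7, 8, 9]
print(after_repair_1)  # [4, 5, 6, 7, 8, 9]
print(after_repair_2)  # []
```

In this sketch a failed batch is retried as a whole, so a smaller batch size means less repeated work after a failure, at the cost of more frequent status updates.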

Others can give their input as well.


Regards
Vikram

On Tue, Mar 2, 2021 at 4:00 PM akashrn5 <[hidden email]> wrote:

> Hi,
>
> Yes, as you mentioned, this is a major drawback in the current SI flow. The
> problem exists because, once we get the set of segments to load, we start an
> executor service and hand it the whole segment list; only after .get returns
> do we mark the status of all segments as success at once.
>
> So we need to rewrite this code to work batch-wise and avoid the problem.
>
> Regards,
> Akash R

Re: create SI can succeed partially

Mahesh Raju Somalaraju
In reply to this post by Yahui Liu
Hi,

+1 for the feature. We can save a lot of load time for the SI table.

Doubt:
1) How are we going to handle queries for the failed segments? Do we need to
prune from the main table directly?
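
To make the question concrete, here is a sketch of one possible answer, which the thread itself does not confirm: segments with a successful SI segment are pruned through the SI, and the rest fall back to pruning on the main table. `plan_pruning` and the `si_status` dict are hypothetical names used for illustration only.

```python
def plan_pruning(main_segments, si_status):
    """Split segments into those prunable via SI and those needing fallback."""
    via_si = [s for s in main_segments if si_status.get(s) == "SUCCESS"]
    via_main = [s for s in main_segments if si_status.get(s) != "SUCCESS"]
    return via_si, via_main


# Segment 2 failed during SI load; segment 3 was never attempted.
si_status = {0: "SUCCESS", 1: "SUCCESS", 2: "FAILED"}
via_si, via_main = plan_pruning([0, 1, 2, 3], si_status)

print(via_si)    # [0, 1]
print(via_main)  # [2, 3]
```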

Thanks & Regards
Mahesh Raju Somalaraju

On Fri, Feb 19, 2021 at 3:16 PM Yahui Liu <[hidden email]> wrote:

> [Problem]
> A table has 40K segments. Creating an SI on one column took 1 week and
> finished 39,999 segments, but the application crashed abnormally while
> creating the last segment. Checking the status of the SI table afterwards
> shows no successful segment. Creating the SI again starts from the
> beginning (segment id = 0), so all the SI segments already loaded during the
> previous creation have to be reloaded, which is a waste of time.
>
> [Expectation]
> SI creation can succeed partially, so that the next creation can resume from
> the point where the previous creation failed.