[DISCUSSION] Display the segment ID when carbondata load is successful

Nihal
Hi all,

Currently, the show segments command lists segments with details such as
ID, Status, Load Start Time, Load Time Taken, Partition, Data Size, and
Index Size. Using the load start time, we can work out the segment ID for a
particular load. With concurrent loads, however, it is hard to tell which
segment ID belongs to which load, because the load start times can be
identical or very close.
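
A minimal sketch of that ambiguity, in Python. The rows and the
`segments_started_at` helper are hypothetical stand-ins for show segments
output, not CarbonData APIs, and the timestamps are made up for illustration.

```python
from datetime import datetime

# Hypothetical rows, shaped like a subset of the show segments output.
segments = [
    {"id": "0", "status": "Success", "load_start_time": datetime(2021, 1, 13, 12, 9, 0)},
    {"id": "1", "status": "Success", "load_start_time": datetime(2021, 1, 13, 12, 9, 0)},
    {"id": "2", "status": "Success", "load_start_time": datetime(2021, 1, 13, 12, 30, 0)},
]


def segments_started_at(rows, start_time):
    """Return the IDs of all segments whose load start time equals start_time."""
    return [row["id"] for row in rows if row["load_start_time"] == start_time]


# Two concurrent loads began in the same second, so the start time alone
# cannot tell a caller which segment belongs to which load.
candidates = segments_started_at(segments, datetime(2021, 1, 13, 12, 9, 0))
print(candidates)
```

When the lookup returns more than one ID, the caller has no way to pick the
right one, which is the case the load command itself could resolve by
reporting the segment ID it wrote.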

To solve this problem, we plan to show the segment ID, together with the
number of successfully loaded entries in the segment, when a carbondata
load succeeds. We can include other details as well if required, once the
discussion concludes.
With this, we can know the segment ID corresponding to a particular load
and easily query that specific segment.

Note: This scenario also applies to *insert into* queries.

Please let me know your input about the same.

Thanks,
Nihal kumar ojha



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

vikramahuja1001
Hi Nihal

It's a good idea. We can also show the following information along with
segment number.
1. Segment size
2. Number of files in the segment
3. Cache size and location (index server in case of prepriming)

We can give the same information for add segment as well.
Others can give their input as well.

Thanks
Vikram


In reply to Nihal's post of Wed, 13 Jan 2021, 12:09 pm.

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

David CaiQiang
Hi Nihal, my suggestions are as follows:
1. Include the normal output of the show segments command.
2. Add more information about the load, such as numFiles, numRows, and
rawDataSize (show segments may need these too; take care with CDC, which
needs to update this information).



-----
Best Regards
David Cai

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

akashrn5
In reply to this post by Nihal
Hi Nihal,

The problem statement is not very clear. What is the use case, or in which
scenario is the problem faced? We need to get the result from the
successful segments themselves, so please elaborate a little on the
problem.

Also, if you want to include more details, do not include them in the
default show segments output; maybe they can be included in show segments
with query, which Likun implemented. But we can decide this once the
problem is clear.

Also, @vikram, showing cache here is not a good idea, as we already have a
command for that. If you are planning segment-wise cache details, we can
improve the existing cache-specific commands; let's not include them here.

Thanks,

Regards,
Akash




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Ajantha Bhat
Hi Nihal,
In a concurrent scenario, we cannot map which load command produced which
segment ID, so it is good to show a summary at the end of the command.

I agree with David's suggestion.
Along with load and insert, if possible we should also give a summary for
update, delete, and merge (for which we may start supporting concurrent
operations in the near future).


Thanks,
Ajantha

In reply to akashrn5's post of Mon, 18 Jan 2021, 9:49 am.

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Indhumathi
Hi Nihal,

I also feel it is good to display only the segment ID at the end of the
command, similar to the update command, which returns the number of updated
rows. There is no need to add other details (they can be added to the show
segments command if needed).

Regards,
Indhumathi M

In reply to Ajantha Bhat's post of Mon, Jan 18, 2021, 9:54 AM.

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Yahui Liu
In reply to this post by Nihal
Hi all,

I agree with adding some extra information after a successful load/insert.
The information shown should be what is available only during this load.
Information that can be obtained at any time need not be shown in the load
result (we can add it to the show segments command, because that command
can be run at any time). Take the update command as an example: it returns
how many rows were updated this time. We can never get this information
after the update, so it is important.

So for the load and insert commands, I suggest the following information:
1) segment ID (of course)
2) how many rows were loaded/inserted this time (including the bad rows
handled by bad_record_action)
3) how many bad records there were (we can never get this information
after loading/inserting)
4) bad_record_location (this is also passed as a load option, so it may be
unnecessary because users set it themselves)
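
The four items above could be carried in one small result record. The
following is only a sketch in Python, assuming a hypothetical `LoadSummary`
type; the field names and the rendered text are illustrative and are not
taken from the CarbonData code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadSummary:
    """Hypothetical result of one load/insert; field names are illustrative."""

    segment_id: str
    rows_loaded: int  # includes bad rows handled by the bad-record action
    bad_records: int  # cannot be recovered once the load has finished
    bad_record_location: Optional[str] = None  # already known to the caller

    def render(self) -> str:
        """Format the summary as a single result line."""
        parts = [
            f"Segment ID: {self.segment_id}",
            f"rows loaded: {self.rows_loaded}",
            f"bad records: {self.bad_records}",
        ]
        if self.bad_record_location is not None:
            parts.append(f"bad record location: {self.bad_record_location}")
        return ", ".join(parts)


summary = LoadSummary(segment_id="3", rows_loaded=1000, bad_records=2)
print(summary.render())
```

Making the location optional reflects item 4: the caller supplied it, so
echoing it back is a convenience rather than new information.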




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Nihal
In reply to this post by Nihal
Thank you all for your valuable inputs.

Based on the suggestions and discussion, we have concluded to show only the
segment ID as a summary when a load or insert command succeeds.

Regards,
Nihal kumar ojha




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

areyouokfreejoe
In reply to this post by Nihal
Hi,
Let's continue this discussion. When auto merge is enabled, should we
return the segment ID before or after compaction?
My opinion is that we should return the segment ID before compaction,
because:
1. Users focus on their load operation; the merge operation runs in the
background and users may not notice it.
2. Returning the segment ID after compaction is impossible with the current
code, because the load and the auto merge are asynchronous.





Re: [DISCUSSION] Display the segment ID when carbondata load is successful

akashrn5
Hi,

I think auto compaction after load is still not async; there is a plan to
make it async.
But in my view, we should return the current segment ID, and if it has been
merged into another segment we should say that "X" is the segment ID loaded
and it has been merged into segment "Y", so that the user can decide
whether to query it or not.
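
A minimal sketch of such a message, in Python. The `describe_load` function
and its wording are hypothetical, and the segment IDs in the usage lines are
illustrative only.

```python
from typing import Optional


def describe_load(loaded_id: str, merged_into: Optional[str] = None) -> str:
    """Build the message returned after a load.

    merged_into is the hypothetical ID of the compacted segment, or None
    when auto compaction did not touch the loaded segment.
    """
    if merged_into is None:
        return f"Segment {loaded_id} loaded"
    return f"Segment {loaded_id} loaded and merged into segment {merged_into}"


print(describe_load("4"))
print(describe_load("4", merged_into="0.1"))
```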

If we just return the current segment ID, that segment will be in the
compacted state; if the user blindly queries it while a concurrent clean
files operation is running, the operations will fail.
Others can give their opinions.

Regards,
Akash




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Yahui Liu
In reply to this post by areyouokfreejoe
Hi,

I think that after a load, returning only the ID of the segment the data
was loaded into is enough, whether auto load merge is enabled or not. I
will add one more reason to those @areyouokfreejoe mentioned:
1. Users who care about each individual load mostly disable auto load merge
in their application logic and handle compaction themselves. Auto load
merge is based only on segment number, not on any business relation between
the segments. So if it is enabled, several segments that are unrelated
except for having adjacent segment IDs will be compacted together. After
this kind of compaction, the per-segment information that existed before
compaction is lost, which is not what the user wants. If a load is special,
then to avoid losing information after compaction it should be merged only
with segments that share the same special property. Only the application
knows that property, and carbon currently has no place to store it. So only
the user can control which segments are compacted, by triggering custom
compaction with the IDs of the segments that share the property.





Re: [DISCUSSION] Display the segment ID when carbondata load is successful

kunalkapoor
+1, showing segment id should be enough as other information can be
gathered by other means.

In reply to Yahui Liu's post of Thu, Feb 18, 2021, 1:31 PM.

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Indhumathi
+1. I agree with Kunal: show the segment ID for the current load.

Regards,
Indhumathi M




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

akashrn5
In reply to this post by Nihal
Hi,

+1,

Considering others' opinions, just the segment ID can be enough, and users
should take care to check its status after the load to decide whether to
query it or go ahead with any other operation on that segment.

This also keeps the code simple, avoids introducing bugs, and keeps the
test scope very limited.

Regards,
Akash




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

VenuReddy
In reply to this post by Nihal

+1

Good idea. Agree with you.

Regards,
Venu




Re: [DISCUSSION] Display the segment ID when carbondata load is successful

vikramahuja1001
+1 with Kunal’s idea

Vikram

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

Mahesh Raju Somalaraju
In reply to this post by Nihal
Hi,

+1. I agree with Kunal's idea of showing the segment ID for a successful
load/insert.

Thanks & Regards
Mahesh Raju Somalaraju

In reply to Nihal's post of Wed, Jan 13, 2021, 12:09 PM.