[DISCUSS] Support transactional table in SDK


[DISCUSS] Support transactional table in SDK

Jacky Li

Hi All,

In order to support application integration without a central coordinator, such as Flink and Kafka Streams, transactional tables need to be supported in the SDK, and a new type of segment called Online Segment is proposed.

Since it is hard to describe the motivation and design in a good format in an email, I have attached a document to CARBONDATA-3152. Please review the doc and provide your feedback.

https://issues.apache.org/jira/browse/CARBONDATA-3152

Regards,
Jacky




Re: [DISCUSS] Support transactional table in SDK

ravipesala
Hi Jacky,

It's a good idea to support writing to a transactional table from the SDK. But we need
to add the following limitations as well:
 1. It can only work on file systems that support an append lock, like HDFS.
 2. Compaction and delete segment cannot be done on an online segment until it is
converted to a transactional segment.
 3. The SDK writer should be responsible for adding only complete carbondata files to the
online segment once writing is done; it should not add any half-cooked
data.
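Limitation 3 above can be enforced with a stage-then-rename pattern. The sketch below is illustrative only (the class and method names are hypothetical, not actual CarbonData SDK APIs): the writer stages the file under a temporary name in the segment directory and atomically moves it to its final name, so a reader scanning the segment never observes a half-written file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "no half-cooked data": commit a carbondata file to
// an online segment only once it is fully written, via an atomic rename.
public class CompleteFileCommit {

    public static Path commit(Path segmentDir, String fileName, byte[] data)
            throws IOException {
        Files.createDirectories(segmentDir);
        // Stage under a temporary name in the same directory, so the final
        // move is a metadata-only rename on most file systems.
        Path staging = Files.createTempFile(segmentDir, fileName + ".", ".inprogress");
        Files.write(staging, data);
        // Atomic move: readers see either the whole file or no file at all.
        return Files.move(staging, segmentDir.resolve(fileName),
                StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment_0_");
        Path committed = commit(dir, "part-0.carbondata", "row data".getBytes());
        System.out.println(Files.exists(committed));
    }
}
```

On HDFS the rename would go through the corresponding FileSystem API, but the commit-only-complete-files idea is the same.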
 
Also, since we are trying to update the tablestatus from other modules
like the SDK, we had better consider the segment interface first. Please go through
the JIRA:
https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827


Regards,
Ravindra






Re: [DISCUSS] Support transactional table in SDK

Liang Chen
In reply to this post by Jacky Li
Hi

Good idea, thank you for starting this discussion.

I agree with Ravi's comments; we need to double-check some limitations after
introducing the feature.

Flink and Kafka integration can be discussed later.
For using the SDK to write new data to an existing carbondata table, some
questions:
1. How do we ensure the same index, dictionary, etc. policy as the
existing table?
2. Can you please help me understand this proposal further: what valuable
scenarios require this feature?

------------------------------------------------------------------------------------------------
After having online segments, one can use this feature to implement
Apache Flink-CarbonData integration, or Apache
Kafka Streams-CarbonData integration, or just use the SDK to write new data to
an existing CarbonData table; the integration level can be the same as the current
Spark-CarbonData integration.

Regards
Liang




Re: [DISCUSS] Support transactional table in SDK

Nicholas
In reply to this post by Jacky Li
Hi Jacky,
Carbon should support transactional tables in the SDK before the
Apache Flink-CarbonData integration. After having online segments, I can use
this feature to implement the Apache Flink-CarbonData integration. Therefore, can
I participate in the development of this feature, to facilitate the
Apache Flink-CarbonData integration?




Re: [DISCUSS] Support transactional table in SDK

Jacky Li
In reply to this post by ravipesala


> On December 7, 2018, at 11:05 PM, ravipesala <[hidden email]> wrote:
>
> Hi Jacky,
>
> It's a good idea to support writing to a transactional table from the SDK. But we need
> to add the following limitations as well:
> 1. It can only work on file systems that support an append lock, like HDFS.
Likun: Yes, since we need to overwrite the table status file, we need file locking.
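The locking idea can be sketched as follows. This is a hedged illustration using plain java.nio file locking, not CarbonData's own lock implementations (which, as noted above, would be built on HDFS-style append locks, and whose details differ); the class name and the tablestatus content are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: take an exclusive lock before overwriting the table
// status file, so concurrent writers cannot interleave their updates.
public class TableStatusLockDemo {

    public static void overwriteWithLock(Path tableStatus, String newContent)
            throws IOException {
        try (FileChannel channel = FileChannel.open(tableStatus,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // exclusive whole-file lock
            channel.truncate(0); // overwrite, don't append
            channel.write(ByteBuffer.wrap(
                    newContent.getBytes(StandardCharsets.UTF_8)));
        } // lock released and channel closed automatically
    }

    public static void main(String[] args) throws IOException {
        Path status = Files.createTempFile("tablestatus", "");
        overwriteWithLock(status, "[{\"segment\":\"0\",\"status\":\"Online\"}]");
        System.out.println(new String(Files.readAllBytes(status),
                StandardCharsets.UTF_8));
    }
}
```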

> 2. Compaction and delete segment cannot be done on an online segment until it is
> converted to a transactional segment.
Likun: Compaction and other data management work will still be done by the CarbonSession application in a standard Spark cluster.

> 3. The SDK writer should be responsible for adding only complete carbondata files to the
> online segment once writing is done; it should not add any half-cooked
> data.
Likun: Yes, I have mentioned this in the design doc.

>
> Also, since we are trying to update the tablestatus from other modules
> like the SDK, we had better consider the segment interface first. Please go through
> the JIRA:
> https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827
>
>
> Regards,
> Ravindra
>
>
>
>


Re: [DISCUSS] Support transactional table in SDK

Jacky Li
In reply to this post by Liang Chen


> On December 8, 2018, at 3:53 PM, Liang Chen <[hidden email]> wrote:
>
> Hi
>
> Good idea, thank you for starting this discussion.
>
> I agree with Ravi's comments; we need to double-check some limitations after
> introducing the feature.
>
> Flink and Kafka integration can be discussed later.
> For using the SDK to write new data to an existing carbondata table, some
> questions:
> 1. How do we ensure the same index, dictionary, etc. policy as the
> existing table?
Likun: The SDK uses the same writer provided in the carbondata-core module, so it follows the same “policy” you mentioned.

> 2. Can you please help me understand this proposal further: what valuable
> scenarios require this feature?
Likun: Currently, the SDK writes carbondata files into a flat folder and loses all features built on top of the segment concept, such as show segment, delete segment, compaction, datamap, MV, data update, delete, streaming, global dictionary, etc.
By introducing this feature (supporting transactional tables in the SDK), an application can use it in a non-Spark environment to write new carbondata files and still enjoy a transactional table with segment support and all the previously supported features.

Basically, these new APIs in the SDK add a new way to write data into an existing carbondata table. It is for non-Spark environments such as Flink, Kafka Streams, Cassandra, or any other Java application.
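The segment lifecycle implied by this proposal can be sketched as a tiny state machine: an SDK writer appends files to an ONLINE segment, and data-management operations such as compaction become possible only after the segment is converted to a regular transactional segment. This is an illustrative toy, not the actual CarbonData API; the class, method, and status names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed segment states. ONLINE segments are being fed by
// an SDK writer; SUCCESS stands in for a normal transactional segment.
public class OnlineSegmentDemo {
    enum Status { ONLINE, SUCCESS }

    static final Map<String, Status> tableStatus = new HashMap<>();

    static void addOnlineSegment(String id) {
        tableStatus.put(id, Status.ONLINE);
    }

    static void convertToTransactional(String id) {
        tableStatus.put(id, Status.SUCCESS);
    }

    // Per the limitation discussed above: compaction (and delete segment)
    // must skip online segments until they are converted.
    static boolean canCompact(String id) {
        return tableStatus.get(id) == Status.SUCCESS;
    }

    public static void main(String[] args) {
        addOnlineSegment("0");
        System.out.println(canCompact("0")); // still online: not compactable
        convertToTransactional("0");
        System.out.println(canCompact("0")); // now eligible for compaction
    }
}
```

In the real system the state would live in the tablestatus file rather than an in-memory map, but the allowed transitions are the point.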

>
> ------------------------------------------------------------------------------------------------
> After having online segments, one can use this feature to implement
> Apache Flink-CarbonData integration, or Apache
> Kafka Streams-CarbonData integration, or just use the SDK to write new data to
> an existing CarbonData table; the integration level can be the same as the current
> Spark-CarbonData integration.
>
> Regards
> Liang
>
>




Re: [DISCUSS] Support transactional table in SDK

Jacky Li
In reply to this post by Nicholas
Hi Nicholas,

Yes, this is a feature required for flink-carbon to write to a transactional table. You are welcome to participate. I think you can contribute by first reviewing the design doc in CARBONDATA-3152; after we settle on the API, we can open sub-tasks for this ticket.


Regards,
Jacky

> On December 10, 2018, at 1:55 PM, Nicholas <[hidden email]> wrote:
>
> Hi Jacky,
> Carbon should support transactional tables in the SDK before the
> Apache Flink-CarbonData integration. After having online segments, I can use
> this feature to implement the Apache Flink-CarbonData integration. Therefore, can
> I participate in the development of this feature, to facilitate the
> Apache Flink-CarbonData integration?
>
>