The current implementation of S3 support has a few limitations, which are
described below.
Currently, while writing a file onto HDFS, a lock is acquired to ensure
synchronisation (only one user can write at a time). This is not feasible
on S3, as S3 does not support leases.
Introduce a memory lock which can take care of the above problem.
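As a rough illustration, such a memory lock could be a JVM-local map keyed by table identifier; the class and method names below are hypothetical, not CarbonData API, and this only guards writers within a single driver process:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a "memory lock": at most one writer per table
// within a single driver JVM. This does not coordinate across drivers.
public class MemoryLock {
    private static final ConcurrentHashMap<String, Boolean> LOCKS =
            new ConcurrentHashMap<>();

    // Returns true if the lock for this table was acquired.
    public static boolean tryLock(String tableId) {
        // putIfAbsent returns null only when no entry existed before.
        return LOCKS.putIfAbsent(tableId, Boolean.TRUE) == null;
    }

    public static void unlock(String tableId) {
        LOCKS.remove(tableId);
    }
}
```

A second acquire on the same table fails until the first holder unlocks, which is the behaviour the HDFS file lock provides today.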
*Problem(Write with append mode):*
Every time a Thrift-related file stream is opened, append mode is used,
which is not supported on S3. Therefore, while writing index files in
append mode, the existing file is read into memory, rewritten with
overwrite set to true, and then the new content is written to the file.
*Change the current implementation of ThriftWriter (for S3) to collect the
contents of the index file in a buffer, add the new content, and overwrite
the whole file at once.*
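The buffered approach could look roughly like the sketch below. The `ObjectStore` interface and all names here are illustrative stand-ins for the real store access, not the actual ThriftWriter code:

```java
import java.io.ByteArrayOutputStream;

// Sketch: emulate append on an object store by buffering the old content
// plus the new content, then overwriting the whole object in one call.
public class BufferedOverwriteWriter {

    // Hypothetical minimal store interface for illustration.
    public interface ObjectStore {
        byte[] read(String key);                // null if the key is absent
        void overwrite(String key, byte[] data);
    }

    public static void append(ObjectStore store, String key, byte[] newContent) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] existing = store.read(key);
        if (existing != null) {
            buffer.write(existing, 0, existing.length); // old content first
        }
        buffer.write(newContent, 0, newContent.length); // then appended bytes
        store.overwrite(key, buffer.toByteArray());     // single overwrite
    }
}
```

Because the write happens as one overwrite, a reader never observes a half-appended file, though a failure mid-overwrite is still a concern (see the atomicity point raised below in the thread).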
*Problem(Alter rename):*
In case of rename, currently the table path is also updated with the new
table name. But S3 does not support a true rename; renaming copies the
files to the new path, which can be very time consuming. Therefore the
current implementation can be changed as follows:
- Rename the table in metadata without altering the table path (the table
path will not be updated with the new table name).
- If the user tries to create a table with the old table name, then create
the path with a UUID appended to the table name.
For example, suppose the table name is table1 and the table path is
store/db1/table1. When renaming to table2, the table name in metadata will
be updated to table2 but the path will remain the same. If the user then
tries to create a new table named table1, its table path would be
table1-<UUID>.
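The path rule above can be sketched as follows; `existingPaths` stands in for a metadata lookup, and all names are illustrative:

```java
import java.util.Set;
import java.util.UUID;

// Sketch of the proposed rule: keep the old table path after a rename, and
// if a new table reuses the old name, append a UUID to avoid a collision.
public class TablePathResolver {

    public static String resolve(String storePath, String dbName,
                                 String tableName, Set<String> existingPaths) {
        String plain = storePath + "/" + dbName + "/" + tableName;
        if (!existingPaths.contains(plain)) {
            return plain;  // the plain path is free, use it as today
        }
        // The plain path is still occupied by a renamed table, so
        // disambiguate the new table's folder with a UUID suffix.
        return plain + "-" + UUID.randomUUID();
    }
}
```

Note that once paths and names diverge, anything that derives the path from the name (such as the refresh command discussed below in the thread) needs to consult metadata instead.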
Pre-aggregate transaction support relies heavily on renaming the table
status file, as follows:
- Write the main table segment as In-progress in the tablestatus file.
- Write the aggregate table segment as In-progress in the tablestatus file.
- When the load for an aggregate table completes, write the Success segment
into a new table status file named tablestatus-UUID.
- When the loads for all aggregate tables are complete, rename the
tablestatus file to tablestatus_backup_UUID and rename tablestatus-UUID to
tablestatus. Remove all files with _backup_UUID once done. If everything
succeeds, change the segment status to Success for the main table. If
anything fails, use the _backup_UUID file to restore the aggregate table to
its old state.
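The rename-based commit above can be sketched over an in-memory map of file name to content; real code would operate on the file system, and everything here is illustrative:

```java
import java.util.Map;

// Sketch of the rename-based commit for the tablestatus file.
// The map models a directory: file name -> file content.
public class TableStatusCommit {

    public static void commit(Map<String, String> files, String uuid) {
        String main = "tablestatus";
        String staged = "tablestatus-" + uuid;
        String backup = "tablestatus_backup_" + uuid;
        // 1. Back up the current tablestatus (rename to _backup_UUID).
        files.put(backup, files.get(main));
        // 2. Promote the staged file written by the aggregate load.
        files.put(main, files.remove(staged));
        // 3. On success, remove the backup file.
        files.remove(backup);
    }

    public static void rollback(Map<String, String> files, String uuid) {
        // On failure, restore the old tablestatus from the backup.
        String backup = "tablestatus_backup_" + uuid;
        if (files.containsKey(backup)) {
            files.put("tablestatus", files.remove(backup));
        }
    }
}
```

The weakness on S3 is step 2: since S3 has no atomic rename, a crash between the copy and the delete can leave both files visible, which is exactly why a transactional store is suggested below.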
If we use a DB to store the table status of the aggregate table on S3, then
this problem will not arise, as the DB can ensure transactional behaviour
while updating the status.
Any suggestion from the community is most welcome. Please let me know if
you have any questions.
1. The memory lock cannot support multiple drivers. The documentation will
be updated with this limitation.
2. I agree that in case of failure, reverting the changes is necessary. I
will take care of this point.
3. You are right, refresh using the table name would not work. I think we
can introduce refresh using the path for this scenario.
On Fri, Jun 22, 2018 at 12:08 PM David CaiQiang <[hidden email]> wrote:
> Hi Kunal,
> I have some questions.
> Does the memory lock support multiple drivers concurrently loading
> data to the same table? Maybe it should note this limitation.
> *Problem(Write with append mode):*
> 1. atomicity
> If the overwrite operation fails, the old file may be destroyed. It
> should be possible to recover the old file.
> *Problem(Alter rename):*
> If the table folder differs from the table name, maybe the "refresh
> table" command should be enhanced.
> Best Regards
> David Cai
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/