The Apache CarbonData community is pleased to announce the release of
Version 1.4.1 in The Apache Software Foundation (ASF).
CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, and streaming analytics. CarbonData has been
deployed in many enterprise production environments. In one of the largest
scenarios, it supports queries on a single table with 3 PB of data (more
than 5 trillion records) with a response time of less than 3 seconds!
This release note provides information on the new features, improvements,
and bug fixes of this release.
What’s New in Version 1.4.1?
In this version of CarbonData, more than 230 JIRA tickets for new features,
improvements, and bug fixes have been resolved. The following is a summary.
Carbon Core
Support Cloud Storage (S3)
This can be used to store or retrieve data on Amazon cloud, Huawei Cloud
(OBS), or any other object store that conforms to the Amazon S3 API.
Storing data in the cloud is advantageous because there are no restrictions
on the size of data and the data can be accessed from anywhere at any time.
For more detail, please refer to the S3 Guide.
Support Flat Folder
This feature allows all carbondata and index files to be kept directly under
the table path. This is useful for interoperability between execution
engines and for plugging in other execution engines such as Hive or Presto.
Support 32K Characters (Alpha Feature)
In common scenarios, the length of a string is less than 32000 characters.
For cases where a string is longer than that, CarbonData introduces a table
property called LONG_STRING_COLUMNS. For the columns listed in this
property, CarbonData internally stores the length of the content using an
Integer.
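To illustrate why a wider length field matters, the following minimal sketch packs a length prefix either as a signed 2-byte short or as a signed 4-byte integer. It assumes, purely for illustration, that ordinary string columns carry a 2-byte length prefix. The function name and byte layout are hypothetical and are not CarbonData's actual on-disk format.

```python
import struct

def encode_with_length(value: str, long_string: bool = False) -> bytes:
    """Prefix the UTF-8 bytes of `value` with their length.

    Hypothetical layout: a 2-byte signed short by default, or a 4-byte
    signed integer when `long_string` is True.
    """
    payload = value.encode("utf-8")
    fmt = ">i" if long_string else ">h"
    return struct.pack(fmt, len(payload)) + payload

# A short value fits comfortably behind a 2-byte length prefix.
short_ok = encode_with_length("a" * 100)

# A 40000-character value needs the 4-byte integer prefix.
long_ok = encode_with_length("a" * 40000, long_string=True)

# The same value cannot be described by a signed short (max 32767).
try:
    encode_with_length("a" * 40000)
    overflowed = False
except struct.error:
    overflowed = True
```

In this sketch the short prefix fails once the payload exceeds 32767 bytes, which is the kind of limit a long-string column property is meant to lift.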
Support Local Dictionary
A local dictionary helps achieve better compression. Filter queries and
full scan queries are faster because filtering is done on encoded data. The
store size and memory footprint are reduced because only unique values are
stored in the local dictionary and the corresponding data is stored in
encoded form. This also yields higher IO throughput.
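The idea of filtering on encoded data can be sketched as follows. This is a simplified illustration of dictionary encoding in general, not CarbonData's implementation. The function names and sample column are hypothetical.

```python
def build_local_dictionary(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    dictionary = {}
    encoded = []
    for v in values:
        # setdefault assigns the next free code only to unseen values.
        code = dictionary.setdefault(v, len(dictionary))
        encoded.append(code)
    return dictionary, encoded

def filter_equals(dictionary, encoded, target):
    """Return the row positions equal to `target` by comparing codes."""
    code = dictionary.get(target)
    if code is None:
        # The value is not in the dictionary, so no row can match.
        return []
    return [i for i, c in enumerate(encoded) if c == code]

column = ["CN", "US", "CN", "IN", "US", "CN"]
dictionary, encoded = build_local_dictionary(column)
matches = filter_equals(dictionary, encoded, "CN")
```

Only three distinct strings are stored here while the six rows are kept as small integers, and the equality filter compares integers rather than strings.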
Support Merge Index
CarbonData supports merging of all the index files inside a segment to a
single CarbonData index merge file. This enhances the first query
performance.
Shows History Segments
CarbonData introduces a 'SHOW HISTORY SEGMENTS' command to show all segment
information, including both visible and invisible segments.
Support Custom Compaction
Custom compaction is a new compaction type in addition to MAJOR and MINOR
compaction. In custom compaction, you can directly specify the segment ids
to be merged.
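As a rough sketch, a custom compaction request could be assembled as shown here. The statement form follows the CarbonData data management documentation as best understood here and should be checked against the guide for your version. The helper function and the table name `sales` are hypothetical.

```python
def custom_compaction_sql(table, segment_ids):
    """Build a custom compaction statement for the given segment ids.

    Hypothetical helper: it only formats a string and does not check
    that the segments exist.
    """
    ids = [str(s) for s in segment_ids]
    if not ids:
        raise ValueError("at least one segment id is required")
    id_list = ",".join(ids)
    return f"ALTER TABLE {table} COMPACT 'CUSTOM' WHERE SEGMENT.ID IN ({id_list})"

# Ask for segments 2, 3 and 4 of a hypothetical table to be merged.
stmt = custom_compaction_sql("sales", [2, 3, 4])
```

The resulting string would then be submitted through whatever SQL entry point the deployment uses, for example a Spark session.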
Enhancement for Detail Record Analysis
Supports Bloom Filter DataMap
CarbonData introduces BloomFilter as an index datamap to enhance the
performance of queries on precise values. It is well suited to queries
that do a precise match on high cardinality columns (such as Name/ID). In
concurrent filter query scenarios (on a high cardinality column), we
observe a 3 to 5 times improvement in concurrent queries per second compared
to the previous version. For more detail, please refer to the BloomFilter
DataMap Guide.
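To show why a Bloom filter suits precise-match lookups, here is a minimal, generic sketch that keeps one filter per data block and uses it to skip blocks that cannot contain the queried value. It is not CarbonData's implementation. The class, block names, and sizing parameters are all hypothetical.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, value):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

# Build one filter per (hypothetical) block of a high cardinality ID column.
blocks = {
    "block-0": ["id-001", "id-002"],
    "block-1": ["id-103", "id-104"],
}
filters = {}
for name, ids in blocks.items():
    bf = BloomFilter()
    for value in ids:
        bf.add(value)
    filters[name] = bf

# Only blocks whose filter reports a possible match need to be scanned.
candidates = [name for name, bf in filters.items() if bf.might_contain("id-103")]
```

A block that really holds the value is always kept as a candidate. Blocks that do not hold it are usually pruned, with a small chance of a false positive that costs an unnecessary scan but never a wrong result.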
Improved Complex Datatypes
Improved complex datatypes compression and performance through adaptive