This feature is great for compaction. Did you observe higher memory usage, since it prefetches data into memory? Do you have any numbers?
> On Nov 7, 2018, at 11:54 PM, xuchuanyin <[hidden email]> wrote:
> Hi all:
> I am raising a PR to enhance the performance of compaction. The PR number is #2906.
> Based on my experiments using about 72GB of LineItem data (from 100GB of TPC-H data), I got the following results.
> Code Branch | Prefetch | Batch Size (default 100) | Load1 (s) | Load2 (s) | Load3 (s) | Compact 3 Loads (s) | Time Reduced
> master      | NA       | 100                      | 447.4     | 445.9     | 450.1     | 661.3               | Baseline
> master      | NA       | 32000                    | 441.5     | 454.4     | 456.8     | 641.2               | +3.0%
> PR2906      | enable   | 100                      | 445.3     | 450.2     | 445.3     | 411.8               | +37.7%
> PR2906      | enable   | 32000                    | 438.7     | 446.8     | 441.8     | 333.1               | +49.6%
> PR2906      | disable  | 100                      | 458.1     | 459.4     | 450.9     | 659.5               | +0.3%
> PR2906      | disable  | 32000                    | 472.0     | 446.8     | 457.1     | 654.5               | +1.0%
> Note: These tests were run on Spark 2.2.
> The results show that compaction performance is almost doubled if configured properly.
> They also show that even if this feature is disabled, compaction performance does not decrease.
> So here:
> 1. I do want to make this feature ‘enabled’ by default.
> 2. Besides, I’d like others in the community to test this feature as well and check whether we can benefit from it.
> Any feedback is welcome.
Oh, I didn't pay attention to the memory consumption at that time.
We all know that resource utilization is low during compaction.
Using prefetch means that we are running the query in the background, so it
will surely consume more resources.
The current prefetch size is controlled by 'carbon.detail.batch.size', which
defaults to 100, meaning an extra 100 rows are kept in memory before
they are retrieved.
So the memory overhead consists of the memory consumed by the query plus the
memory for 'carbon.detail.batch.size' records.
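
As a rough illustration of that estimate, here is a minimal sketch in Python. It
only covers the prefetched-rows part of the overhead, not the memory used by the
query itself. The function name, the average row size, and the number of
concurrent tasks are all assumptions made for illustration; none of them were
measured in the tests above.

```python
def prefetch_overhead_bytes(batch_size: int, avg_row_bytes: int,
                            concurrent_tasks: int = 1) -> int:
    """Rough extra memory held by prefetched rows.

    batch_size       -- value of carbon.detail.batch.size
    avg_row_bytes    -- ASSUMED average in-memory size of one row
    concurrent_tasks -- ASSUMED number of tasks prefetching at the same time
    """
    if batch_size < 0 or avg_row_bytes < 0 or concurrent_tasks < 1:
        raise ValueError("sizes must be non-negative and tasks at least 1")
    return batch_size * avg_row_bytes * concurrent_tasks


# Hypothetical row size, not measured on the LineItem data.
ASSUMED_ROW_BYTES = 200

# Compare the two batch sizes used in the tests.
for batch_size in (100, 32000):
    extra = prefetch_overhead_bytes(batch_size, ASSUMED_ROW_BYTES)
    print(f"batch size {batch_size}: ~{extra / 1024:.1f} KiB extra per task")
```

With the default batch size the extra rows should be small compared with the
query's own memory, but the estimate grows linearly with the batch size, so it
is worth measuring the real numbers for the 32000 case.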