Re: Can I set a larger HDFS block size, like 4 or 8 GB in production environment? What is the problem with large blocks?
In theory, HDFS should support it, but very large blocks have some drawbacks:
1. Copying a block may take a long time whenever one of its replicas is
lost or moved by the balancer, the mover, or re-replication.
2. During pipeline recovery on write/append, if a new node replaces the
failed node, the data already written is copied to the new datanode.
This may take a long time depending on the amount written (in this case,
GBs). If the transfer does not complete within the timeout (default 60s),
the client may time out and the write may fail.
3. The balancer may time out while moving blocks from one datanode to
another, given the size of each block.
(Remark: the reply is from HDFS PMC member Vinayakumar.)
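
To put rough numbers on point 2, here is a small back-of-the-envelope sketch. It only divides the amount of data by a sustained copy rate and compares the result with the 60s default mentioned above. The 100 MiB/s throughput and the names transfer_seconds and exceeds_timeout are assumptions made up for this illustration, not HDFS settings or APIs; plug in the rate you actually measure between your datanodes.

```python
# Rough estimate only: real transfers also depend on disk speed,
# network contention and any throttling configured on the cluster.

MiB = 1024 ** 2
GiB = 1024 ** 3

TIMEOUT_S = 60.0                 # default timeout quoted in the reply
ASSUMED_THROUGHPUT = 100 * MiB   # hypothetical sustained copy rate, bytes/s


def transfer_seconds(size_bytes, throughput_bytes_per_s=ASSUMED_THROUGHPUT):
    """Seconds needed to copy size_bytes at a constant throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_s


def exceeds_timeout(size_bytes, throughput_bytes_per_s=ASSUMED_THROUGHPUT,
                    timeout_s=TIMEOUT_S):
    """True if copying size_bytes would take longer than timeout_s."""
    return transfer_seconds(size_bytes, throughput_bytes_per_s) > timeout_s


if __name__ == "__main__":
    for label, size in (("128 MiB", 128 * MiB),
                        ("4 GiB", 4 * GiB),
                        ("8 GiB", 8 * GiB)):
        secs = transfer_seconds(size)
        verdict = "over" if exceeds_timeout(size) else "within"
        print(f"{label}: about {secs:.1f}s, {verdict} the {TIMEOUT_S:.0f}s timeout")
```

Under these assumed numbers, a fully written 4 GiB block takes about 41s to copy and an 8 GiB block about 82s, so the larger one would already exceed the 60s default. At a slower rate the 4 GiB block would exceed it as well.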
> Hi All,
> I am wondering if I can use a very large block size in a production HDFS
> cluster, such as 4 or 8 gigabytes or even larger.
> Is there any problem with HDFS if there are a large number of large blocks
> in it?
> Also, if the large blocks are stored as CarbonData or other columnar
> formats such as ORC or Parquet, and we want to execute queries on top of
> such data, what problems might we run into?