Quantcast

White Space does not trim in CarbonData

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

White Space does not trim in CarbonData

vin7149
This post was updated on .
Hi Community,

I have created a table in hive and same in CarbonData, In my CSV there is whitespace after the data which is working fine in hive table whereas it shows null in CarbonData table.

In Hive

1: Table created in Hive

create table my_user(id int, name string, address1 string, address2 string,salary double, incentive int)ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

2: Load Data in table

LOAD DATA LOCAL INPATH '/home/vinod/Downloads/my_user2.csv' OVERWRITE INTO TABLE my_user;

3: Select Query on hive

0: jdbc:hive2://localhost:10000> select * from my_user;
+-----+-----------+-----------+-----------+----------+------------+--+
| id  |   name    | address1  | address2  |  salary  | incentive  |
+-----+-----------+-----------+-----------+----------+------------+--+
| 1   | 'katy'    | 'london'  | 'UK'      | 5000.0   | 5000       |
| 2   | 'John'    | 'goa'     | 'goa'     | 12000.0  | 500        |
| 3   | 'Rachel'  | 'goa'     | 'goa'     | 10000.0  | 1000       |
+-----+-----------+-----------+-----------+----------+------------+--+
3 rows selected (0.122 seconds)

In Carbon

1: Create Table in Carbon

create table my_user_c(id int, name string, address1 string, address2 string,salary double, incentive int) stored by 'carbondata';

2: Load Data in table

load data inpath 'hdfs://localhost:54310/my_user2.csv' into table my_user_c;

3: Select Query on carbon

0: jdbc:hive2://localhost:10000> select * from my_user_c;
+-----+-----------+-----------+-----------+----------+------------+--+
| id  |   name    | address1  | address2  |  salary  | incentive  |
+-----+-----------+-----------+-----------+----------+------------+--+
| 2   | 'John'    | 'goa'     | 'goa'     | 12000.0  | 500        |
| 3   | 'Rachel'  | 'goa'     | 'goa'     | 10000.0  | NULL      |
| 1   | 'katy'    | 'london'  | 'UK'      | 5000.0   | 5000       |
+-----+-----------+-----------+-----------+----------+------------+--+
3 rows selected (0.193 seconds)


it should display 1000 instead of NULL value in CarbonData.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: White Space does not trim in CarbonData

David CaiQiang
I suggest to trim white space before converting a string value to a numeric value.
Best Regards
David Cai
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: White Space does not trim in CarbonData

manishgupta88
Hi Vin,

Value is getting displayed as null because data is not getting trimmed. By
default carbondata will not modify any data. But there is option to verify
your data.

1. While loading the data set the property "
BAD_RECORDS_LOGGER_ENABLE=true". This will create a bad record logger where
in you can see what all records are invalid.

2. Secondly by default trimming data is disabled in the code.

parserSettings.setIgnoreLeadingWhitespaces(false);
parserSettings.setIgnoreTrailingWhitespaces(false);


We can provide an option during data load to trim whitespaces and keep the
default value as false to keep the current behavior intact. Example
load data inpath 'D:/my_user2.csv' into table my_user_c
OPTIONS('TRIM_WHITESPACE'='TRUE')

@Dev team please share your inputs for the 2nd option.

Regards
Manish Gupta


On Wed, May 17, 2017 at 7:47 AM, David Cai <[hidden email]> wrote:

> I suggest to trim white space before converting a string value to a numeric
> value.
>
>
>
> -----
> Best Regards
> David Cai
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/White-Space-
> does-not-trim-in-CarbonData-tp12736p12751.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: White Space does not trim in CarbonData

Jacky Li
Hi,

Generally speaking, I think it is preferable to make system more tolerant to input data so that it is simpler to use from user point of view. And I think it is safe to align with other format like ORC and parquet.

Regards,
Jacky

> 在 2017年5月17日,下午12:30,manish gupta <[hidden email]> 写道:
>
> Hi Vin,
>
> Value is getting displayed as null because data is not getting trimmed. By
> default carbondata will not modify any data. But there is option to verify
> your data.
>
> 1. While loading the data set the property "
> BAD_RECORDS_LOGGER_ENABLE=true". This will create a bad record logger where
> in you can see what all records are invalid.
>
> 2. Secondly by default trimming data is disabled in the code.
>
> parserSettings.setIgnoreLeadingWhitespaces(false);
> parserSettings.setIgnoreTrailingWhitespaces(false);
>
>
> We can provide an option during data load to trim whitespaces and keep the
> default value as false to keep the current behavior intact. Example
> load data inpath 'D:/my_user2.csv' into table my_user_c
> OPTIONS('TRIM_WHITESPACE'='TRUE')
>
> @Dev team please share your inputs for the 2nd option.
>
> Regards
> Manish Gupta
>
>
> On Wed, May 17, 2017 at 7:47 AM, David Cai <[hidden email]> wrote:
>
>> I suggest to trim white space before converting a string value to a numeric
>> value.
>>
>>
>>
>> -----
>> Best Regards
>> David Cai
>> --
>> View this message in context: http://apache-carbondata-dev-
>> mailing-list-archive.1130556.n5.nabble.com/White-Space-
>> does-not-trim-in-CarbonData-tp12736p12751.html
>> Sent from the Apache CarbonData Dev Mailing List archive mailing list
>> archive at Nabble.com.
>>



Loading...