[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

classic Classic list List threaded Threaded
83 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox

VenuReddy2103 opened a new pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110


    ### Why is this PR needed?
   At present, secondary indexes are leveraged for query pruning via spark plan modification. This approach is tightly coupled with spark because the plan modification is specific to spark engine. In order to use secondary indexes for Presto or Hive queries, it is not feasible to modify the query plans as we desire in the current approach. Thus need arises for an engine agnostic approach to use secondary indexes in query pruning.
   
    ### What changes were proposed in this PR?
   1. Added Secondary Index datamap as a coarse grain datamap
   2. Secondary Index datamap prune fires the spark sql query on the identified secondary index table within the particular segment to get the position references for the matching filters of datamap and in turn forms the blocklets. Note: Spark sql query is fired to take the advantage of spark distributed computing to filter and read the secondary index table in the distributed manner. Since the secondary index datamap fires the spark sql, it is prerequisite to enable distributed pruning and the Index Server must be up and running.
   3. Have added a CarbonInputFormat level property to control the use of newly added secondary index datamap or not in query pruning. This property is set only when query is triggered from Presto. So, Secondary index datamap is used only for Presto queries. And queries from spark continue to use the existing approach of plan modification at optimizer/execution phases.
   4. Upon Index Server get splits, if secondary index prune is applicable, prune and get extended blocklets directly on the index server driver instead of using existing DistributedPruneRDD which prunes on index server executors. This is because secondary index datamap pruning essentially fires a spark sql query and it require spark session/context.
   
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-802776615


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3838/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-802779583


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5604/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803526629


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5607/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803527532


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3841/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803623919


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5056/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803625683


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3304/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808449555


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3346/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808449842


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5097/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808456818


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808462123


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3347/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808462627


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5098/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809073737


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809122015


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5099/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809125024


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3348/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809203807


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809273056


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3350/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809275416


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5101/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-811956693


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5112/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-811958406


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3361/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12345