Partitioning in Hive: Know the Process
Partitioning and bucketing in Hive are used to slice the records horizontally, either over the complete range or over a smaller range of values, using one or more columns. Partitioning is a well-known concept in RDBMSs too, so if you understand databases you are probably already familiar with the idea. If not, that is not a problem, as you will understand the concept after reading this article. For example, if you have a hundred records in a table of student data for a college and you want to divide the whole data set into male and female students, that is nothing but partitioning, and GENDER is the column used to split the data. As mentioned above, more than one column can be used to split the data.
In a real-world scenario, if you want to analyze the log files of user activity on the internet, it would be very useful to keep the data organized by date and by geography as well. There are two types of partitioning in Hive:
- Static Partitioning
- Dynamic Partitioning
The table DDL statement is the same for both types of partitioning.
I have created a table T_USER_LOG with DT and COUNTRY as my partitioning columns. I used Hive script mode, where "HivePartition.hql" is my script file. You may use the Hive shell as well, or whichever is convenient for you. As highlighted in the image, the partition columns appear in the table schema like regular table columns.
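As a rough illustration, the DDL in HivePartition.hql could look something like the sketch below. Only the table name and the DT and COUNTRY partition columns come from this article; the regular columns USER_ID and ACTIVITY, the column types, and the comma delimiter are assumptions made for the example.

```sql
-- Minimal sketch of a partitioned table definition.
-- USER_ID and ACTIVITY are hypothetical columns; DT and COUNTRY are the
-- partition columns and are declared only in PARTITIONED BY.
CREATE TABLE IF NOT EXISTS T_USER_LOG (
  USER_ID  STRING,
  ACTIVITY STRING
)
PARTITIONED BY (DT STRING, COUNTRY STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- assumed delimiter
STORED AS TEXTFILE;

-- The partition columns are listed in the schema next to the regular columns.
DESCRIBE T_USER_LOG;
```

A script like this can be run in script mode with `hive -f HivePartition.hql`, or the statements can be pasted into the Hive shell.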
Static Partitioning: In static partitioning, we know the partition column values beforehand. So far so good; the difference shows up when we load the data.
LOAD DATA LOCAL INPATH '[path_name]' OVERWRITE INTO TABLE [table_name] PARTITION (partition_column='value', ...);
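Filled in for the T_USER_LOG table, a static load might look like the following sketch. The local path /tmp/UserLog.txt and the partition values are hypothetical and only there to make the statement concrete.

```sql
-- Static partitioning: the partition values are spelled out in the statement.
-- The path and the values '2016-01-01' / 'US' are assumptions for illustration.
LOAD DATA LOCAL INPATH '/tmp/UserLog.txt'
OVERWRITE INTO TABLE T_USER_LOG
PARTITION (DT='2016-01-01', COUNTRY='US');

-- List the partitions that now exist for the table.
SHOW PARTITIONS T_USER_LOG;
```

Each additional date or country needs its own LOAD statement with its own PARTITION clause, which is what makes this approach static.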
OUTPUT: Because our table T_USER_LOG is a managed table, the data is loaded into the Hive warehouse path, i.e. /user/hive/warehouse/t_user_log.
Here you can check all the other partitions as well; each will contain the file UserLog.txt. There are two levels of partitioning in our example, first DT and then COUNTRY, and the actual data is stored inside the innermost level. All partitions in Hive exist as directories. Loading in Hive is a direct file operation and does not trigger a Hive MapReduce job. That is why our file is stored as UserLog.txt rather than as a 000000_0 file. Please keep following the article, as I will show under dynamic partitioning how we can load a table from another table, in which case a MapReduce job is triggered.
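One way to look at these directories without leaving the Hive shell is the `dfs` command. The sketch below assumes the default warehouse location mentioned above and the hypothetical partition values DT='2016-01-01' and COUNTRY='US'.

```sql
-- Top level of the table: one directory per DT value.
dfs -ls /user/hive/warehouse/t_user_log/;

-- Second level: one directory per COUNTRY value inside a given DT directory.
dfs -ls /user/hive/warehouse/t_user_log/dt=2016-01-01/;

-- Innermost level: the loaded file keeps its original name, UserLog.txt.
dfs -ls /user/hive/warehouse/t_user_log/dt=2016-01-01/country=US/;
```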
Dynamic Partitioning: Let us now look at the load script for dynamic partitioning. We will create a new table, T_USER_LOG_DYN, for dynamic partitioning. Since, as mentioned earlier, we will load this table from another table, let's also create a source table, T_USER_LOG_SRC.
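A minimal sketch of that setup is shown below. The table names come from this article; the regular columns USER_ID and ACTIVITY, the column types, and the delimiter are assumptions. The source table holds DT and COUNTRY as ordinary columns, and Hive derives the partitions from their values at insert time.

```sql
-- Source table: DT and COUNTRY are plain columns here (hypothetical layout).
CREATE TABLE IF NOT EXISTS T_USER_LOG_SRC (
  USER_ID  STRING,
  ACTIVITY STRING,
  DT       STRING,
  COUNTRY  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Target table: same DDL shape as the static example.
CREATE TABLE IF NOT EXISTS T_USER_LOG_DYN (
  USER_ID  STRING,
  ACTIVITY STRING
)
PARTITIONED BY (DT STRING, COUNTRY STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Enable dynamic partitioning; nonstrict mode allows every partition
-- column to be determined dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT list, in the same
-- order as in the PARTITION clause.
INSERT OVERWRITE TABLE T_USER_LOG_DYN PARTITION (DT, COUNTRY)
SELECT USER_ID, ACTIVITY, DT, COUNTRY
FROM T_USER_LOG_SRC;
```

Because this is an INSERT ... SELECT rather than a LOAD, Hive runs a job to write the data, and the files in each partition directory get generated names such as 000000_0.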