BLOOM FILTER AND ITS POSITIVE IMPACT ON HASH JOINS
HASH JOIN :
Hash Joins is one of the method of joining two tables based on the join condition mentioned in the query.
It is important to mention the smaller table in leading position in the sql query when joining two tables. The optimizer always picks the leading table/first table as per the order mentioned in the query.If the execution plan does not pick the smaller table as the leading table then there will be more load/activity at the joining step then the plan which might impact the overall elapsed time of the query. We can explicitly ask the optimizer to pick a particular table as leading table using a oracle hint Ex: /*+ leading(table name)*/
How Hash Join works?
1) First Optimizer scan the smaller table and creates a hash table by applying hash function on the column to be joined.
2) Now the optimizer scans the second table applies a hash function(applies on the column to be joined) to get the hash value for each record in the second table.
3) Now the Hash value of each record of the second table is compared with Hash table of the first table per matching records.
4) Now all the records are matched will be joined and sent to result set.
BLOOM FILTER;
Bloom Filter is one of the filter used in the oracle query execution for the Hash joins to reduce the number of records sent from the trailing table for Hash joining.
Normally without Bloom Filter we calculate Hash value for each record in the second table and send it for comparison with the leading table which in turn increases the user I/O/actual rows accessed from the trailing table.
Now with Bloom Filter, we gather extra information called Bit Vector of the joined column from the both tables.
How Bloom Filter works?
1) Optimizer scans the smaller table and creates a Hash table and collects the Bit Vector of the column to be joined.
2) Now Optimizer scans the second table, generates Hash value for each record, additionally Bit Vector information is also gathered for each column of the record.
3) During this meantime, the Hash table and Bit Vector information of the leading table will be sent for comparison.
4) With the availability of Bit Vector information, its value of leading table compared with the Bit Vector value of the trailing table and the records from the both tables whose Bit Vector values are same are sent further for Hash joining.
5) With Bit Vector comparison the records which used to get rejected earlier at the Hash joining condition are now rejected while scanning
the second table itself. As a result, the count of records which are sent for Hash joining will drastically decreases which in turn saves the time required to get the records of the table.
We can enable and disable the bloom filter by using oracle hints px_join_filter(table) and no_px_join_filter
select /*+ parallel(8) */ count( * ) from FACT, DIMENSION where dimension.col1=1 and dimension.col2=fact.col2;
EXECUTION PLAN WITH BLOOM FILTER:
EXECUTION PLAN WITHOUT BLOOM FILTER:
From above two execution plans, we can see that without Bloom Filter, Optimizer is accessing 1000K records from the fact table and is been sent for Hash joining. But with Bloom filter, only 102 records are being accessed and sent for Hash joining.
Remember in both the cases, the final result set has only 100 records. This is how bloom filter reduces the user i/o of trailing table by rejecting the records based on BIT VECTOR value before sending it for hash value comparision.
Nice bro..
ReplyDelete