Saturday, 21 April 2012

JOINS USED BY THE OPTIMIZER

JOINS USED BY THE OPTIMIZER


With the CBO, the optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in the following ways:

The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary. The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.

A smaller sort area size is likely to increase the cost for a sort merge join because sorting takes more CPU time and I/O in a smaller sort area. Sort area size is specified by the initialization parameter SORT_AREA_SIZE.

A larger multiblock read count is likely to decrease the cost for a sort merge join in relation to a nested loop join. If a large number of sequential blocks can be read from disk in a single I/O, then an index on the inner table for the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.

With the CBO, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for an outer join, then the optimizer ignores the hint and chooses the order. Also, you can override the optimizer's choice of join method with hints.

NESTED LOOP JOINS


Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.

It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.

A nested loop join involves the following steps:

  • The optimizer determines the driving table and designates it as the outer table.
  • The other table is designated as the inner table.
  • For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in outer table and the inner loop is for every row in the inner table.


The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.


HASH JOINS




Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.


This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.


The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:
  • A large amount of data needs to be joined.
  • A large fraction of the table needs to be joined.


However, in a production database with very large tables, it is not always easy to get your database to invoke hash joins without increasing the RAM regions that control hash joins. Oracle places limits of the amount of RAM available for hash joins so that no single session will "hog" all of the RAM. For large tables, hash joins requires lots of RAM.According to the research done by the Burleson Consulting Group , the maximum hash size is limited to 5% of the pga_aggregate_target.


The CBO will only choose a hash join if you have allocated Oracle enough RAM area in which to perform the hash join.


Make sure that the smaller table is the driving table (the first table in the from clause when using the ordered hint). This is because a hash join builds the memory array using the driving table.


Oracle recommends that the hash_area_size for the driving table be set at 1.6 times the sum of bytes for the driving table, and you can use the hash_area.sql script to set the hash_area_size for your query session.




SORT MERGE JOIN




Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:
  • The row sources are sorted already.
  • A sort operation does not have to be done.


However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.


Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a non equality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.


The join consists of two steps:


Sort join operation:     Both the inputs are sorted on the join key.
Merge join operation:     The sorted lists are merged together.


The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
  • The join condition between two tables is not an equi-join.
  • OPTIMIZER_MODE is set to RULE.
  • HASH_JOIN_ENABLED is false.
  • Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
  • The optimizer thinks that the cost of a hash join is higher, based on the settings of HASH_AREA_SIZE and SORT_AREA_SIZE.


A happy side-effect of the sort-merge join is that the results are returned sorted by the columns in the join predicates; this feature can be used to dispense with an ORDER BY clause and therefore a subsequent sort.




CARTESIAN JOIN




A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.




SEMI JOIN




A semi-join returns rows that match an EXISTS subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery.


ANTI JOIN




An anti-join returns rows from the left side of the predicate for which there are no corresponding rows on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the sub query on the right side.

No comments:

Post a Comment