|Subject:||Limitations of the Oracle Cost Based Optimizer|
|Last Revision Date:||12-OCT-2006||Status:||PUBLISHED|
This article outlines some of the limitations that may be encountered when using the Cost Based Optimizer.
Cost Based Optimizer Limitations
The Cost Based Optimizer (CBO) uses a complex and comprehensive model to choose the plan with the lowest cost overall. In most cases the model picks the best access methods for accessing the data in the most efficient manner. However, even with computed statistics and column histograms it is possible for the Cost Based Optimizer to choose a sub-optimal plan. There are limitations to the Cost model that can affect queries in some circumstances. Some of these are listed and explained below:
- Potential for incorrect estimation of intermediate result set cardinalityCardinality is the CBO estimate of the number of rows produced by a row source or combination of row sources. In some cases, the cardinality of result sets can be miscalculated. This is most common with complex predicates where the statistics do not accurately reflect the data or where predicate values are correlated. The following is an illustration of a statement featuring correlated predicates:select ename from emp where sal >= 25000 and job = ‘PRESIDENT’;In this example there is a hidden correlation between the data values in the sal and job columns. Only the president earns more than $25,000. There are no employees with a “sal >= 25000″ who are not ‘PRESIDENT’. The optimizer has no way of determining that such a correlation exists and would treat the 2 columns as though their values were independent of each other. This would result in an underestimation of the selectivity of the predicates and thus an inaccurate cardinality estimate.
- Assumption that all statements run standaloneSince the CBO assumes that all statements run standalone, it may underestimate the volume of data that has been cached as a result of other statements running previously or concurrently. This can result in significant over estimation of the cost of index accesses which can read from cached data rather than having to retrieve the data from disk. The parameters <Parameter:OPTIMIZER_INDEX_CACHING> and <Parameter:OPTIMIZER_INDEX_COST_ADJ> can be used to modify these costs to reflect the characteristics of the system in question.
- Histogram bucket limitationsHistograms are limited to 254 buckets so if there are more than 254 distinct values and there is no single value that dominates the column’s dataset then histograms may not provide helpful statistics. With large numbers of distinct values, histogram usage can be further compromised because differences in non popular values cannot be recorded. The choice of 254 buckets per column histogram was a balance between accuracy of the statistics and the speed of histogram collection/amount of space required to store the information.
- Limitations of Histogram on Character ColumnsHistograms only store the first 32 characters of a character string (5 characters pre 8.1.6 See Bug 598799). If histograms are gathered on character columns that have data that is longer than 31 characters and the first 31 characters are identical then column histograms may not reflect the cardinality of these columns correctly as these values will all be treated as if they are identical.
- Bind VariablesThe use of bind variables is often recommended to avoid issues associated with the non-sharing of SQL. However, since the optimizer does not know the value of the bind variable when the plan is determined, a sub-optimal plan can be produced. Bind variable peeking was introduced in Oracle9i to alleviate this issue, but this bases the plan on the first value that is bound to the variable. If this is not representative of all the values then the performance of the plans may turn out ot be variable. See: Note 70075.1 Use of bind variables in queries
- Subquery Unnesting and View MergingOracle exhaustively attempts to transform statements containing subqueries and views in to simpler statements. The goal of this activity is to make more access paths accessible and find the optimal plan. However, Subquery Unnesting and View Merging is a heuristic process. Subqueries are unnested and views are merged based upon a set of rules. No costs are generated and compared back to the unchanged statement. This could mean that the transformed statement does not perform as well as the untransformed one.N.B. Oracle 10G introduces Costed Subquery Unnesting and View Merging which should go some way to alleviating the effects of this limitation
- Join PermutationsThe CBO evaluates each permutation of tables up to a predefined limit (<Parameter:optimizer_max_permutations>). As the number of tables involved in a query increases, so the total number of permutations increases. Soon the number that can be evaluated in a realistic timeframe is a minute proportion of the total. There are a large number of adjustments that occur within the optimization process to attempt to minimise the possibility of a good candidate being overlooked, but it is possible that the optimizer may not even consider the most optimal join order and method in large queries. The following article has more detail on optimizer permutations: Note 73489.1 Affect of Number of Tables on Join Order Permutations
Potential workarounds for these issues
Workarounds for these issues typically include hinting the correct plan or using stored outlines. Enhancements to alleviate the affects of some of these issues are expected in future versions.