Database Design: Missing or Bad Statistics

A common cause of poor SQL performance in an application is missing or stale CBO statistics on a table. The usual aim is to have statistics gathered on all tables belonging to an application; however, there are situations where gathering statistics is not possible or is inappropriate. This article covers the scenarios where this occurs and proposes methods for ensuring the optimizer is still given accurate information.

There are several scenarios where normal statistics gathering methods may not be appropriate and require special attention.

  • Global temporary tables (GTTs). These tables do not hold data beyond a transaction or session, so routine stats gathering will record meaningless information.
  • Intermediate/staging tables. These are heap tables which hold temporary data and are typically truncated prior to processing. Depending on how a batch job using these tables cleans up, they may or may not contain data when statistics are gathered.
  • Stale statistics. The automatic statistics collection process gathers stats on objects whose statistics are missing or stale. If a monitored table (table monitoring must be enabled) has had more than 10% of its rows modified, its statistics are considered stale. Between the statistics becoming stale and being re-gathered there is a period during which they do not reflect the true situation; a quick check for this is shown after this list.
  • In-memory PL/SQL tables and pipelined functions. These are a popular means of including PL/SQL collection data in a SQL statement and of streaming large volumes of data; however, they are not real tables, so statistics cannot be recorded against them.
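
On 10g and later, a quick dictionary query along the following lines (run against your own schema) shows which tables are currently flagged as having stale or missing statistics:

SELECT table_name, num_rows, last_analyzed, stale_stats
FROM   user_tab_statistics
WHERE  stale_stats = 'YES'
OR     last_analyzed IS NULL;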

There are several techniques that can be used to provide more accurate statistics to the optimizer for the above scenarios:

  • Dynamic sampling
  • The CARDINALITY hint (9i+) or OPT_ESTIMATE hint (10g+) [undocumented]
  • Using DBMS_STATS
  • Extensible Optimiser

Selecting which one to use depends largely on the type of table involved and how it is used.

Dynamic Sampling

This permits the optimizer to take a quick look at the table when statistics are missing. It samples the data in the table to come up with better estimates of what it is dealing with. At the default sampling level (2, from 10g onwards), Oracle will dynamically sample any table that has no statistics.

There are two ways to enable dynamic sampling:

  • Removing statistics from the tables so that automatic dynamic sampling is used on them. The OPTIMIZER_DYNAMIC_SAMPLING parameter can be set at the instance level and overridden at the session level with the ALTER SESSION command to give the required behaviour.
  • The DYNAMIC_SAMPLING hint can be added to specific queries where statistics cannot be trusted or where more aggressive sampling than the default level is required, as shown in the sketch after this list.
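
As an illustrative sketch (the table gtt_orders is hypothetical), the session-level setting and the statement-level hint look like this:

-- raise the sampling level for the whole session
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- or request more aggressive sampling for a single statement
SELECT /*+ dynamic_sampling(o 4) */ *
FROM   gtt_orders o
WHERE  status = 'NEW';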

The purpose of dynamic sampling is to improve server performance by determining more accurate selectivity and cardinality estimates to allow the optimizer to produce better performing plans.

You can set OPTIMIZER_DYNAMIC_SAMPLING (and the level in the DYNAMIC_SAMPLING hint) to a value from 0 to 10. The higher the level, the more effort the optimizer puts into dynamic sampling and the more broadly it is applied.

A common misunderstanding about dynamic sampling is that it adds an overhead to every query execution; in fact the overhead is only incurred at hard parse time, and subsequent soft parses do not repeat the sampling.

CARDINALITY Hint

The cardinality hint is used in two general cases: complex joins and dynamically populated tables such as global temporary tables. It is worth pointing out that there is discussion over whether the CARDINALITY hint will be deprecated in favour of the OPT_ESTIMATE hint, and since neither is documented by Oracle, they should not be relied upon for production solutions.

This is a useful approach for PL/SQL collection variables (in-memory tables held in a PL/SQL collection). The database will not dynamically sample these, nor are they real tables, so no statistics can be stored against them.

This hint may be favoured over dynamic sampling where a GTT experiences a high degree of concurrency, as dynamic sampling in that scenario will consider all of the data in the GTT rather than just the specific session's rows. However, the cardinality value in the hint cannot be supplied via a bind variable, so dynamic SQL would be necessary in order to set a specific cardinality that accommodates highly varying row counts. Care is also needed to ensure that too many distinct cardinality values do not generate too many versions of the SQL and increase the level of hard parsing, so a stepped approach should be considered, e.g. CARDINALITY(10), CARDINALITY(100), etc.; a sketch of this follows.
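
A minimal sketch of the stepped approach, assuming a hypothetical GTT called gtt_orders: the session's row count is rounded up to the nearest power of ten before the statement text is built, so only a handful of distinct SQL versions are ever hard parsed.

DECLARE
   l_rows PLS_INTEGER;
   l_step PLS_INTEGER;
   l_cur  SYS_REFCURSOR;
BEGIN
   SELECT COUNT(*) INTO l_rows FROM gtt_orders;
   -- round up to 1, 10, 100, 1000, ... so the hint only ever takes a few values
   l_step := POWER(10, CEIL(LOG(10, GREATEST(l_rows, 1))));
   OPEN l_cur FOR
      'SELECT /*+ cardinality(o ' || l_step || ') */ * '
   || 'FROM gtt_orders o WHERE o.status = :b_status'
   USING 'NEW';
   -- ... fetch from l_cur as normal ...
END;
/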

The CARDINALITY hint is also very useful for in-memory PL/SQL tables. These are commonly used in an IN clause with a sub-query that casts a collection into a table:

SELECT *
FROM   emp
WHERE  empid IN (SELECT t.COLUMN_VALUE
                 FROM   TABLE(CAST(v_my_in_list AS t_number_array)) t);

Using the CARDINALITY hint in the sub-query lets the optimizer know how many rows to expect, but it only appears to be honoured if the sub-query is materialized via a factored sub-query (WITH clause) queried with a dummy ROWNUM >= 0 predicate (ORDER BY, UNION, INTERSECT, GROUP BY, etc. have the same effect):

WITH rfq_my_in_list AS (
   SELECT COLUMN_VALUE empid
   FROM   TABLE(CAST(v_my_in_list AS t_number_array)) t
)
SELECT *
FROM   emp
WHERE  empid IN (SELECT /*+ cardinality(t 5) */ empid
                 FROM   rfq_my_in_list t
                 WHERE  ROWNUM >= 0);

DBMS_STATS

With this approach, statistics are either gathered against the current volume of data in the table or set manually based on knowledge of the volumes of data loaded. It is probably more appropriate for Oracle 9i and earlier.

Statistics are manually set using the DBMS_STATS.SET_TABLE_STATS procedure; this would typically be done once, with the statistics then locked to prevent them from being recalculated. This works well where the table holds a fairly constant volume of data, so the statistics always provide accurate information to the optimizer. A minimal sketch follows.
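
A minimal sketch, assuming a staging table called stg_orders that is always loaded with roughly the same volume of data (the values shown are purely illustrative):

BEGIN
   DBMS_STATS.SET_TABLE_STATS(
      ownname => USER,
      tabname => 'STG_ORDERS',
      numrows => 50000,    -- representative row count
      numblks => 500,      -- representative block count
      avgrlen => 120);     -- representative average row length
   -- lock the statistics so routine gathering does not overwrite them
   DBMS_STATS.LOCK_TABLE_STATS(ownname => USER, tabname => 'STG_ORDERS');
END;
/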

Where data volumes vary considerably between executions, it would be useful to re-calculate the statistics using the DBMS_STATS.GATHER_TABLE_STATS procedure. In this scenario it would probably be better to consider dynamic sampling, as it effectively does something similar, but automatically. Gathering statistics behaves like a DDL operation, so any on-the-fly recalculation causes an implicit commit; this can interfere with the current transaction, and a global temporary table defined with ON COMMIT DELETE ROWS would lose its data. A way around this is to gather the statistics from a PL/SQL procedure declared as an AUTONOMOUS_TRANSACTION, which prevents the current transaction from being committed. This is not the most elegant approach and could potentially suffer from concurrency issues; a sketch is shown below.
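
A sketch of the autonomous-transaction workaround described above (the procedure name is illustrative):

CREATE OR REPLACE PROCEDURE refresh_gtt_stats (p_table IN VARCHAR2) IS
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   -- the implicit commit issued by the stats gathering ends only this
   -- autonomous transaction; the caller's transaction is left untouched
   DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => p_table);
   COMMIT;
END refresh_gtt_stats;
/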

Extensible Optimiser

The Extensible Optimiser uses the Data Cartridge feature of Oracle and is implemented via an object type with pre-defined methods for its interface. Custom cardinality calculations can be developed within this object type and are automatically invoked by the CBO.

This approach is very useful for in-memory PL/SQL tables and pipelined functions, and it appears to be the only one of these methods that is currently fully supported by Oracle. A minimal skeleton is sketched below.
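
As a skeleton only, assuming a hypothetical pipelined function f_rows(p_card NUMBER) whose expected row count is passed in explicitly as p_card, the statistics type and its association with the function look something like this:

CREATE OR REPLACE TYPE t_rows_stats AS OBJECT (
   dummy NUMBER,
   STATIC FUNCTION ODCIGetInterfaces (
      p_interfaces OUT SYS.ODCIObjectList
   ) RETURN NUMBER,
   STATIC FUNCTION ODCIStatsTableFunction (
      p_function IN  SYS.ODCIFuncInfo,
      p_stats    OUT SYS.ODCITabFuncStats,
      p_args     IN  SYS.ODCIArgDescList,
      p_card     IN  NUMBER
   ) RETURN NUMBER
);
/

CREATE OR REPLACE TYPE BODY t_rows_stats AS

   STATIC FUNCTION ODCIGetInterfaces (
      p_interfaces OUT SYS.ODCIObjectList
   ) RETURN NUMBER IS
   BEGIN
      -- declare that this type implements the statistics interface
      p_interfaces := SYS.ODCIObjectList(SYS.ODCIObject('SYS', 'ODCISTATS2'));
      RETURN ODCIConst.success;
   END;

   STATIC FUNCTION ODCIStatsTableFunction (
      p_function IN  SYS.ODCIFuncInfo,
      p_stats    OUT SYS.ODCITabFuncStats,
      p_args     IN  SYS.ODCIArgDescList,
      p_card     IN  NUMBER
   ) RETURN NUMBER IS
   BEGIN
      -- tell the CBO how many rows to expect from the pipelined function
      p_stats := SYS.ODCITabFuncStats(p_card);
      RETURN ODCIConst.success;
   END;

END;
/

ASSOCIATE STATISTICS WITH FUNCTIONS f_rows USING t_rows_stats;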

This is a very involved feature of Oracle, so a full treatment is beyond the scope of this article.

Conclusion

Many of the points discussed in this article can be pre-empted during database design, thereby avoiding the database performance crises that so often hit a development project.
