|
by Philip Lambert
There are several scenarios where normal
statistics gathering methods may not be appropriate and require
special attention.
-
Global temporary tables (GTT). These tables
do not hold data beyond a transaction or session, so any routine
stats gathering will gather meaningless information.
-
Intermediate/staging tables. These are heap
tables which hold temporary data, typically truncated prior to
processing. Depending on how a batch job using these tables
cleans up, there may or may not be data in these tables.
-
Stale statistics. The automatic statistic
collection process gathers stats on objects with missing
statistics or has stale statistics. If a monitored table (needs
to be enabled) has been modified more than 10%, then these
statistics are considered stale. There will be a period of time
between becoming stale and having the statistics re-gathered
where they will not reflect the true situation.
-
In-memory PL/SQL tables and pipelined
functions. A popular means of including PL/SQL array data into a
SQL and streaming large amounts of data; however these are not
proper tables so stats cannot be recorded against them.
There are several techniques that can be used
to provide more accurate statistics to the optimizer for the above
scenarios:
Selecting which one to use is very dependent on
what types of tables are being used and how they are being used.
Dynamic Sampling
This permits the optimizer to take a quick look
at the table when statistics are missing. It will sample the data in
the table to come up with better estimates of what it is dealing
with. Oracle will always use dynamic sampling on tables which do not
have statistics present.
There are two ways to enable dynamic sampling:
-
Removing statistics from tables to enable
automatic dynamic sampling on those tables. The
OPTIMIZER_DYNAMIC_SAMPLING parameter can be set at the database
instance level and can also be overridden at the session level
with the ALTER SESSION command to set the required behavior.
-
The DYNAMIC_SAMPLING hint can be added to
specific queries where statistics cannot be trusted or where a
more aggressive sampling is required over the default level.
The purpose of dynamic sampling is to improve
server performance by determining more accurate selectivity and
cardinality estimates to allow the optimizer to produce better
performing plans.
You can set the value of DYNAMIC_SAMPLING to a
value from 0 to 10. The higher the level, the more effort the
compiler puts into dynamic sampling and the more broadly it is
applied.
A common misunderstanding about dynamic
sampling is that it adds an overhead to each query execution, but
the actual overhead is only associated with the hard parse.
Subsequent soft parses would not involve the need for dynamic
sampling.
CARDINALITY Hint
The cardinality hint is used in two general
cases, complex joins and dynamically created tables like global
temporary tables. Probably best to point out that there is
discussion on whether the CARDINALITY hint will be deprecated in
favour of the OPT_ESTIMATE hint, and since neither is documented by
Oracle, they should not be used for production solutions.
This is useful approach for using with PL/SQL
collection variables (in-memory tables contained in a PL/SQL
collection). The database will not dynamically sample these, nor are
they real tables, so no statistics can be stored against them.
This hint may be favoured over dynamic sampling
where a GTT experiences a high degree of concurrency. Dynamic
sampling in this scenario will consider all data in the GTT and not
the specific session. However, the cardinality parameter in the hint
cannot be set via a bind variable so dynamic SQL would be necessary
in order to set a specific cardinality value to accommodate highly
varying row counts. Care would also be needed to ensure that too
many values for the cardinality do not generate too many versions of
the SQL and increase the level of hard parses, so a stepped approach
should be considered, e.g. CARDINALITY(10), CARDINALITY(100), etc.
The CARDINALITY hint is also very useful for
in-memory PL/SQL tables. They are commonly used with the IN clause
with a sub-query that has cast a collection into a table:
SELECT *
FROM emp
WHERE empid IN
(SELECT COLUMN_VALUE
FROM TABLE(CAST(v_my_in_list AS
t_number_array)) t)
Using the CARDINALITY hint in the sub-query
lets the optimizer know what number of rows to expect, but only
seems to be applied if the sub-query is materialized in a refactored
query using a dummy ROWNUM >=0 predicate (order by, union,
intersect, group by, etc will have the same effect).
WITH rfq_my_in_list AS
(
SELECT COLUMN_VALUE empid FROM TABLE(CAST(v_my_in_list AS
t_number_array)) t
)
SELECT *
FROM emp
WHERE empid IN
(SELECT /*+ cardinality(t 5) */ empid FROM rfq_my_in_list t WHERE
ROWNUM >= 0)
DBMS_STATS
Statistics are either gathered against the current volume of data in
the table, or the statistics are set manually based on knowledge of
the volumes of data loaded. Using this approach is probably more
appropriate for Oracle 9i and earlier.
Statistics are manually set using the DBMS_STATS.SET_TABLE_STATS
procedure, would be typically done once, and locked to prevent them
from being recalculated. This is okay where the table will have a
constant volume of data and the statistics always provide accurate
information to the optimizer.
Where data volumes vary considerably between executions, then it
would be useful to re-calculate the statistics using the
DBMS_STATS.GATHER_TABLE_STATS procedure. In this scenario it would
probably better to consider using dynamic sampling as it would
effectively be doing something similar, but automatically. Setting
statistics is a DDL operation so any on the fly calculations will
cause implicit commits, which may have issues with the transaction
and global temporary tables with DELETE ON COMMIT would loose their
data. A way around this is to use AUTONOMOUS_TRANSACTION in a PL/SQL
procedure to prevent committing the current transaction. This is not
the most elegant of approaches to consider and could potential
suffer concurrency issues.
Extensible Optimiser
The Extensible Optimiser uses the Data Cartridge feature of Oracle
and is implemented via an object type with pre-defined methods for
its interface. Custom cardinality calculations can be developed and
implemented into the object type and automatically invoked by the
CBO.
This approach is very useful for In-memory PL/SQL tables and
pipelined functions and seems to be the only method that is
currently fully supported by Oracle.
This is a very involved feature of Oracle so
there would be too much to cover in this article. Conclusion
Many of the points discussed in this article can be pre-empted
during database design and therefore avoid the database performance
crises that can often hit a development project.
|