Teradata is a data warehousing database, so performance in terms of data distribution and data retrieval is a basic expectation. In Teradata 12.0, Teradata has focused on revisiting and enhancing its existing performance features rather than introducing entirely new ones.
2.1. Optimizer Cost Estimation
The Teradata Optimizer uses various internal functions to compute the cost of executing a given query. The cost returned by the Optimizer is based on the following factors:
- Calculated number of blocks in a relation
- Cost of various disk operations
- Number of AMPs used for an operation
- Number of AMPs configured per node
- Cost of sorting a spool file
- Cost of duplicating rows across AMPs
- Hashing costs
These parameters are thereafter used to optimize or tune the SQL.
To facilitate cost optimization, Teradata Database 12.0 provides a framework for cost estimation within which cost function parameter values are administered, updated, and deployed in a uniform and convenient manner.
- Users can change the cost profile used by sessions logged in under a given user profile.
- DBC.Profiles.CostProfileName, if non-null, specifies the cost profile to be used for sessions started by any user assigned to this user profile.
- In Teradata 12.0, the COST PROFILE clause of a CREATE PROFILE or MODIFY PROFILE statement can specify either NULL or a cost_profile_name. This in turn enables the following statement in Teradata 12.0: DIAGNOSTIC SET PROFILE prof-name ON FOR SESSION;
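For example, a cost profile might be assigned and activated as follows. The profile and cost profile names below are hypothetical, and the exact clause syntax may vary slightly between releases:

```sql
-- Assign a cost profile (names are hypothetical) to a user profile;
-- sessions logged in under this profile pick it up automatically.
CREATE PROFILE tactical_users AS
  COST PROFILE = 'TacticalCost';

-- Remove the assignment by setting the cost profile to NULL.
MODIFY PROFILE tactical_users AS
  COST PROFILE = NULL;

-- Activate a cost profile for the current session only.
DIAGNOSTIC SET PROFILE TacticalCost ON FOR SESSION;
```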
- Teradata Database 12.0 supports the following diagnostic request statements:
a) DIAGNOSTIC COSTPRINT:
Extracts Optimizer costing information, including cost profile information, from SystemFE.Opt_Cost_Table and dumps it to the DBS I/O window.
b) DIAGNOSTIC DUMP COSTS:
Extracts TPA-calculated Optimizer cost information and DBS Control Record flag
information from a target system and writes it to the following tables on that system:
SystemFE.Opt_Cost_Table or SystemFE.Opt_DBSCtl_Table
c) DIAGNOSTIC HELP COSTS:
Reports information about the active cost profiles and TLE cost segments.
d) DIAGNOSTIC SET COSTS:
Instructs the Optimizer on the test system to use the Optimizer cost parameters and
relevant DBS Control Record information from target_system_name for its emulations for a period defined by the scope level.
e) DIAGNOSTIC HELP PROFILE:
Reports costs parameter information for the specified scope level.
f) DIAGNOSTIC SET PROFILE:
Instructs the Optimizer on the test system to activate or deactivate a cost profile at the specified
level. The system propagates activated cost profiles to successor levels.
g) DIAGNOSTIC DUMP SAMPLES:
Extracts random AMP samples from a target system and writes them to
SystemFE.Opt_RAS_Table on that system.
h) DIAGNOSTIC HELP SAMPLES:
Returns, for each scope level, the random AMP statistical samples that have been set.
i) DIAGNOSTIC SET SAMPLES:
Instructs the Optimizer on the test system to use the random AMP samples from
target_system_name for its emulations
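Taken together, these statements support a target-level-emulation workflow. A minimal sketch, assuming a target system named prodsys (hypothetical) and that costs and samples have already been dumped on it:

```sql
-- 1. On the target system: capture Optimizer cost data and
--    random AMP samples into the SystemFE tables.
DIAGNOSTIC DUMP COSTS prodsys;
DIAGNOSTIC DUMP SAMPLES prodsys;

-- 2. On the test system: use the captured data for emulation
--    at session scope.
DIAGNOSTIC SET COSTS prodsys ON FOR SESSION;
DIAGNOSTIC SET SAMPLES prodsys ON FOR SESSION;

-- 3. Inspect what is currently active.
DIAGNOSTIC HELP COSTS;
DIAGNOSTIC HELP SAMPLES;
```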
2.2. Statistics Enhancements:
Collecting statistics has been enhanced to capture more data demographic information so that the Optimizer can generate more accurate plans than it previously could.
2.2.1 Increased Statistics Intervals:
The maximum number of intervals for statistics on an index or column is increased from 100 to 200.
The increase in the number of statistics intervals:
- Improves single table cardinality estimates that are crucial for join planning. Having more intervals gives a more granular view of the demographics.
- Increases the accuracy of skew adjustment because of the higher number of modal frequencies that can be stored in a histogram.
- Does not change the procedure for collecting or dropping statistics, although it affects the statistics collected.
The time and spool space needed to collect statistics can be greater with the larger number of intervals, so an option is available to continue to use 100 intervals. You can decrease the maximum number of intervals using the OCES cost profile. The name of the relevant flag is MaxStatsInterval. The default is set to 200.
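Because the collection procedure itself is unchanged, existing statements continue to work as before; only the internal histogram (now up to 200 intervals) differs. Table and column names below are hypothetical:

```sql
-- Column statistics: histogram may now use up to 200 intervals.
COLLECT STATISTICS ON sales_fact COLUMN (sale_date);

-- Index statistics: same unchanged syntax.
COLLECT STATISTICS ON sales_fact INDEX (store_id);
```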
2.2.2 Extrapolating Statistics Outside Range:
- The Optimizer extrapolates statistics on rolling columns. A rolling column has a constant number of rows per value and a varying number of unique values. Examples of rolling columns are those having a DATE or TIMESTAMP data type.
- For these columns, the demographics of existing data never changes, and only new data can add new distinct values to the column.
- You can control date extrapolation of statistics by setting the ExtrapolateStatistics flag in the OCES cost profile. The default setting for this flag is TRUE.
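A typical rolling column is a daily load date: each day adds new distinct values while per-value row counts stay roughly constant. With extrapolation enabled, a predicate beyond the last collected value can still be costed sensibly. Table and column names are hypothetical:

```sql
-- Statistics collected some time ago on a rolling DATE column.
COLLECT STATISTICS ON web_clicks COLUMN (click_date);

-- A predicate past the collected range can now be estimated by
-- extrapolation rather than assumed to qualify zero rows.
SELECT COUNT(*)
FROM web_clicks
WHERE click_date > DATE '2007-12-31';
```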
2.2.3 Expanded Null Statistics:
Teradata Database 12.0 expands null-related demographic information as follows:
- An all-null fields statistic, new in this release, is a count of all rows in the collected row set that have nulls in all of the columns on which statistics are collected. The name of this statistic is NumAllNulls.
- A null fields statistic, retained from earlier releases, is a count of all rows in the collected row set that have one or more nulls in any of the columns on which statistics are collected. The name of this statistic is NumNulls.
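The distinction matters for multicolumn statistics. In the hypothetical example below, a row with values (NULL, 5) contributes to NumNulls only, while a row with (NULL, NULL) contributes to both NumNulls and NumAllNulls:

```sql
-- Multicolumn statistics on a hypothetical table:
-- NumNulls    counts rows where col_a OR col_b is null;
-- NumAllNulls counts rows where col_a AND col_b are both null.
COLLECT STATISTICS ON demo_tab COLUMN (col_a, col_b);
```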
2.2.4 Collecting AMP-Level Statistics Values:
- The Optimizer uses a new and accurate Average AMP-local Rows per Value (AvgAmpRPV) statistic. This statistic adds one column to the output of a HELP STATISTICS request.
- AvgAmpRPV is the system-wide average, taken over all AMPs, of each AMP's average number of rows per value for a NUSI column set. AvgAmpRPV is useful for detecting and reacting to skewed distribution.
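Since AvgAmpRPV is surfaced as an extra column in HELP STATISTICS output, it can be inspected directly (table name is hypothetical):

```sql
-- Display collected statistics, including the new AvgAmpRPV column.
HELP STATISTICS demo_tab;
```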
2.3. Enhanced Query Rewrite Capability
Teradata Database 12.0 consolidates the previously dispersed query rewrite modules, moving components that had been embedded within the Resolver and the Optimizer into a new Query Rewrite subsystem. This subsystem operates at the stage in query processing just after the Resolver phase and just before the Optimizer phase.
The Query Rewrite subsystem takes the ResTree for a query (call it Q), the version that the Resolver produces, and produces a semantically identical ResTree' (pronounced "ResTree prime"), which it then passes to the Optimizer. The rewritten query (call it Q') has two properties:
- It is semantically identical to the original query Q.
- It runs faster than, or at least as fast as, Q.
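One classic rewrite of this kind is outer-to-inner join conversion: a LEFT OUTER JOIN whose WHERE clause rejects nulls from the inner table is semantically an inner join. The tables below are hypothetical, and this is a conceptual sketch of the kind of transformation such a subsystem performs, not a statement of exactly which rewrites Teradata 12.0 applies:

```sql
-- Original query Q: the predicate on o.order_total eliminates
-- null-extended rows, so the outer join adds nothing.
SELECT c.cust_name, o.order_total
FROM customers c
LEFT OUTER JOIN orders o ON c.cust_id = o.cust_id
WHERE o.order_total > 100;

-- Semantically identical Q': the inner join opens up more
-- join orders and methods for the Optimizer.
SELECT c.cust_name, o.order_total
FROM customers c
INNER JOIN orders o ON c.cust_id = o.cust_id
WHERE o.order_total > 100;
```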
2.4 Replace Poorly Performing Generic Plan with Specific Plan Automatically
As part of the Parameterized Statement Caching Improvements feature, the Parser uses a cost profile parameter, CompareEstimates, to determine if it should reparse a request that would have generated a poor generic performance plan in order to generate a specific plan.
The CompareEstimates default value is 0, which means the Parser will not reparse the plan. The value can be changed in the cost profile.
The value of reparsing and generating a specific plan is that it avoids executing an obviously under-performing generic plan even once, which provides a performance benefit for parameterized queries.
2.5 Hash Bucket Expansion
Teradata Database 12.0 supports either 65,536 or 1,048,576 hash buckets for a system. The larger number of buckets primarily benefits systems with thousands of AMPs, but there is no disadvantage to using the larger number of buckets on smaller systems.
On systems with:
- 65,536 hash buckets, the system uses 16 bits of the 32-bit RowHash to index into the hash map.
- 1,048,576 hash buckets, the system uses 20 bits of the 32-bit RowHash as the index.
2.6 Multilevel Partitioned Primary Index
Teradata Database 12.0 supports a Multilevel Partitioned Primary Index (MLPPI) wherever a Partitioned Primary Index (PPI) is supported. This includes:
- Base tables
- Global temporary tables
- Volatile tables
- Noncompressed join indexes
Multilevel partitioning allows each partition to be subpartitioned. Each level must define at least two partitions. The number of levels of partitioning cannot exceed 15. The limit is 65,535 partitions for a combined partitioning expression. The number of levels of partitioning may be further restricted by other limits such as the maximum size of the table header or data dictionary column sizes.
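A two-level MLPPI can be sketched as follows, with level 1 partitioning by month and level 2 subpartitioning by region; each level defines at least two partitions. All table, column, and range values are hypothetical:

```sql
CREATE TABLE sales_fact (
  sale_id    INTEGER NOT NULL,
  sale_date  DATE    NOT NULL,
  region_id  INTEGER NOT NULL,
  amount     DECIMAL(10,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY (
  -- Level 1: one partition per month.
  RANGE_N(sale_date BETWEEN DATE '2007-01-01'
                    AND     DATE '2008-12-31'
                    EACH INTERVAL '1' MONTH),
  -- Level 2: each month subpartitioned into region bands.
  RANGE_N(region_id BETWEEN 1 AND 100 EACH 10)
);
```

Queries that constrain both sale_date and region_id can then benefit from partition elimination at both levels.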
2.7 Remove Restriction for Table Function
Teradata Database 12.0 lifts the requirement that a relation used as input for a table function must also be referenced in the FROM clause of the SELECT statement that invokes the table function.
This specification is still supported, but is no longer required.
Instead, if the table function arguments are fields from a derived table, the derived table can be declared within the WITH clause of the SELECT statement. Otherwise, table function arguments that are columns from a base table or view are allowed without having to reference the base table or view in the FROM clause.
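A sketch of the intended shape, assuming a table function parse_udf and a base table raw_log (both hypothetical; the exact invocation syntax should be checked against the release documentation):

```sql
-- The derived table src is declared in the WITH clause; it no longer
-- has to be repeated in the FROM clause of the invoking SELECT.
WITH src (line_txt) AS (
  SELECT log_text FROM raw_log
)
SELECT t.field1, t.field2
FROM TABLE (parse_udf(src.line_txt)) AS t;
```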
2.8 MRG Performance Improvement
Teradata Database 12.0 reduces export response time and MRG (short for BYNET Merge subsystem) network overhead.
2.9 Scope of Covering Index is broadened
Optimizer enhancements make better use of the covering index by introducing a constraint scan of an index subtable as an access path for aggregate queries. This avoids having to perform a full table scan.
2.10 Compression on Soft and Batch Referential Integrity Column
Teradata Database 12.0 supports data compression on:
- Foreign key (FK)
- Primary key (PK) columns implemented as a unique secondary index (USI)
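For example, a foreign key column in a batch-RI child table can now carry value compression. Tables, columns, and the compressed value are hypothetical; REFERENCES WITH CHECK OPTION declares batch RI (WITH NO CHECK OPTION would declare soft RI):

```sql
CREATE TABLE order_items (
  item_id   INTEGER NOT NULL,
  -- FK column with value compression, now permitted under batch RI.
  order_id  INTEGER NOT NULL COMPRESS (0),
  qty       INTEGER,
  FOREIGN KEY (order_id)
    REFERENCES WITH CHECK OPTION orders (order_id)
)
PRIMARY INDEX (item_id);
```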