A prerequisite of the star transformation is that there be a single-column bitmap index on every join column of the fact table. These join columns include all foreign key columns.
For example, the sales table of the sh sample schema has bitmap indexes on the time_id, channel_id, cust_id, prod_id, and promo_id columns.
Consider the following star query:
SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND s.channel_id = ch.channel_id
  AND c.cust_state_province = 'CA'
  AND ch.channel_desc IN ('Internet','Catalog')
  AND t.calendar_quarter_desc IN ('1999-Q1','1999-Q2')
GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;
Oracle processes this query in two phases. In the first phase, Oracle uses the bitmap indexes on the foreign key columns of the fact table to identify and retrieve only the necessary rows from the fact table. That is, Oracle will retrieve the result set from the fact table using essentially the following query:
SELECT ... FROM sales
WHERE time_id IN
        (SELECT time_id FROM times
         WHERE calendar_quarter_desc IN ('1999-Q1','1999-Q2'))
  AND cust_id IN
        (SELECT cust_id FROM customers
         WHERE cust_state_province = 'CA')
  AND channel_id IN
        (SELECT channel_id FROM channels
         WHERE channel_desc IN ('Internet','Catalog'));
This is the transformation step of the algorithm, because the original star query has been transformed into this subquery representation. This method of accessing the fact table leverages the strengths of Oracle's bitmap indexes. Intuitively, bitmap indexes provide a set-based processing scheme within a relational database. Oracle has implemented very fast methods for doing set operations such as AND (an intersection in standard set-based terminology), OR (a set-based union), MINUS, and COUNT.
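The equivalence between the join form and the subquery form can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine; SQLite has no bitmap indexes and performs no star transformation, so this only illustrates that both forms select the same fact rows. The sale_id column and all sample rows are made up for the illustration and are not part of the sh schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times     (time_id INTEGER PRIMARY KEY, calendar_quarter_desc TEXT);
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_state_province TEXT);
CREATE TABLE channels  (channel_id INTEGER PRIMARY KEY, channel_desc TEXT);
-- sale_id is a hypothetical row label added so the rows can be compared.
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, time_id INTEGER,
                    cust_id INTEGER, channel_id INTEGER, amount_sold REAL);

INSERT INTO times VALUES (1, '1999-Q1'), (2, '1999-Q2'), (3, '1999-Q3');
INSERT INTO customers VALUES (10, 'CA'), (11, 'NY');
INSERT INTO channels VALUES (100, 'Internet'), (101, 'Catalog'), (102, 'Direct Sales');
INSERT INTO sales VALUES
  (1, 1, 10, 100, 25.0),
  (2, 2, 10, 101, 40.0),
  (3, 3, 10, 100, 15.0),
  (4, 1, 11, 100, 30.0),
  (5, 2, 10, 102, 55.0);
""")

# Original star query, reduced to the fact rows it touches.
join_form = """
SELECT s.sale_id
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND s.channel_id = ch.channel_id
  AND c.cust_state_province = 'CA'
  AND ch.channel_desc IN ('Internet','Catalog')
  AND t.calendar_quarter_desc IN ('1999-Q1','1999-Q2')
ORDER BY s.sale_id
"""

# Transformed form: each dimension filter becomes an IN subquery on the fact table.
subquery_form = """
SELECT sale_id
FROM sales
WHERE time_id IN (SELECT time_id FROM times
                  WHERE calendar_quarter_desc IN ('1999-Q1','1999-Q2'))
  AND cust_id IN (SELECT cust_id FROM customers
                  WHERE cust_state_province = 'CA')
  AND channel_id IN (SELECT channel_id FROM channels
                     WHERE channel_desc IN ('Internet','Catalog'))
ORDER BY sale_id
"""

join_rows = [row[0] for row in conn.execute(join_form)]
subquery_rows = [row[0] for row in conn.execute(subquery_form)]
print(join_rows, subquery_rows)
```

Both queries pick the same fact rows here because each dimension key is unique, so the joins neither duplicate nor drop fact rows beyond what the filters require.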
In this star query, a bitmap index on time_id is used to identify the set of all rows in the fact table corresponding to sales in 1999-Q1. This set is represented as a bitmap (a string of 1's and 0's that indicates which rows of the fact table are members of the set).
A similar bitmap is retrieved for the fact table rows corresponding to sales from 1999-Q2. The bitmap OR operation is used to combine this set of Q1 sales with the set of Q2 sales.
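As a rough illustration of this step, the sketch below models a bitmap as a Python integer in which bit i stands for fact row i. This is a simplification for explanation only: Oracle's real bitmap indexes are compressed on-disk structures, and the sample time_id values and the helper names (build_bitmap, quarter_bitmap, rows_of) are invented here.

```python
# Hypothetical fact table column: the time_id of each sales row, in row order.
sales_time_ids = [1, 2, 3, 1, 2, 3, 1, 4]

# Hypothetical times dimension: time_id -> calendar_quarter_desc.
times = {1: "1999-Q1", 2: "1999-Q2", 3: "1999-Q3", 4: "1999-Q1"}


def build_bitmap(column, key):
    """Return an int whose bit i is 1 when row i of the column equals key."""
    bitmap = 0
    for row, value in enumerate(column):
        if value == key:
            bitmap |= 1 << row
    return bitmap


def quarter_bitmap(quarter):
    """OR together the bitmaps of every time_id that falls in the quarter."""
    result = 0
    for time_id, desc in times.items():
        if desc == quarter:
            result |= build_bitmap(sales_time_ids, time_id)
    return result


def rows_of(bitmap):
    """List the row numbers whose bit is set."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


q1 = quarter_bitmap("1999-Q1")
q2 = quarter_bitmap("1999-Q2")
time_bitmap = q1 | q2  # bitmap OR: rows sold in either quarter

print(rows_of(q1), rows_of(q2), rows_of(time_bitmap))
```

The union is a single bitwise operation regardless of how many rows each quarter covers, which is the property the text attributes to set-based processing.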
Additional set operations will be done for the customer dimension and the channel dimension. At this point in the star query processing, there are three bitmaps. Each bitmap corresponds to a separate dimension table, and each bitmap represents the set of rows of the fact table that satisfy that individual dimension's constraints.
These three bitmaps are combined into a single bitmap using the bitmap AND operation. This final bitmap represents the set of rows in the fact table that satisfy all of the constraints on the dimension tables. This is the result set, the exact set of rows from the fact table needed to evaluate the query. Note that none of the actual data in the fact table has been accessed. All of these operations rely solely on the bitmap indexes and the dimension tables. Because of the bitmap indexes' compressed data representations, the bitmap set-based operations are extremely efficient.
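The combination step can be pictured with three small bitmaps, again modeled as Python integers. The bit patterns below are arbitrary sample values over an assumed eight-row fact table, not output from a real index.

```python
# Hypothetical per-dimension bitmaps over an 8-row fact table.
# Bit i, counting from the right, is 1 when fact row i satisfies
# that dimension's constraint.
time_bitmap     = 0b11011011
customer_bitmap = 0b01111010
channel_bitmap  = 0b11010011

# Bitmap AND: a row survives only if every dimension accepts it.
result_bitmap = time_bitmap & customer_bitmap & channel_bitmap

matching_rows = [row for row in range(8) if (result_bitmap >> row) & 1]
print(format(result_bitmap, "08b"), matching_rows)
```

Only the bitmaps are read in this step; the fact rows themselves are not touched until the final bitmap is known.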
Once the result set is identified, the bitmap is used to access the actual data from the sales table. Only those rows that are required for the end user's query are retrieved from the fact table. At this point, Oracle has effectively joined all of the dimension tables to the fact table using bitmap indexes. This technique provides excellent performance because Oracle is joining all of the dimension tables to the fact table with one logical join operation, rather than joining each dimension table to the fact table independently.
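A minimal sketch of this retrieval step follows, assuming the fact table is an in-memory list and the final bitmap is a Python integer. The fetch_rows helper and the sample rows are hypothetical; a real database would translate set bits into row addresses rather than list positions.

```python
# Hypothetical fact rows: (time_id, cust_id, channel_id, amount_sold).
sales = [
    (1, 10, 100, 25.0),
    (2, 10, 101, 40.0),
    (3, 10, 100, 15.0),
    (1, 11, 100, 30.0),
    (2, 10, 100, 55.0),
]

# Assumed final bitmap: rows 0, 1, and 4 satisfy every dimension constraint.
result_bitmap = 0b10011


def fetch_rows(table, bitmap):
    """Yield only the rows whose bit is set, lowest row number first."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield table[row]
        bitmap >>= 1
        row += 1


result_set = list(fetch_rows(sales, result_bitmap))
print(result_set)
```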
The second phase of this query is to join these rows from the fact table (the result set) to the dimension tables. Oracle will use the most efficient method for accessing and joining the dimension tables. Many dimensions are very small, and table scans are typically the most efficient access method for these dimension tables. For large dimension tables, table scans may not be the most efficient access method. For example, if a query constrained a product dimension, a bitmap index on product.department could be used to quickly identify all of the products in the grocery department. Oracle's cost-based optimizer automatically determines which access method is most appropriate for a given dimension table, based upon its knowledge of the size and data distribution of each dimension table.
The specific join method (as well as indexing method) for each dimension table will likewise be intelligently determined by the cost-based optimizer. A hash join is often the most efficient algorithm for joining the dimension tables. The final answer is returned to the user once all of the dimension tables have been joined. The query technique of retrieving only the matching rows from one table and then joining to another table is commonly known as a semi-join.
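The second phase can be sketched as a hash join followed by the GROUP BY. In the sketch below, Python dictionaries play the role of the hash tables built on the (already filtered) dimension rows, and each fact row from the result set probes them. The rows, city names, and channel classes are sample values chosen for the illustration, and the actual join method is whatever the optimizer picks.

```python
from collections import defaultdict

# Hypothetical result set from phase one:
# (time_id, cust_id, channel_id, amount_sold).
result_set = [
    (1, 10, 100, 25.0),
    (2, 10, 101, 40.0),
    (1, 12, 100, 30.0),
]

# Build side: hypothetical dimension rows keyed by their join column.
times = {1: "1999-Q1", 2: "1999-Q2"}               # time_id -> calendar_quarter_desc
customers = {10: "Los Angeles", 12: "San Diego"}   # cust_id -> cust_city
channels = {100: "Indirect", 101: "Indirect"}      # channel_id -> channel_class

# Probe side: look up each fact row in the three hash tables, then
# aggregate amount_sold per (channel_class, cust_city, calendar_quarter_desc).
totals = defaultdict(float)
for time_id, cust_id, channel_id, amount_sold in result_set:
    key = (channels[channel_id], customers[cust_id], times[time_id])
    totals[key] += amount_sold

for key, sales_amount in sorted(totals.items()):
    print(key, sales_amount)
```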