这篇文章主要介绍“什么是Clustering Factor”,在日常操作中,相信很多人在什么是Clustering Factor问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答”什么是Clustering Factor”的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
Clustering Factor(CF)聚簇因子,意味着对索引的表的群集度,用来比较索引的顺序程度与表的杂乱排序程度的一个度量。
集群因子用于判断索引回表需要消耗的物理 I/O 次数。
CF是在假设内存大小只能载入一个块,这是伴随索引扫描所需要的表扫描次数的计算值。
更准确地说,他表示沿着索引的叶块,在ROWID值上代表块编号的第1~15位的值与之前ROWID作出比较后被更替的次数。
如:
有五个块组成的索引和由五个块组成的表,而一个块里有4行记录,那么总行数就是20行。
顺序的扫描索引的同时读取相对应的表,这时可能有两个极端情况:
1、CF最低时
如果一个索引块所包含的ROWID都被一个表块所包含,则通过索引扫描表时,只扫描索引5次和表5次,就可以得到想要的数据。
此时CF=5(表块的扫描次数),CF的最小值与表的块数相同
2、CF最高时
如果一个索引块所包含的ROWID分别被不同的表块所包含,则通过索引扫描表时,
必须扫描20次[4(索引块数)+4(索引块数)+4(索引块数)+4(索引块数)+4(索引块数) = 20次]才能获得想要的数据。
这时CF=20(表块的扫描次数),CF最大值与表的行数相同
聚簇因子大致的计算方法顺序如下:
1、进行一次索引全扫描
2、检查索引块中的rowid信息。比较与前一个rowid是否指向同一个数据块。若不同,则聚簇因子加1.
3、当完成整个的索引扫面后,即得到该索引的聚簇因子的数值。
计算过程
SELECT * FROM (SELECT ID ,ROWID FROM TMP_CF ORDER BY ID) WHERE ROWNUM <= 5; ID ROWID ---------- ------------------ 2 AAAVqxAAEAAAAfDACL 3 AAAVqxAAEAAAAfDAAf 3 AAAVqxAAEAAAAfDAJR 4 AAAVqxAAEAAAAfDAGZ 4 AAAVqxAAEAAAAfDAKP
首先我们比较 2、3 对应的 ROWID 是否在同一个数据块,如果在同一个数据块,Clustering Factor +0;
如果不在同一个数据块,那么 Clustering Factor 值加 1。
然后我们比较 3、4 对应的 ROWID 是否在同一个数据块,如果在同一个数据块,Clustering Factor 值不变;
如果不在同一个数据块,那么 Clustering Factor 值加 1。
接下来我们比较4、5对应的ROWID是否在同一个数据块,如果在同一个数据块,Clustering Factor +0;
如果不在同一个数据块,那么 Clustering Factor 值加 1。
像上面步骤一样,一直这样有序地比较下去,直到比较完索引中最后一个键值。
表中的数据块数 <= 聚簇因子的取值范围 <= 表中的记录数
如果集群因子与块数接近,说明表的数据基本上是有序的,而且其顺序基本与索引顺序一样。
这样在进行索引范围或者索引全扫描的时候,回表只需要读取少量的数据块就能完成。
如果集群因子与表记录数接近,说明表的数据和索引顺序差异很大,在进行索引范围扫描或者索引全扫描的时候,回表会读取更多的数据块。
集群因子只会影响索引范围扫描(INDEX RANGE SCAN)以及索引全扫描(INDEX FULL SCAN),因为只有这两种索引扫描方式会有大量数据回表。
集群因子不会影响索引唯一扫描(INDEX UNIQUE SCAN),因为索引唯一扫描只返回一条数据。
集群因子更不会影响索引快速全扫描(INDEX FAST FULL SCAN),因为索引快速全扫描不回表。
集群因子影响的是索引回表的物理 I/O 次数。
我们假设索引范围扫描返回了 1 000 行数据,如果 buffer cache 中没有缓存表的数据块,
假设这 1000行数据都在同一个数据块中,那么回表需要耗费的物理 I/O 就只需要一个;
假设这 1000 行数据都在不同的数据块中,那么回表就需要耗费 1 000 个物理 I/O。
因此,集群因子影响索引回表的物理 I/O 次数。
假设I/O不使用内存,CF越高读取表块的次数就越多,因此物理I/O也会增加。
即,CF越高,通过ROWID读取表块的次数就会越多,db file sequential read也会相应的增加。
通过高速缓冲区读取过一次的块因为不在发生物理I/O,因此CF高未必一定会降低SQL语句的性能。
但是大量扫描CF值高的索引,则需要读取的表块增加,因此可能给性能带来严重问题。
比如当ORACLE计算索引范围扫描(IRS)及其回表时的成本时,
使用如下的公式:
IRS COST=I/O COST + CPU COST
其中 I/O COST = INDEX ACCESS I/O COST + TABLE ACCESS I/O COST
其中:
INDEX ACCESS I/O COST = BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)
TABLE ACCESS I/O COST = CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)
索引的聚簇因子越大
--》从该索引回表的成本越大
--》使用该索引的综合成本越大
--》使用该索引的可能越小
对索引收集统计信息,则索引的聚簇因子会被记录在 DBA_INDEXES的CLUSTING_FACTOR列。
如果CF与表的块数相当就是“好现象”;如果CF和表的行数相当就不是“好现象”。
如果SQL语句的性能问题被判断位由于聚簇因子导致,如何进行优化?
1、将索引扫描换成全表扫描
2、使用其他索引
3、按照索引的排列顺序重新创建表(最后的选择)
create table new_table as select ... from old_table order by indexd_column
4、某些情况,设置table_cached_blocks参数
5、设置set_index_stats的聚簇因子值clstfct
注意: 当索引范围扫描,索引全扫描不回表或者返回数据量很少的时候,不管集群因子多大,对 SQL 查询性能几乎没有任何影响。 在进行 SQL 优化的时候,往往会建立合适的组合索引消除回表,或者建立组合索引尽量减少回表次数。
1、聚簇因子高对执行计划的影响
2、聚簇因子低对执行计划的影响
3、调整聚簇因子
3.1、根据索引列重新创建表
3.2、设置table_cached_blocks参数
3.3、设置set_index_stats的聚簇因子值clstfct
/* 创建测试数据 */ /* 为了制造出聚簇因子较高的索引,对生成的ID数值取随机数1至1000。*/ SQL> create table tmp_cf as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000; Table created. /* 在ID列上创建索引 */ SQL> create index ind_id on tmp_cf(id); Index created. /* 收集统计信息 */ SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf'); PL/SQL procedure successfully completed. /* 开启autotrace查看执行计划 */ SQL> set autot traceonly /* 执行查询,条件为ID<=6 */ SQL> select * from tmp_cf where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 36 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 6 | 36 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 676 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 从执行计划中可见,在只返回6行的情况下,估算出的走索引并回表的成本仍然大于全表扫描。 */
/* 尝试改条件为 id<=5 ,id<=4,id<=3 进一步减少返回的行数。 */
SQL> select * from tmp_cf where id<=5; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 30 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 5 | 30 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=5) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed SQL> select * from tmp_cf where id<=4; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 24 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 4 | 24 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=4) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed SQL> select * from tmp_cf where id<=3; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 18 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 3 | 18 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=3) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 652 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed SQL> select * from tmp_cf where id<=2; Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 12 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 2 | 12 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 2 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=2) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 592 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
/* 可见,当返回的行数为1行时,才选择了用索引并回表的访问路径。*/
/* 创建测试数据 */ /* 创建测试表,但这次ID列的值为顺序产生。 */ SQL> create table tmp_cf_2 as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000; Table created. /* 在ID列上创建索引 */ SQL> create index ind_id_2 on tmp_cf_2(id); Index created. /* 执行查询,条件为ID<=6 */ SQL> select * from tmp_cf_2 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 390 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 6 | 390 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 23 consistent gets 1 physical reads 0 redo size 986 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 从上面可见,此时已经采用了先索引再回表的访问路径,说明此时其计算出来的成本,是小于全表扫描的。*/
/* 继续执行查询,将条件依次为ID<=200,ID<=222,ID<=223,逐步增加返回的行数 */
SQL> select * from tmp_cf_2 where id<=200; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200 | 13000 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 200 | 13000 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 200 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=200) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 48 consistent gets 0 physical reads 0 redo size 14949 bytes sent via SQL*Net to client 667 bytes received via SQL*Net from client 15 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 200 rows processed SQL> select * from tmp_cf_2 where id<=222; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 222 | 14430 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 222 | 14430 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 222 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=222) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 50 consistent gets 0 physical reads 0 redo size 16455 bytes sent via SQL*Net to client 678 bytes received via SQL*Net from client 16 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 222 rows processed SQL> select * from tmp_cf_2 where id<=223; Execution Plan ---------------------------------------------------------- Plan hash value: 2467544485 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 223 | 14495 | 5 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF_2 | 223 | 14495 | 5 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=223) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 43 consistent gets 0 physical reads 0 redo size 15679 bytes sent via SQL*Net to client 678 bytes received via SQL*Net from client 16 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 223 rows processed
/* 从上可见,当返回223行时,才开始采用全表扫描的方式。*/
/* 查看一下两个索引的聚簇因子 */
SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2'); INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- IND_ID 443 IND_ID_2 9
当索引的聚簇因子较高时,会导致该索引回表的成本成大,使得该索引的综合成本越大,因此使用该索引的可能性越低。
但是,如果在执行计划出现错误,使用了该索引,并且该索引的聚簇因子很大,那么通过索引回表的次数就会很大,同时会造成大量的db file sequential read等待事件。
/* 根据tmp_cf的表索引列排完序在创建新表tmp_cf_3 */
SQL> create table tmp_cf_3 as select * from tmp_cf order by id; Table created. /* 创建索引 */ SQL> create index ind_id_3 on tmp_cf_3(id); Index created. /* 执行查询 id<=6 */ SQL> select * from tmp_cf_3 where id <= 6; 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2546856486 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 96 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_3 | 6 | 96 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_3 | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 15 consistent gets 1 physical reads 0 redo size 692 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 从上面可见,此时已经采用了先索引再回表的访问路径,说明此时其计算出来的成本,是小于全表扫描的。*/
再次查看聚簇因子 SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2','IND_ID_3'); INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- IND_ID 443 IND_ID_2 9 IND_ID_3 2
根据索引列排完序创建的表,然后在原索引列创建索引,此时的聚簇因子很小。
在设置该参数前,有一种方式可以在创建索引前先计算出该索引的聚簇因子。
/* 创建测试数据 */ SQL> create table t1 as select * from dba_objects; Table created. 计算聚簇因子(需要在object_name列创建索引) SELECT SUM(CASE WHEN BLOCK#1 = BLOCK#2 AND FILE#1 = FILE#2 THEN 0 ELSE 1 END) CLUSTERING_FACTOR FROM (SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) FILE#1 ,LEAD(DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) FILE#2 ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#1 ,LEAD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) BLOCK#2 FROM T1 WHERE OBJECT_NAME IS NOT NULL); 或者 /*更推荐使用下面*/ SELECT SUM(COUNT_CHG) FROM (SELECT CASE WHEN SUBSTRB(ROWID, 1, 15) <> LAG(SUBSTRB(ROWID, 1, 15), 1, '000000000') OVER(ORDER BY OBJECT_NAME, ROWID) THEN 1 ELSE 0 END COUNT_CHG FROM T1 WHERE OBJECT_NAME IS NOT NULL); SUM(COUNT_CHG) -------------- 44076
/* 实际创建索引并且查看聚簇因子 */
SQL> create index t1_i1 on t1(object_name); Index created. SQL> select index_name,clustering_factor from user_indexes where table_name = 'T1' and index_name = 'T1_I1'; INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- T1_I1 44076
/* 可以看到计算的聚簇因子的值和实际创建完索引的聚簇因子的值是一样的 */
1、该参数自12C引入(11g也有效)。
2、告诉索引统计信息收集工具,假设索引所在的基表上有多少个数据块已经在内存中。
3、取值范围为1至255,默认值为1。
4、对小表上索引的聚簇因子影响大,而对于大表上索引的聚簇因子影响小。
/* 接此前tmp_cf表的实验 */ /* 查看ID列上索引的聚簇因子 */ SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 443 查看当前的TABLE_CACHED_BLOCKS的值 SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual; DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF') -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 修改为16 SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF','TABLE_CACHED_BLOCKS',16); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual; DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF') -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 16 查看聚簇因子 SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 443 收集索引的统计信息 SQL> exec dbms_stats.gather_index_stats('DEMO','IND_ID'); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 2
/*此时,再执行查询,条件为ID<=6 看此时的表现*/
SQL> set autot traceonly SQL> select * from tmp_cf where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 3869640664 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 330 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF | 6 | 330 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 3 db block gets 18 consistent gets 0 physical reads 0 redo size 876 bytes sent via SQL*Net to client 607 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
/* 从执行计划中可见,原来走全表扫描,现在变成走索引范围扫描了。 */
/* 尝试改条件为 id<=100 ,id<=200,id<=300 逐步增大返回的行数。 看看何时不走索引了*/
SQL> select * from tmp_cf where id<=100; 110 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 600 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 100 | 600 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 100 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=100) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 59 consistent gets 0 physical reads 0 redo size 3034 bytes sent via SQL*Net to client 601 bytes received via SQL*Net from client 9 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 110 rows processed SQL> select * from tmp_cf where id<=200; 206 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200 | 1200 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 200 | 1200 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 200 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=200) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 118 consistent gets 0 physical reads 0 redo size 5205 bytes sent via SQL*Net to client 667 bytes received via SQL*Net from client 15 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 206 rows processed SQL> select * from tmp_cf where id<=300; 286 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3869640664 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 300 | 16500 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF | 300 | 16500 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 300 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=300) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 271 consistent gets 0 physical reads 0 redo size 21347 bytes sent via SQL*Net to client 816 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 286 rows processed SQL> select * from tmp_cf where id<=332; 336 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 333 | 1998 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 333 | 1998 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 333 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=332) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 195 consistent gets 0 physical reads 0 redo size 8307 bytes sent via SQL*Net to client 766 bytes received via SQL*Net from client 24 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 336 rows processed SQL> select * from tmp_cf where id<=333; 339 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 334 | 2004 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 334 | 2004 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=333) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 26 consistent gets 0 physical reads 0 redo size 7592 bytes sent via SQL*Net to client 766 bytes received via SQL*Net from client 24 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 339 rows processed
/* 如上实验,我们可以看到,TABLE_CACHED_BLOCKS对聚簇因子的大小是有影响的。而聚簇因子变小后,CBO会更倾向于使用该索引,而不是全表扫描。*/
下面以设置TABLE_CACHED_BLOCKS为3来描述计算过程:
准备一个存储三条记录的数组,每条记录有两列,一列存储块ID,一列存储顺序号。
按顺序遍历索引,并依此顺序为每个索引条目提供一个顺序号。
从当前索引条目中提取块id并在数组中搜索匹配的块ID。
如果你找到匹配的,就用当前的顺序号更新数组的顺序号;
如果没有找到匹配项,那么用当前块ID和顺序号替换数组中顺序号最小的记录,并增加clustering_factor的计数。
/* 创建测试用表 */ /* 为了构造一个行数为10行,但会占用多个数据块的表,所以,我们把其中一个列C1,用长度为2000个字节的字符填充。同时,ID列中的值也是随机的,非顺序的。*/ SQL> create table tmp_cf_4 as select ceil(dbms_random.value(1,10)) id,lpad('x',2000,'x') c1 from dual connect by rownum<=10; Table created. /* 在ID列上创建索引 */ SQL> create index ind_tmp_cf_4_id on tmp_cf_4(id) tablespace demo; Index created. /* 查看此时该索引的聚簇因子值 */ SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_4_ID 10 8 /* 设置测试表TMP_CF_4上的TABLE_CACHED_BLOCKS的参数为3后,收集索引上的统计信息,并查看其聚簇因子的值*/ SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF_4','TABLE_CACHED_BLOCKS',3); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_index_stats('DEMO','IND_TMP_CF_4_ID'); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_4_ID 10 5 /* 此时,我们可以看到其聚簇因子的值由此前的8变小到了5. */ /* 下面,我们来看一下,这个新值是如何计算出来的? */ /* 按ID列在索引中的顺序显示其值,顺序号以及对应行所在的块号 */ SELECT ROWNUM RN ,ID ,BLOCK_NUM FROM (SELECT ID ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK_NUM FROM TMP_CF_4 ORDER BY ID ,ROWID); RN ID BLOCK_NUM ---------- ---------- ---------- 1 2 4859 2 3 4862 3 4 4860 4 5 4861 5 6 4859 6 6 4859 7 6 4861 8 8 4860 9 8 4860 10 10 4861 10 rows selected. /* 打开一个EXCEL表,模拟一个存储3行记录的数组,并计算聚簇因子的值 */ RN ID BLOCK_NUM 块ID顺序号 ---------- ---------- ---------- 1 2 4859 104861 2 3 4862 64859 3 4 4860 94860 4 5 4861 5 6 4859 6 6 4859 7 6 4861 8 8 4860 CF5 9 8 4860 10 10 4861
3.3、设置set_index_stats的聚簇因子值clstfct
exec dbms_stats.set_index_stats
('DEMO', #索引的属主
'IND_TAB_CACHED_1_ID', #索引名
clstfct=>100 #聚簇因子值
);
注:设置后,要把该索引所在的基表统计信息加锁
(dbms_stats.lock_table_stats)。以避免下次收集统计时,被新值覆盖。(dbms_stats包无单独针对索引统计信息加锁的方法)
/* 创建测试数据 */ /* 为了制造出聚簇因子较高的索引,对生成的ID数值取随机数1至1000。*/ SQL> create table tmp_cf_5 as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000; Table created. /* 在ID列上创建索引 */ SQL> create index ind_tmp_cf_5_id on tmp_cf_5(id); Index created. /* 收集统计信息 */ SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf_5'); PL/SQL procedure successfully completed. /* 开启autotrace查看执行计划 */ SQL> select * from tmp_cf_5 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 4131497868 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 7 | 42 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF_5 | 7 | 42 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=6) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed 当前的索引的聚簇因子 SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_5_ID 1000 438 修改当前的聚簇因子值 SQL> exec dbms_stats.set_index_stats('DEMO','IND_TMP_CF_5_ID',clstfct=>10); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_5_ID 1000 10 再次查看执行计划 SQL> select * from tmp_cf_5 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 4231491781 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 42 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_5 | 7 | 42 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_TMP_CF_5_ID | 7 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed
/* 从上面可见,此时已经采用了先索引再回表的访问路径,说明此时其计算出来的成本,是小于全表扫描的。*/
到此,关于“什么是Clustering Factor”的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注亿速云网站,小编会继续努力为大家带来更多实用的文章!
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。