RAC-Related Wait Events and Their Causes
Wait events for Oracle RAC include the following categories:
1. Block-Related Wait Events
2. Message-Related Wait Events
3. Contention-Related Wait Events
4. Load-Related Wait Events
Block-Related Wait Events
The main wait events for block-related waits are:
gc current block 2-way
gc current block 3-way
gc cr block 2-way
gc cr block 3-way
*******************
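The required data was not found in node1's cache, so a cross-node Cache Fusion transfer took place and the data was obtained from node2's cache.
Excessive waits on gc current block 2-way or gc current block 3-way are usually caused by either (a) an inefficient execution plan that visits a large number of blocks, or (b) application affinity not being implemented.
If access to the objects can be localized to one instance, consider implementing application affinity.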
*********************
The block-related wait event statistics indicate that a block was received as the result of either a 2-way or a 3-way message. That is, the block was either sent by the resource master, requiring 1 message and 1 block transfer, or the request was forwarded to a third node, which then sent the block, requiring 2 messages and 1 block transfer.
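The difference between the 2-way and 3-way cases can be sketched as a small Python model. This is an illustration only, not Oracle's implementation: the function name `block_transfer_path` and the node labels are hypothetical, and the model assumes that exactly one remote instance holds the block.

```python
def block_transfer_path(requester: str, master: str, holder: str) -> dict:
    """Model the message flow for a block cached in another instance.

    Illustrative only: node names are arbitrary labels, and the counts
    follow the 2-way / 3-way description in the text above.
    """
    if holder == requester:
        raise ValueError("block is already cached locally; no global cache transfer")

    # Count the distinct nodes that take part in the exchange.
    nodes_involved = {requester, master, holder}

    if len(nodes_involved) == 2:
        # The request goes straight to the node that ships the block.
        return {"kind": "2-way", "messages": 1, "block_transfers": 1}

    # The master forwards the request to a third node, which ships the block.
    return {"kind": "3-way", "messages": 2, "block_transfers": 1}


print(block_transfer_path("node1", master="node2", holder="node2"))
print(block_transfer_path("node1", master="node2", holder="node3"))
```

In both cases only one block transfer occurs; the 3-way path costs one extra message, which is why its average wait is normally somewhat higher.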
If the average wait times are acceptable and no interconnect or load issues can be diagnosed, then the accumulated time waited can usually be attributed to a few SQL statements which need to be tuned to minimize the number of blocks accessed.
The column CLUSTER_WAIT_TIME in V$SQLAREA represents the wait time incurred by individual SQL statements for global cache events and will identify the SQL which may need to be tuned.
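One possible way to act on CLUSTER_WAIT_TIME is to rank statements by it. The Python sketch below assumes the rows have already been fetched from V$SQLAREA with a query like the one in `QUERY`; the helper name `top_cluster_waiters`, the sample rows, and the SQL_ID values are hypothetical.

```python
# Query one might run to collect the inputs (not executed here).
QUERY = """
SELECT sql_id, cluster_wait_time, elapsed_time
FROM   v$sqlarea
WHERE  cluster_wait_time > 0
"""


def top_cluster_waiters(rows, limit=5):
    """Rank statements by cluster wait time, highest first.

    Each row is a dict with 'sql_id', 'cluster_wait_time' and
    'elapsed_time' (both times in the same unit). Returns tuples of
    (sql_id, cluster_wait_time, share_of_elapsed_time).
    """
    ranked = sorted(rows, key=lambda r: r["cluster_wait_time"], reverse=True)
    result = []
    for row in ranked[:limit]:
        elapsed = row["elapsed_time"]
        # Guard against statements with no recorded elapsed time.
        share = row["cluster_wait_time"] / elapsed if elapsed else 0.0
        result.append((row["sql_id"], row["cluster_wait_time"], round(share, 3)))
    return result


# Made-up sample data for illustration.
sample = [
    {"sql_id": "hypothetical_a", "cluster_wait_time": 100, "elapsed_time": 1000},
    {"sql_id": "hypothetical_b", "cluster_wait_time": 900, "elapsed_time": 1000},
]
for entry in top_cluster_waiters(sample):
    print(entry)
```

Statements where the cluster wait share of elapsed time is large are the first candidates for tuning.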
Message-Related Wait Events
The main wait events for message-related waits are:
gc current grant 2-way
gc cr grant 2-way
*********************
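If the requested block is not resident in any instance's buffer cache, the master must be asked for a grant and the data read from physical disk, which incurs physical I/O. The session then waits on gc cr grant 2-way or gc current grant 2-way.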
*********************
The message-related wait event statistics indicate that no block was received because it was not cached in any instance. Instead a global grant was given, enabling the requesting instance to read the block from disk or modify it.
If the time consumed by these events is high, then it may be assumed that the frequently used SQL causes a lot of disk I/O (in the event of the cr grant) or that the workload inserts a lot of data and needs to find and format new blocks frequently (in the event of the current grant).
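The interpretation above can be written down as a small lookup. This is only a sketch of the reasoning in the text: the name `grant_hints` is hypothetical, and the threshold for what counts as "high" is a caller-supplied assumption, because the text does not define one.

```python
# Likely causes taken from the paragraph above.
GRANT_HINTS = {
    "gc cr grant 2-way": "frequently used SQL may be causing a lot of disk I/O",
    "gc current grant 2-way": "workload may be inserting heavily and formatting new blocks",
}


def grant_hints(time_waited, threshold):
    """Return a hint for each grant event whose wait time reaches threshold.

    time_waited maps event name -> accumulated wait time; threshold uses
    the same unit and is an assumption chosen by the caller.
    """
    return {
        event: GRANT_HINTS[event]
        for event, waited in time_waited.items()
        if event in GRANT_HINTS and waited >= threshold
    }


# Made-up numbers for illustration.
observed = {"gc cr grant 2-way": 120.0, "gc current grant 2-way": 3.0}
print(grant_hints(observed, threshold=60.0))
```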
Contention-Related Wait Events
The main wait events for contention-related waits are:
gc current block busy
gc cr block busy
gc buffer busy acquire/release
******************
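These waits are usually caused by concurrent reads and writes: sessions compete for the same resource, and a session must wait until another session has written its changes to the redo log before control is returned to it. If new contention arrives before existing contention has been resolved, an avalanche effect can occur and system performance drops sharply.
Busy events indicate that LMS performed additional work to handle concurrency-related issues.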
******************
The contention-related wait event statistics indicate that a block was received which was pinned by a session on another node, was deferred because a change had not yet been flushed to disk or because of high concurrency, and therefore could not be shipped immediately. A buffer may also be busy locally when a session has already initiated a cache fusion operation and is waiting for its completion when another session on the same node is trying to read or modify the same data. High service times for blocks exchanged in the global cache may exacerbate the contention, which can be caused by frequent concurrent read and write accesses to the same data.
Load-Related Wait Events
The main wait events for load-related waits are:
gc current block congested
gc cr block congested
***************
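If the LMS process does not handle a request within 1 millisecond of receiving it, LMS marks the response to indicate that the block experienced a congestion-related wait.
Congestion-related waits have many possible causes: for example, the LMS process is overwhelmed by a large number of global cache requests, LMS is experiencing CPU scheduling delays, or LMS has run into another form of resource exhaustion (such as memory).
Normally, LMS processes run at real-time CPU scheduling priority, so CPU scheduling delays should be minimal. A large number of waits on these events indicates a sudden spike in global cache requests that the LMS processes cannot handle quickly enough.
A shortage of server memory can also cause LMS processes to be paged out, which degrades global cache performance.
Check why the LMS processes cannot handle requests efficiently.
If the hardware is insufficient, add hardware resources; the most common approach is to add a node. Upgrading the existing hardware is also an option.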
**************
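The 1 ms rule from the note above can be illustrated with a short Python sketch. It is a simplified model, not LMS internals: the names `is_congested` and `congested_fraction` and the timestamp pairs are hypothetical, and the only value taken from the note is the 1 ms threshold.

```python
CONGESTION_THRESHOLD_MS = 1.0  # threshold stated in the note above


def is_congested(received_ms: float, handled_ms: float) -> bool:
    """True if a request waited longer than the threshold before being handled."""
    return (handled_ms - received_ms) > CONGESTION_THRESHOLD_MS


def congested_fraction(requests) -> float:
    """Fraction of (received_ms, handled_ms) pairs that count as congested."""
    requests = list(requests)
    if not requests:
        return 0.0
    congested = sum(1 for received, handled in requests if is_congested(received, handled))
    return congested / len(requests)


# Made-up timestamps in milliseconds.
trace = [(0.0, 0.4), (0.0, 1.5), (2.0, 4.0), (5.0, 5.2)]
print(congested_fraction(trace))
```

A rising congested fraction in such a model corresponds to LMS falling behind its request queue, which matches the causes listed in the note.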
The load-related wait events indicate that a delay in processing has occurred in the GCS. This is usually caused by high load or CPU saturation and would have to be solved by additional CPUs, load balancing, offloading processing to different times, or a new cluster node.
For the events mentioned, the wait time encompasses the entire round trip from the time a session starts to wait after initiating a block request until the block arrives.
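To see which of the four categories dominates, wait times can be grouped by category. The following is a minimal sketch that covers only the events listed in this document; the names `wait_category` and `time_by_category` are hypothetical, and the sample numbers are made up.

```python
# Events and categories exactly as listed in this document.
CATEGORY_BY_EVENT = {
    "gc current block 2-way": "block",
    "gc current block 3-way": "block",
    "gc cr block 2-way": "block",
    "gc cr block 3-way": "block",
    "gc current grant 2-way": "message",
    "gc cr grant 2-way": "message",
    "gc current block busy": "contention",
    "gc cr block busy": "contention",
    "gc buffer busy acquire": "contention",
    "gc buffer busy release": "contention",
    "gc current block congested": "load",
    "gc cr block congested": "load",
}


def wait_category(event: str) -> str:
    """Return the category for an event name, or 'other' if it is not listed."""
    return CATEGORY_BY_EVENT.get(event.strip().lower(), "other")


def time_by_category(time_waited: dict) -> dict:
    """Sum wait time per category from a mapping of event name -> time waited."""
    totals = {}
    for event, waited in time_waited.items():
        category = wait_category(event)
        totals[category] = totals.get(category, 0) + waited
    return totals


# Made-up sample figures.
sample_waits = {
    "gc cr block 2-way": 40,
    "gc current block busy": 25,
    "gc cr block congested": 5,
}
print(time_by_category(sample_waits))
```

Because each wait covers the whole round trip, the per-category totals are directly comparable as elapsed time seen by sessions.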