Oracle 11gR2 Grid Infrastructure Services - Part I


Discussion about the startup sequence and service features of Grid Infrastructure services

This is a three-part series that discusses 11gR2 Grid Infrastructure services in depth.

Assume a Unix-like operating system such as Linux. Once the OS finishes the bootstrap process, the initialization daemon (init) reads the /etc/inittab file.

At the end of the "/etc/inittab" file is a line which looks something like this:
"h1:35:respawn:/etc/init.d/init.ohasd run > /dev/null 2>&1 </dev/null"
The above line instructs the initialization daemon to start, and respawn if it dies, the Oracle High Availability Services Daemon (OHASD).
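To confirm this wiring on a running node, a minimal check such as the following can be used (the binary name ohasd.bin assumes a standard 11gR2 Linux installation):

# verify the respawn entry is present in /etc/inittab
grep init.ohasd /etc/inittab

# verify the Oracle High Availability Services daemon is running
ps -ef | grep -v grep | grep ohasd.bin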

OHASD comes in two flavors:

- OHASD for a cluster (also known as CRS)
- OHASD for non-RAC environments (also known as HAS; HAS stands for High Availability Services)

Oracle Restart is bundled with the Grid Infrastructure/Clusterware software and is the form usually installed in a non-RAC (single-instance) environment.



OHASD starts 3 components (a quick way to see them as processes follows the list):

- CSSDAgent - starts the cssd daemon (ora.cssd) - heartbeat and node membership
- OraAgent - non-privileged activities
- OraRootAgent - privileged activities - ora.cssdmonitor | ora.crsd | ora.ctssd
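These agents are visible as ordinary processes; a minimal check on a running node (process names assume a standard 11gR2 Linux installation) looks like this:

# list the agent processes spawned under ohasd
ps -ef | grep -v grep | egrep 'ohasd|oraagent|orarootagent|cssdagent|cssdmonitor'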





Cluster Time Synchronization Service (CTSS)

- Replaces the traditional operating-system-provided time utilities such as ntpd (which 10g CRS and 11gR1 CRS relied on).
- It synchronizes the time between the clustered nodes.
- ora.ctssd is a key component by default and operates whether or not you use ntpd.
- It operates in OBSERVER mode if the underlying ntpd process is running on the clustered nodes; it operates in ACTIVE mode if ntpd is not configured.
- In ACTIVE mode it takes charge of cluster time synchronization.
- In OBSERVER mode it merely observes the time skew, leaving synchronization to ntpd.
- To verify which component is responsible for cluster time synchronization, run the command below.

Execute the command below to list all the core (init-level) resources; its output shows the ctssd process and whether it is running in OBSERVER or ACTIVE mode.

crsctl stat res -t -init
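Alternatively, crsctl can report the CTSS state directly; the exact message text may vary slightly across patch levels:

# report whether CTSS is running in Active or Observer mode
crsctl check ctss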



cluvfy comp clocksync -n all -verbose

Execute the above command to verify clock synchronization across the cluster nodes. It reports the mode ctssd is running in, whether ntpd is configured with the correct slewing option, and whether the UDP ports required for the nodes to exchange time information are open on all cluster nodes.
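On a Red Hat-style Linux node, the ntpd slewing option can also be verified by hand; a minimal sketch (the file location is distribution-specific):

# ntpd should be started with the slewing flag (-x)
grep '^OPTIONS' /etc/sysconfig/ntpd
ps -ef | grep -v grep | grep ntpd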

Cluster Ready Service (CRS)

crsd starts both privileged and non-privileged cluster application resources; accordingly, it spawns two agents of its own, oraagent and orarootagent.



- Responsible for the START/STOP/FAILOVER of a resource.
- Maintains the OCR and restarts resources when failures occur. This applies to RAC systems; for Oracle Restart/ASM (non-RAC) environments, ohasd performs this role instead.
- Example of fault tolerance: suppose you issue a SHUTDOWN ABORT on the ASM instance manually. The database that depends on this ASM instance terminates as well, because the ASMB process, which acts as a bridge between the two instances, dies, and an Oracle instance cannot survive the loss of a critical background process. When this happens, crsd detects the failure and immediately brings the ASM instance and the database back up without any human intervention (see the sketch after this list).
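A rough way to observe this fault tolerance on a test cluster is sketched below. Do not run this on production; the SID +ASM1 is illustrative, and a configured Grid environment (ORACLE_HOME, PATH) is assumed:

# note the current state of the ASM resource
crsctl stat res ora.asm -t

# abort the ASM instance by hand (test systems only!)
export ORACLE_SID=+ASM1
sqlplus / as sysasm <<EOF
shutdown abort
EOF

# after a short wait, crsd should have brought ora.asm back ONLINE
sleep 60
crsctl stat res ora.asm -t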




Cluster Synchronization Service (CSS)

There are 3 key processes for CSS, as listed below.




CSS-related processes


- cssdmonitor monitors for node hangs (it takes over the oprocd functionality that was introduced in 10g CRS; oprocd itself is obsolete in 11gR2)


Oracle Process Monitor Daemon (OPROCD) - This process (the OraFenceService service on Windows) is locked in memory to monitor the cluster and provide I/O fencing. OPROCD periodically wakes up and checks that the interval since it last awoke is within the expected time. If not, OPROCD resets the processor and restarts the node. An OPROCD failure results in Oracle Clusterware restarting the node, which means the node reboots itself.




OCLSOMON - A process that runs on nodes in Oracle RAC environments. OCLSOMON monitors CSS (the ocssd.bin process) for hangs that result from load or scheduling issues. If OCLSOMON detects either of these, it shuts down the affected node to prevent database corruption. This evicts the node from the cluster; in effect, the healthy nodes have thrown the problematic node out, and as a result the evicted node is restarted (a reboot is triggered).


The OPROCD and OCLSOMON functionality has been taken over by cssdmonitor, and those two processes no longer exist in 11gR2.


I/O fencing is the technique of isolating a malfunctioning node from the cluster's shared storage to protect the integrity of data. Evicting or rebooting the node is what provides the I/O fencing.


The reboot of the node must happen as quickly as possible; choosing not to flush the local disks speeds up the shutdown of the cluster node. It is imperative that no I/O is flushed to the shared disks, otherwise corrupt or stale data could be written to the Clusterware components (OCR or voting disk) or to ASM (database files).


The above 3 processes are FATAL processes, i.e., processes whose abnormal halt or kill will provoke a node reboot.


Other non-crs processes capable of evicting/rebooting the node for providing I/O Fencing are:
* OCFS2 (if used)
* Vendor Clusterware (if used)
* Operating System (Panic)






- ocssd.bin runs as the "oracle" user. Its primary job is internode health monitoring (via the NodeMembership and GroupMembership services).




It is a multi-threaded application, i.e., several jobs or threads run simultaneously, each performing a specific task. The ocssd.log reveals the thread names, such as clssnm%, clscsc, clssgm%, etc.
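As a rough illustration, the thread-name prefixes can be enumerated straight from the log; the path below assumes a standard 11gR2 layout under the Grid home ($GRID_HOME is a placeholder for your installation):

# count the CSS thread-name prefixes appearing in ocssd.log
grep -oE 'clss[a-zA-Z]+' $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log | sort | uniq -c | sort -rn | head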




These threads perform heartbeat (network and disk) monitoring and send/receive cluster messages.
Evictions occur when CSS detects a heartbeat problem that must be dealt with, for example lost network communication with another node(s) or lost disk heartbeat information from another node(s). CSS-initiated evictions (via poison packet or kill block) should always result in a node reboot.


The CSSD process may itself reboot the node if checks such as disktimeout or missed checkins fail; in that case it reboots the node, logging "Rebooting for Cluster Integrity".


When CPU or memory starvation exists on a node, the CSS processes such as ocssd.bin may be affected. This can trigger the OCLSOMON functionality of cssdmonitor to evict the node from the cluster, or, if the OPROCD functionality of cssdmonitor kicks in, the node restarts (reboots) itself.


Always monitor "/var/log/messages" to see what caused the node to be evicted from the cluster.
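A minimal first-pass scan of the system log around an eviction (message wording varies by distribution) could be:

# look for reset/reboot clues left in the system log
grep -iE 'sysrq|restart|panic' /var/log/messages | tail -20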


Example of a Node getting evicted by some other node in the cluster.


Feb 18 17:20:42 db01 kernel: SysRq : Resetting
Feb 18 17:20:44 db01 kernel: printk: 6 messages suppressed.
Feb 18 17:20:44 db01 kernel: type=1701 audit(1392744044.855:28194): auid=4294967295 uid=1000 gid=1001 ses=4294967295 pid=8368 comm="ocssd.bin" sig=6
Feb 18 17:24:26 db01 syslogd 1.4.1: restart.
Feb 18 17:24:26 db01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 18 17:24:26 db01 kernel: Linux version 2.6.18-238.12.2.0.2.el5 (xyz@ca-build9.us.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))


Whenever you notice "kernel: SysRq : Resetting", it means this node was evicted by some other node in the cluster. Each node reads its kill block once per second; when the kill block has been overwritten, the node commits suicide.
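On a surviving node, the eviction decision is normally recorded in ocssd.log; a rough way to pull it out (same path assumptions as earlier) is:

# search the surviving node's ocssd.log for eviction/missed-checkin messages
egrep -i 'evict|missed checkin|removal' $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log | tail -20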


Example of a node getting evicted by a self-restart (self-removal) due to CPU/memory pressure (possibly cssdmonitor, via its OPROCD functionality, has done this).


Feb 12 07:32:42 db02 kernel: Total swap = 25165816kB
Feb 12 07:32:42 db02 kernel: Free swap:        97972kB
Feb 12 07:32:42 db02 kernel: 25690112 pages of RAM
Feb 12 07:32:42 db02 kernel: 978057 reserved pages
Feb 12 07:32:42 db02 kernel: 175190262 pages shared
Feb 12 07:32:42 db02 kernel: 45130 pages swap cached
Feb 12 07:35:49 db02 xinetd[7315]: START: omni pid=8176 from=::ffff:10.77.9.254
Feb 12 07:57:57 db02 syslogd 1.4.1: restart.
Feb 12 07:57:57 db02 kernel: klogd 1.4.1, log source = /proc/kmsg started


In the above case, kernel SysRq is not reported, which points to a suicide, so we need not start looking at the cluster side of events. There can be many possible reasons for a suicide, but the most common are high load on the node or memory pressure, which cause scheduling problems or CPU starvation. To investigate, we need the help of "OS Watcher" and similar utilities, especially the "top" command output during the system-restart interval.
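If OS Watcher is not deployed, a crude stand-in is to log top in batch mode so the data is available after the restart (the interval and iteration count here are arbitrary):

# capture a top snapshot every 30 seconds, 120 iterations (about one hour)
top -b -d 30 -n 120 >> /var/tmp/top_$(hostname -s).log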


Example of a Node getting evicted due to Linux Kernel Bug


Linux kernel bugs can cause a system panic, after which the system reboots itself. In such cases the approach is to identify the bug details, schedule downtime, and upgrade or patch the OS accordingly.


Example of a Node getting evicted or rebooted due to Oracle CSSD failure


Eviction messages appear clearly in the ocssd.log file. An eviction may happen due to a network heartbeat timeout or a disk heartbeat timeout; the two relevant CSS timeouts can be queried as shown below.




crsctl get css misscount     # indicates the network heartbeat timeout

crsctl get css disktimeout   # indicates the disk heartbeat timeout (voting disk)
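To review the key CSS timeouts in one go, a minimal sketch follows (it assumes crsctl is in the PATH; on a default 11gR2 Linux cluster misscount is typically 30 seconds and disktimeout 200 seconds, but verify on your own system):

#!/bin/sh
# print the CSS heartbeat-related timeouts for this cluster
for param in misscount disktimeout reboottime; do
    crsctl get css $param
done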














Thanks & Regards
Sai Marlakunta
