2010.02.03 Abnormal increase in racgmain check daemons

This is a bug in 10g RAC environments: racgmain check daemons are forked abnormally, memory usage keeps climbing, and the system eventually becomes unusable.
 
 oracle 26024     1  0  Dec  6  ?         0:00 /oracle/crs/bin/racgmain check
 oracle 23218     1  0  Dec  6  ?         0:00 /oracle/crs/bin/racgmain check
 oracle 23179     1  0  Dec  4  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle 27277     1  0  Dec  6  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle  1028     1  0  Dec  5  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle  7991     1  0  Dec  4  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle 15324     1  0  Dec  3  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle 14314     1  0  Dec  4  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle 10895     1  0  Dec  4  ?         0:00 /oracle/ora10/bin/racgmain check
 oracle   404     1  0  Dec  3  ?         0:00 /oracle/ora10/bin/racgmain check

To resolve it, either apply the CRS bundle #2 patch or use the workaround, as described in the note below.
=====================================================================================

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.6
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4

Symptoms

The system slows down and many "racgmain check" processes may appear in ps output. The CRS log shows messages such as the following.

oracle@HA5-ZW05:[/home/oracle] ps -ef|grep "racgmain check"|wc -l
1290

~~~~
CAAMonitorHandler :: 0:Action Script /opt/oracle/product/crs/bin/racgwrap(check) timed out for ora.harac1.vip! (timeout=60)
CheckResource error for ora.harac1.vip error code = -2
CAAMonitorHandler :: 0:Could not join /opt/oracle/product/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0,
other: Abnormal termination of the child
~~~~
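
The `ps -ef|grep ...|wc -l` count above can include the grep process itself and does not show which home the processes come from. A minimal sketch, assuming `ps -ef` prints the command as the last two fields as in the listings in this post, groups the processes by binary path. The function name `count_by_binary` is made up for illustration.

```shell
#!/bin/sh
# Read `ps -ef` style lines on stdin and count "racgmain check" processes
# per binary path. The grep process is skipped because its command field
# is the bare word "racgmain", not a path ending in /racgmain.
count_by_binary() {
    awk '$NF == "check" && $(NF-1) ~ /\/racgmain$/ { n[$(NF-1)]++ }
         END { for (b in n) print n[b], b }' | sort -k2
}

# Prints one "count path" line per binary; prints nothing if none are running.
ps -ef | count_by_binary
```

A count that keeps rising between runs, under either the CRS home or the database home, matches the symptom described here.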

Cause

crsd.bin invokes racgmain to check the status of the resources managed by CRS. racgmain is invoked through the wrapper script racgwrap.

If the resource action times out, crsd kills the action script, which is racgwrap, but the racgmain process it started is not killed. Over time, this can leave a large number of orphaned racgmain processes on the system, which eventually slows the system down because of resource contention at the OS level.

Internal bug 6196746 addresses this issue.
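
The mechanism can be reproduced without Oracle software. The sketch below is only an illustration under stated assumptions: `sleep 300` stands in for racgmain, the two generated scripts are hypothetical stand-ins for the unpatched and patched racgwrap, and `kill -9` on the wrapper imitates crsd killing the action script. It assumes a POSIX `ps` that accepts `-o pid= -o ppid=`.

```shell
#!/bin/sh
# Stand-ins only: `sleep 300` plays racgmain, the wrappers play racgwrap.
tmp=$(mktemp -d)

# Unpatched style: the wrapper stays alive as the parent of the worker.
cat > "$tmp/wrap_fork.sh" <<'EOF'
#!/bin/sh
sleep 300
status=$?
exit $status
EOF

# Patched style: exec replaces the wrapper, so wrapper PID == worker PID.
cat > "$tmp/wrap_exec.sh" <<'EOF'
#!/bin/sh
exec sleep 300
EOF

# Print the PIDs of all direct children of the given PID.
children_of() {
    ps -e -o pid= -o ppid= | awk -v p="$1" '$2 == p { print $1 }'
}

# Case 1: kill the forking wrapper; its worker survives as an orphan.
sh "$tmp/wrap_fork.sh" &
wrapper=$!
sleep 1                                   # let the wrapper start its child
worker=$(children_of "$wrapper")
kill -9 "$wrapper"
wait "$wrapper" 2>/dev/null || true
if kill -0 "$worker" 2>/dev/null; then fork_result=orphaned; else fork_result=gone; fi
kill -9 "$worker" 2>/dev/null || true     # clean up the orphan

# Case 2: kill the exec wrapper; the signal hits the worker itself.
sh "$tmp/wrap_exec.sh" &
wrapper=$!
sleep 1
exec_children=$(children_of "$wrapper")   # empty, because nothing was forked
kill -9 "$wrapper"
wait "$wrapper" 2>/dev/null || true
if kill -0 "$wrapper" 2>/dev/null; then exec_result=alive; else exec_result=gone; fi

rm -rf "$tmp"
echo "fork wrapper: worker $fork_result; exec wrapper: worker $exec_result"
```

In the first case the kill reaches only the shell, so the worker is reparented and keeps running. In the second case there is no separate child to leave behind.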

Solution


  • This is fixed in the 11.1.0.7 patchset. If you hit this issue on 10gR2, apply the 10.2.0.4 patchset and the latest CRS bundle patch. The fix is included in the CRS bundle patch from bundle #2 onwards.
  • The following can be used as a temporary workaround until the patch is applied.

1.  Make a copy of racgwrap located under $ORACLE_HOME/bin and $CRS_HOME/bin on ALL Nodes

2.  Edit the file racgwrap and modify the last 3 lines from:

~~~
$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status

to:

# Line added to fix for Bug 6196746
exec $ORACLE_HOME/bin/racgmain "$@"
~~~

3.  Kill all the orphan racgmain processes running.

$ ps -ef|grep "racgmain check"
oracle 18701 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 14653 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 24517 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check

$ kill -9 <PID of racgmain>
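
When there are hundreds of processes, picking PIDs by hand is error-prone. The following is a minimal sketch, not part of the original note, that selects only "racgmain check" processes whose parent PID is 1, as in the listings above. The function names are made up for illustration, and it assumes a `ps` that supports POSIX `-o` formats (some platforms, such as HP-UX, need the `UNIX95` environment variable set for that). It only prints what it would kill.

```shell
#!/bin/sh
# Filter "PID PPID COMMAND..." lines on stdin and print the PIDs of
# orphaned "racgmain check" processes (parent PID 1).
filter_orphans() {
    awk '$2 == 1 && $0 ~ /racgmain check$/ { print $1 }'
}

# List orphan PIDs on the running system.
list_orphans() {
    ps -e -o pid= -o ppid= -o args= | filter_orphans
}

# Dry run: review the output first, then replace echo with a real kill.
for pid in $(list_orphans); do
    echo "would kill $pid"
done
```

Checks still in progress have the wrapper as their parent rather than PID 1, so this filter leaves them alone.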
Posted by pat98