Thursday, February 12, 2015

Node Manager : High Availability & Crash Recovery

If you want a highly available WLS environment, it is very good practice to use the Node Manager (NM).  I have run some tests and share my findings here.

Note:
This blog post focuses only on the recovery aspects of the Node Manager and does not cover the basics.

On a Linux machine, I created a domain with the following components:

  • AdminServer : started through the "startWebLogic.sh" script
  • One single machine + NM configuration : NM started through the "startNodeManager.sh" script
  • One managed server : started through the NM in the console

Look up the process ID of the managed server on the command line and run the following command (a plain kill sends SIGTERM):
kill <<pid_managed_server>>
Result:
The managed server is not restarted automatically.
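
The PID lookup used in this test can be sketched as follows. This is only a minimal sketch: it assumes the managed server JVM was started with the usual -Dweblogic.Name=<server name> argument, and the helper name find_wls_pid and the server name ManagedServer1 are hypothetical.

```shell
# Print the PID of the WebLogic server whose name is given as $1.
# Reads "pid command-line" lines on stdin, e.g. from: ps -eo pid=,args=
find_wls_pid() {
  awk -v want="-Dweblogic.Name=$1" '
    { for (i = 2; i <= NF; i++) if ($i == want) { print $1; next } }
  '
}

# Usage (ManagedServer1 is a placeholder name):
#   ps -eo pid=,args= | find_wls_pid ManagedServer1
```

The exact argument is matched, not a substring, so a server named ManagedServer10 would not be picked up by mistake.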

I restarted the server through the WLS console and then ran this command (kill -9 sends SIGKILL, which a process cannot catch):
kill -9 <<pid_managed_server>>
Result:
The managed server is restarted automatically. OK!
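
The difference between the two kill tests probably comes down to how the two signals are delivered. The sketch below is not WebLogic code; it only illustrates, with a plain shell process, that SIGTERM can be caught for a clean shutdown while SIGKILL cannot. The function name run_and_signal and the marker file are hypothetical, and the link to the NM behaviour (a clean exit is not treated as a crash) is my assumption, not something verified in these tests.

```shell
# Start a background process with a TERM handler, send it the given
# signal ($1 = TERM or KILL) and report how it ended.
# $2 = marker file written only by the TERM handler.
run_and_signal() {
  sig="$1"; marker="$2"
  rm -f "$marker"
  (
    trap 'echo "clean shutdown" > "$marker"; exit 0' TERM
    while :; do sleep 1; done
  ) &
  child=$!
  sleep 1                      # give the child time to install its handler
  kill "-$sig" "$child"
  wait "$child"
  echo "exit status: $?"
}

# Usage:
#   run_and_signal TERM /tmp/marker   # handler runs, marker file is written
#   run_and_signal KILL /tmp/marker   # no handler, no marker file
```

With TERM the process exits through its own handler; with KILL it is terminated by the kernel and the shell reports a signal exit status (128 + 9).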

I did a cold restart of the server machine and, after the reboot, started the NM again.
Result:
The managed server is not restarted automatically.  I restarted the server manually through the WLS console.

To solve this (i.e. to enable crash recovery), change the parameter "CrashRecoveryEnabled" from "false" to "true" in the file "$WL_HOME/common/nodemanager/nodemanager.properties", and then restart the NM.
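
The property change can also be scripted. This is a minimal sketch, assuming the file uses plain key=value lines; the function name enable_crash_recovery is hypothetical, and a backup copy is kept as <file>.bak.

```shell
# Set CrashRecoveryEnabled=true in the properties file given as $1.
enable_crash_recovery() {
  props="$1"
  cp "$props" "$props.bak" || return 1      # keep a backup of the original
  if grep -q '^CrashRecoveryEnabled=' "$props.bak"; then
    # Property present: rewrite its value.
    sed 's/^CrashRecoveryEnabled=.*/CrashRecoveryEnabled=true/' "$props.bak" > "$props"
  else
    # Property missing: append it.
    echo 'CrashRecoveryEnabled=true' >> "$props"
  fi
}

# Usage:
#   enable_crash_recovery "$WL_HOME/common/nodemanager/nodemanager.properties"
```

The NM still has to be restarted afterwards for the change to take effect.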

Then I did another cold restart of the server machine and started the NM again.
Result:
The managed server is restarted (= recovered) automatically. OK!



Conclusion:
It is good practice to start the NM automatically when the server machine boots.  With crash recovery enabled, the NM then brings the servers under its control back to the state they were in before the crash.  That means: if a server process was down when the machine crashed, it will not be started; if it was up and running, the NM will restart it.
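
Starting the NM at boot can be done with a small wrapper that the boot mechanism of your system (init script, cron, ...) calls. This is a minimal sketch only; the function name start_node_manager and the log file location are hypothetical, and the path to startNodeManager.sh depends on your installation.

```shell
# Start the Node Manager script ($1) in the background, detached from
# the terminal, appending its output to the log file ($2).
# Prints the PID of the background process.
start_node_manager() {
  nm_script="$1"; log_file="$2"
  if [ ! -x "$nm_script" ]; then
    echo "not executable: $nm_script" >&2
    return 1
  fi
  nohup "$nm_script" >> "$log_file" 2>&1 &
  echo $!
}

# Usage (both paths are placeholders):
#   start_node_manager /path/to/startNodeManager.sh /var/tmp/nodemanager.out
```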
