Power is a wonderful thing.. aka how to NOT restart a virtualized ODA

One of the reasons you put systems into a datacenter is to have redundant power, so our precious servers never lose power.

No power, no server.

No servers, no database!

No databases, unhappy users armed with torches and pitchforks!

All that being said, you can hit an interesting potential problem after restoring power to a virtualized ODA. After the reboot, none of your VMs are running and it appears that all of your shared repositories are missing!

When you log into ODA_BASE and check the repositories, this is what you see:

[root@oda1a bin]# ./oakcli show repo

NAME      TYPE    NODENUM  STATE
odarepo1  local   0        N/A
odarepo2  local   1        N/A

See, no shared repositories!

The fix is simple: restart oak on both nodes, one at a time.

[root@oda1a bin]# ./oakcli restart oak
Restarting the oakd..
Killing the running oakd with pid 13298
Successfully re-started the oakd..

and

[root@oda1b]# ./oakcli restart oak
Restarting the oakd..
Killing the running oakd with pid 13142
Successfully re-started the oakd..

Now, after restarting, check again:

[root@oda1a bin]# ./oakcli show repo

NAME      TYPE    NODENUM  STATE
odarepo1  local   0        N/A
odarepo2  local   1        N/A
repo0     shared  0        UNKNOWN
repo0     shared  1        ONLINE

You will see the shared repositories coming back online. Wait a few more minutes and they should all report ONLINE.

[root@oda1a bin]# ./oakcli show repo

NAME      TYPE    NODENUM  STATE
odarepo1  local   0        N/A
odarepo2  local   1        N/A
repo0     shared  0        ONLINE
repo0     shared  1        ONLINE
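
If you would rather not keep re-running the check by hand, the "check, wait, check again" cycle can be scripted. What follows is only a rough sketch. It assumes the four-column layout shown above (NAME TYPE NODENUM STATE), and the helper names shared_repos_ready and wait_for_shared_repos are made up for this post; they are not part of oakcli.

```shell
# Sketch only: these helper names are hypothetical, not part of oakcli.

# Reads "oakcli show repo" output on stdin.
# Succeeds only if at least one shared repo is listed and every
# shared repo row has STATE = ONLINE.
shared_repos_ready() {
    awk '
        $2 == "shared" { total++; if ($4 == "ONLINE") online++ }
        END { exit !(total > 0 && online == total) }
    '
}

# Runs the given command repeatedly until shared_repos_ready succeeds.
# Usage: wait_for_shared_repos <tries> <sleep_seconds> <command...>
wait_for_shared_repos() {
    wfsr_tries=$1
    wfsr_pause=$2
    shift 2
    wfsr_i=0
    while [ "$wfsr_i" -lt "$wfsr_tries" ]; do
        if "$@" | shared_repos_ready; then
            echo "all shared repositories are ONLINE"
            return 0
        fi
        wfsr_i=$((wfsr_i + 1))
        sleep "$wfsr_pause"
    done
    echo "shared repositories still not ONLINE after $wfsr_tries checks" >&2
    return 1
}
```

Run from the same directory as the commands above, something like "wait_for_shared_repos 20 30 ./oakcli show repo" would check every 30 seconds for up to 10 minutes. The numbers are arbitrary; pick whatever suits your environment.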

The cause is a timing issue. When the ODA is restarted, the ASM instances are restarted too. While ASM is still mounting the ACFS file systems at boot, oak is already checking for the repositories. Since ACFS is not yet mounted, oak finds no shared repositories. A quick fix to a simple problem. Now all you need to deal with are the pitchforks!
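
Since the whole problem comes down to ACFS not being mounted when oak looks for the repositories, it can be reassuring to confirm that ACFS really is mounted before restarting oak. Here is a small sketch of that check. It assumes the Linux /proc/mounts layout, where the third field is the file-system type, and that ACFS mounts show up with type "acfs". The function name count_acfs_mounts is hypothetical.

```shell
# Sketch only: count_acfs_mounts is a made-up helper name.
# Reads mount-table lines in /proc/mounts format on stdin
# (device, mount point, fs type, options, ...) and prints how many
# of them have file-system type "acfs".
count_acfs_mounts() {
    awk '$3 == "acfs" { n++ } END { print n + 0 }'
}

# Example use on a node:
#   count_acfs_mounts < /proc/mounts
```

A count of zero suggests ACFS is not up yet, in which case restarting oak is unlikely to bring the shared repositories back until the mounts appear.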

 

3 Replies to “Power is a wonderful thing.. aka how to NOT restart a virtualized ODA”

  1. The nodes on our ODA X5-2 went down due to an overtemp condition in our datacenter. The air handling issue was resolved and we got the nodes back online with assistance through an Oracle SR, but oakd did not start on node 0 and node 1 became the master, as verified with ‘oakcli show ismaster’.

    At Oracle's direction, we manually ran ‘oakcli restart oak’ on node 0 and it came back up, but in slave mode.

    I have a question in to Oracle via the SR, but will ask here also…will restarting the oak on node 1 force the Master state back on node 0? Or are the steps to flip this more involved?

    Thank you.

    • Generally the master should be the “0” server when available. The main thing to watch for is that you should always run the oakcli commands that manage the ODA on the master. If you want to move it back while “1” is the master, do an oakcli restart oak on the “0” node and then on the “1” node.
