Today a customer reported a data sync problem with their Oracle Data Guard setup. Our team checked the alert log and sent me the following errors:
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
There were also some ORA-00313 errors about failures to open data/redo files.
However, he also confirmed that the instance was open and running.
I asked them to keep checking the memory usage, the detailed trace logs and the system message log.
The system has 200G of physical memory and 16G of swap, but at that time only 3M of swap was free, so we really did have a memory problem.
Our team found the processes using the most swap and killed some of them, but that did not fix the problem because only a very small part of the swap was freed.
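One way to find the processes holding the most swap is to read the VmSwap field from each /proc/<pid>/status file. The following is only a minimal sketch in Python, assuming a Linux kernel that exposes VmSwap there; it is not necessarily how our team did the check, and the function names (vmswap_kb, process_name, top_swap_users) are made up for illustration.

```python
import os


def vmswap_kb(status_text):
    """Return the VmSwap value (kB) from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0  # e.g. kernel threads have no VmSwap line


def process_name(status_text):
    """Return the value of the Name: field, or '?' if it is missing."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return "?"


def top_swap_users(proc_root="/proc", limit=10):
    """Return up to `limit` (swap_kb, pid, name) tuples, largest swap first."""
    if not os.path.isdir(proc_root):
        return []
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        usage.append((vmswap_kb(text), int(entry), process_name(text)))
    return sorted(usage, reverse=True)[:limit]


if __name__ == "__main__":
    for kb, pid, name in top_swap_users():
        print(f"{kb:>10} kB  pid={pid:<7} {name}")
```

Run as root to see every process. As in our case, killing the top entries only helps if the swap is really held by processes that can be stopped.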
I got more detailed memory info through /proc/meminfo:
[root@erpdg ~]# cat /proc/meminfo
MemTotal:       206352512 kB
MemFree:          1260976 kB
MemAvailable:      520844 kB
Buffers:             3160 kB
Cached:          41799796 kB
SwapCached:        106432 kB
Active:          40449436 kB
Inactive:         3372480 kB
......
SwapTotal:       16777212 kB
SwapFree:          108580 kB
.....
HugePages_Total:    72000
HugePages_Free:     71984
HugePages_Rsvd:       113
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:       323520 kB
DirectMap2M:     24842240 kB
DirectMap1G:    186646528 kB
If you read the above output carefully, you may already see the cause.
I then asked our team to do the following:
1. Check the SGA and memory target settings
2. Stop the database instance
3. sync; echo 3 > /proc/sys/vm/drop_caches
4. Start the database instance again
5. cat /proc/meminfo again
The SGA was about 120G, and the new content of /proc/meminfo was as below:
[root@erpdg ~]# cat /proc/meminfo
MemTotal:       206352512 kB
MemFree:         44060916 kB
MemAvailable:    43266384 kB
Buffers:             4192 kB
Cached:           1122640 kB
SwapCached:         39268 kB
......
SwapTotal:       16777212 kB
SwapFree:        15826640 kB
......
HugePages_Total:    72000
HugePages_Free:     10672
HugePages_Rsvd:       113
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:       323520 kB
DirectMap2M:     24842240 kB
DirectMap1G:    186646528 kB
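So this issue was caused by HugePage memory.
For some reason, the last time the database instance was started it did not use the HugePage memory: about 140G of HugePage memory was still free (HugePages_Free: 71984 pages of 2048 kB each) while the instance was running. Memory reserved for HugePages cannot be used for ordinary allocations, so the SGA of about 120G had to fit into what was left of the 200G plus swap, and the swap filled up. After the above actions we had about 20G of free HugePage memory. That was correct, as the SGA size was about 120G.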
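The 140G and 20G figures can be checked with a little arithmetic on the HugePages_* fields. The following is a minimal sketch in Python and was not part of the original troubleshooting; the names parse_meminfo and hugepage_summary are made up for illustration, and the sample values are copied from the two /proc/meminfo outputs above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value [kB]' lines into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def hugepage_summary(fields):
    """Return (total, free, used) HugePage memory in GiB."""
    page_kb = fields["Hugepagesize"]

    def to_gib(pages):
        return pages * page_kb / (1024 * 1024)

    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    return to_gib(total), to_gib(free), to_gib(total - free)


# Values copied from the outputs taken before and after the restart.
BEFORE = """HugePages_Total: 72000
HugePages_Free: 71984
Hugepagesize: 2048 kB"""

AFTER = """HugePages_Total: 72000
HugePages_Free: 10672
Hugepagesize: 2048 kB"""

for label, sample in (("before", BEFORE), ("after", AFTER)):
    total, free, used = hugepage_summary(parse_meminfo(sample))
    print(f"{label}: total={total:.1f}G free={free:.1f}G used={used:.1f}G")
```

Before the restart almost none of the roughly 140G pool was in use. After it, about 120G of the pool was in use, which matches the SGA size.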