Today a customer reported a problem with the data sync of their Oracle Data Guard setup. Our engineers checked the alert log and sent me the following errors:
- ORA-01034: ORACLE not available
- ORA-27102: out of memory
- Linux-x86_64 Error: 12: Cannot allocate memory
There were also some ORA-00313 errors about failing to open data/redo files.
However, the engineer also confirmed that the instance was open and running.
I asked them to keep checking the memory usage, the detailed trace logs, and the system message log.
The system has 200G of physical memory and 16G of swap, yet at that time only 3M of swap was free! So we really did have a memory problem.
Our engineers found the processes using the most swap and killed some of them, but that did not fix it, as only a very small part of the swap was freed.
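For reference, one way to find the processes using the most swap is to read the VmSwap field from /proc/<pid>/status. The sketch below is my own illustration, not the exact method our engineers used, and the function names are hypothetical. Note that, according to the proc(5) man page, VmSwap does not include swapped-out shared memory, so a swapped-out SGA would not show up against any single process, which may be part of why killing processes freed so little.

```python
import glob
import re


def parse_vmswap_kb(status_text):
    """Return (process name, VmSwap in kB) from the text of /proc/<pid>/status.

    Kernel threads have no VmSwap line; they are reported as 0 kB.
    """
    name = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    swap = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return (name.group(1) if name else "?", int(swap.group(1)) if swap else 0)


def top_swap_users(limit=10):
    """Return up to `limit` (swap_kb, pid, name) tuples, largest swap first."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                name, swap_kb = parse_vmswap_kb(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        pid = int(path.split("/")[2])
        rows.append((swap_kb, pid, name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for swap_kb, pid, name in top_swap_users():
        print(f"{swap_kb:>12} kB  {pid:>7}  {name}")
```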
I got more detailed memory info through /proc/meminfo:
- [root@erpdg ~]# cat /proc/meminfo
- MemTotal: 206352512 kB
- MemFree: 1260976 kB
- MemAvailable: 520844 kB
- Buffers: 3160 kB
- Cached: 41799796 kB
- SwapCached: 106432 kB
- Active: 40449436 kB
- Inactive: 3372480 kB
- ......
- SwapTotal: 16777212 kB
- SwapFree: 108580 kB
- .....
- HugePages_Total: 72000
- HugePages_Free: 71984
- HugePages_Rsvd: 113
- HugePages_Surp: 0
- Hugepagesize: 2048 kB
- DirectMap4k: 323520 kB
- DirectMap2M: 24842240 kB
- DirectMap1G: 186646528 kB
If you read the above output carefully, you would already know the reason.
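To make the hint concrete, here is a small Python sketch (my own illustration, with hypothetical function names) that takes the relevant lines of the output above and works out how much memory is reserved for HugePages, how much of that pool is still free, and how much memory is left for everything else. With these numbers the pool is roughly 140 GiB and almost entirely free, which leaves only about 56 GiB of normal memory.

```python
import re

KB_PER_GIB = 1024 * 1024

# Relevant lines copied from the first /proc/meminfo output above.
SAMPLE = """\
MemTotal: 206352512 kB
MemFree: 1260976 kB
SwapTotal: 16777212 kB
SwapFree: 108580 kB
HugePages_Total: 72000
HugePages_Free: 71984
HugePages_Rsvd: 113
Hugepagesize: 2048 kB
"""


def parse_meminfo(text):
    """Return {field: integer value} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\S+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def hugepage_summary(info):
    """Size of the HugePages pool, its free part, and the remaining normal memory."""
    page_kb = info["Hugepagesize"]
    pool_kb = info["HugePages_Total"] * page_kb
    free_kb = info["HugePages_Free"] * page_kb
    return {
        "pool_gib": pool_kb / KB_PER_GIB,
        "free_gib": free_kb / KB_PER_GIB,
        # MemTotal includes the HugePages pool, so subtract it to get
        # what is left for normal (4 kB) pages.
        "regular_gib": (info["MemTotal"] - pool_kb) / KB_PER_GIB,
    }


summary = hugepage_summary(parse_meminfo(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value:.2f} GiB")
```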
Then I asked our engineers to take the following actions:
- 1. Check the SGA and memory target settings
- 2. Stop the database instance
- 3. sync; echo 3 > /proc/sys/vm/drop_caches
- 4. Start the database instance again
- 5. cat /proc/meminfo again
The SGA was about 120G, and the new output of /proc/meminfo was as follows:
- [root@erpdg ~]# cat /proc/meminfo
- MemTotal: 206352512 kB
- MemFree: 44060916 kB
- MemAvailable: 43266384 kB
- Buffers: 4192 kB
- Cached: 1122640 kB
- SwapCached: 39268 kB
- ......
- SwapTotal: 16777212 kB
- SwapFree: 15826640 kB
- ......
- HugePages_Total: 72000
- HugePages_Free: 10672
- HugePages_Rsvd: 113
- HugePages_Surp: 0
- Hugepagesize: 2048 kB
- DirectMap4k: 323520 kB
- DirectMap2M: 24842240 kB
- DirectMap1G: 186646528 kB
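So this issue was caused by the HugePages memory.
For some reason, the last time the database instance was started it did not use the HugePages memory: about 140G of HugePages memory was still free. That 140G stayed reserved and unused, leaving less than 60G of normal memory for the roughly 120G SGA and everything else, which explains why the swap ran out. After the above actions, about 20G of HugePages memory was free. That is correct, as the SGA size was about 120G.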
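The before/after numbers can be checked with a little arithmetic. This is just a sketch using the HugePages_Total, HugePages_Free and Hugepagesize values from the two /proc/meminfo outputs; the variable names are mine.

```python
HUGEPAGE_KB = 2048            # Hugepagesize from /proc/meminfo
KB_PER_GIB = 1024 * 1024


def pages_to_gib(pages, page_kb=HUGEPAGE_KB):
    """Convert a number of huge pages to GiB."""
    return pages * page_kb / KB_PER_GIB


total_pages = 72000           # HugePages_Total (same in both outputs)
free_before = 71984           # HugePages_Free before the restart
free_after = 10672            # HugePages_Free after the restart

used_before = total_pages - free_before   # pages in use before the restart
used_after = total_pages - free_after     # pages in use after the restart

print(f"used before: {used_before} pages, {pages_to_gib(used_before):.2f} GiB")
print(f"used after:  {used_after} pages, {pages_to_gib(used_after):.2f} GiB")
print(f"free after:  {free_after} pages, {pages_to_gib(free_after):.2f} GiB")
```

Before the restart only 16 huge pages were in use. After it, the used part of the pool comes to just under 120 GiB, which lines up with the SGA size.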