To start/stop all the instances on a server, I created a script named dbss (database start/stop) and it worked well on AIX, RHEL5/6 and SLES11, while I found it could not work on RHEL7/SLES12 with systemd service.
When the server was started, the instances would be started without any issue, but when I rebooted the system, the instances were stopped abnormally. Such issue maybe is not a big problem for general applications, but for database I should fix it.
Last year I tried to fix this issue while failed and I just found it maybe was related with cgroup, but I could not figure out the root cause.
I thought one day I would learn RHEL7 from the beginning and fix it, and one year past I still did not handle it.
Last night my bro told me when he deployed dbss on SLES12, he got one issue that the started listener could not be accessed, and I believed this issues should be similar with the above one, so this time I had to face them directly.
I could create a systemd service with the user oracle and it worked well.
[root@olinux73 ~]# cat /etc/systemd/system/oradb.service [Unit] Description=Oracle Instance After=remote-fs.target Before=shutdown.target reboot.target halt.target [Service] Environment='ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1' Type=forking ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl ExecStop=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbshut orcl User=oracle Group=dba [Install] WantedBy=multi-user.target [root@olinux73 ~]# systemctl status oradb ● oradb.service - Oracle Instance Loaded: loaded (/etc/systemd/system/oradb.service; enabled; vendor preset: disabled) Active: active (running) since Sat 2018-06-09 16:38:38 NZST; 24s ago Process: 7756 ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl (code=exited, status=0/SUCCESS) CGroup: /system.slice/oradb.service ├─7854 ora_pmon_orcl ├─7856 ora_psp0_orcl ├─7858 ora_vktm_orcl ├─7862 ora_gen0_orcl
While if I created a systemd service with dbss script, I got different result:
[root@olinux73 ~]# cat /etc/systemd/system/dbss.service [Unit] Description=Oracle Instance and Listener After=remote-fs.target systemd-logind.service local-fs.target remote-fs.target Before=shutdown.target reboot.target halt.target [Service] Type=forking RemainAfterExit=yes ExecStart=/etc/init.d/dbss start ExecReload="/etc/init.d/dbss stop; /etc/init.d/dbss start" ExecStop=/etc/init.d/dbss stop SendSIGKILL=no TimeoutStopSec=0 KillMode=none [Install] WantedBy=multi-user.target [root@olinux73 ~]# systemctl status dbss ● dbss.service - Oracle Instance and Listener Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled) Active: active (exited) since Sat 2018-06-09 16:42:11 NZST; 33s ago Process: 8336 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS) Main PID: 6319 (code=exited, status=0/SUCCESS)
If I added the pidfile to the config file, I would get below status output:
[root@olinux73 ~]# systemctl status dbss ● dbss.service - Oracle Instance and Listener Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled) Active: active (running) since Sat 2018-06-09 16:29:40 NZST; 1min 10s ago Process: 6154 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS) Main PID: 6319 (ora_pmon_orcl) CGroup: /system.slice/dbss.service ‣ 6319 ora_pmon_orcl
And I found the instance was terminated:
2018-06-09 01:52:35.780000 +12:00 Performing implicit shutdown abort due to dead PMON Shutting down instance (abort) License high water mark = 12 USER (ospid: 21060): terminating the instance Instance terminated by USER, pid = 21060 Instance shutdown complete ORA-1092 : opitsk aborting process 2018-06-09 01:58:19.646000 +12:00 Starting ORACLE instance (normal) (OS id: 993) CLI notifier numLatches:3 maxDescs:519
I tested lots of options of systemd service but most of them failed until I found this webpage:
SAP Instances failed stop on shutdown (PACEMAKER, SYSTEMD, SAP)
The key point was this part:
In a Linux with systemd any application call with su - <somenameofsomeuser> will result in a move into a user slice. This is especially true in case of SAP Instances and Databases handled by the pacemaker cluster service.
I already found all the processes were under the user.slice when I started the systemd dbss service:
user.slice user-1000.slice session-c6.scope 3084 ora_pmon_orcl 3086 ora_psp0_orcl 3088 ora_vktm_orcl 3092 ora_gen0_orcl 3096 ora_mman_orcl 3098 ora_diag_orcl 3100 ora_dbrm_orcl 3102 ora_vkrm_orcl 3104 ora_dia0_orcl 3106 ora_dbw0_orcl 3108 ora_lgwr_orcl
While for systemd oradb service, they were under system.slice:
[root@olinux73 ~]# systemd-cgls 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21 system.slice dbus.service 671 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation firewalld.service 698 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid lvm2-lvmetad.service 516 /usr/sbin/lvmetad -f postfix.service 1785 /usr/libexec/postfix/master -w 1794 pickup -l -t unix -u 1795 qmgr -l -t unix -u crond.service 700 /usr/sbin/crond -n oradb.service 955 ora_pmon_orcl 957 ora_psp0_orcl 959 ora_vktm_orcl 963 ora_gen0_orcl 965 ora_mman_orcl 969 ora_diag_orcl 971 ora_dbrm_orcl 973 ora_vkrm_orcl 975 ora_dia0_orcl 977 ora_dbw0_orcl
To start all the instances, the dbss has to be run as root user, and it will su to other users to start the instances, so the systemd dbss service could not be run with a special user like oracle.
Following the above webpage I modified the file /etc/pam.d/system-auth-ac:
[root@olinux73 ~]# cat /etc/pam.d/system-auth #%PAM-1.0 # This file is auto-generated. # User changes will be destroyed the next time authconfig is run. auth required pam_env.so auth sufficient pam_unix.so nullok try_first_pass auth requisite pam_succeed_if.so uid >= 1000 quiet_success auth required pam_deny.so account required pam_unix.so account sufficient pam_localuser.so account sufficient pam_succeed_if.so uid < 1000 quiet account required pam_permit.so password requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type= password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok password required pam_deny.so session optional pam_keyinit.so revoke session required pam_limits.so session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/orausers --->add this line just before below line. -session optional pam_systemd.so session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid session required pam_unix.so
And created a file /etc/orausers:
[root@olinux73 ~]# cat /etc/orausers oracle
Then I restart the systemd dbss service and all the processes began to run under system.slice:
[root@olinux73 ~]# systemd-cgls ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21 ├─system.slice │ ├─dbus.service │ │ └─675 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation │ ├─firewalld.service │ │ └─695 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid │ ├─lvm2-lvmetad.service │ │ └─515 /usr/sbin/lvmetad -f │ ├─postfix.service │ │ ├─2259 /usr/libexec/postfix/master -w │ │ ├─2267 pickup -l -t unix -u │ │ └─2268 qmgr -l -t unix -u │ ├─crond.service │ │ └─700 /usr/sbin/crond -n │ ├─systemd-journald.service │ │ └─496 /usr/lib/systemd/systemd-journald │ ├─auditd.service │ │ └─644 /sbin/auditd -n │ ├─gssproxy.service │ │ └─673 /usr/sbin/gssproxy -D │ ├─dbss.service │ │ ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit │ │ ├─1021 ora_pmon_orcl │ │ ├─1023 ora_psp0_orcl │ │ ├─1065 ora_vktm_orcl │ │ ├─1069 ora_gen0_orcl │ │ ├─1071 ora_mman_orcl │ │ ├─1075 ora_diag_orcl [root@olinux73 ~]# systemctl status dbss ● dbss.service - Oracle Instance and Listener Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2018-06-10 00:07:38 NZST; 45min ago Process: 693 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS) CGroup: /system.slice/dbss.service ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit ├─1021 ora_pmon_orcl ├─1023 ora_psp0_orcl ├─1065 ora_vktm_orcl ├─1069 ora_gen0_orcl ├─1071 ora_mman_orcl ├─1075 ora_diag_orcl
Then I rebooted the system and checked the alert log of the instance:
2018-06-10 00:06:40.594000 +12:00 Shutting down instance (immediate) Stopping background process SMCO 2018-06-10 00:06:41.600000 +12:00 Shutting down instance: further logons disabled Stopping background process CJQ0 Stopping background process MMNL 2018-06-10 00:06:42.652000 +12:00 Stopping background process MMON License high water mark = 2 2018-06-10 00:06:44.923000 +12:00 Waiting for dispatcher 'D000' to shutdown 2018-06-10 00:06:48.213000 +12:00 All dispatchers and shared servers shutdown ALTER DATABASE CLOSE NORMAL SMON: disabling tx recovery Stopping Emon pool Stopping Emon pool SMON: disabling cache recovery Shutting down archive processes Archiving is disabled Thread 1 closed at log sequence 33 Successful close of redo thread 1 Completed: ALTER DATABASE CLOSE NORMAL
I got big help also from below command:
[root@olinux73 ~]# journalctl -b-1 -u dbss .............................. Jun 10 00:06:57 olinux73.dbcloudsvc.com su[12076]: pam_unix(su:session): session opened for user oracle by (uid=0) Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: (to oracle) root on none Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: pam_unix(su-l:session): session opened for user oracle by (uid=0) Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: (to oracle) root on none Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session opened for user oracle by (uid=0) Jun 10 00:07:00 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session closed for user oracle Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Stopping database orcl..........Done! Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Databae orcl (/u01/app/oracle/product/12.1.0/dbhome_1) has been stopped successfully! Jun 10 00:07:00 olinux73.dbcloudsvc.com systemd[1]: Stopped Oracle Instance and Listener.
I found the issue was caused by pam_systemd when I checked the log at the beginning.
Then I updated the dbss script to do such changes automatically.
Great day!