Systemd service was terminated abnormally during reboot

NeilZhang

To start and stop all the instances on a server, I wrote a script named dbss (database start/stop). It worked well on AIX, RHEL5/6 and SLES11, but I found that it did not work properly on RHEL7/SLES12, which use systemd.

When the server started, the instances started without any issue, but when I rebooted the system, the instances were stopped abnormally. For general applications this may not be a big problem, but for a database I had to fix it.

Last year I tried to fix this issue and failed. I only found that it might be related to cgroups, but I could not figure out the root cause.

I thought that one day I would learn RHEL7 from the beginning and fix it, but a year passed and I still had not dealt with it.

Last night my bro told me that when he deployed dbss on SLES12, he hit an issue where the started listener could not be accessed. I believed this issue was similar to the one above, so this time I had to face them directly.

I could create a systemd service that ran as the user oracle, and it worked well:

    [root@olinux73 ~]# cat /etc/systemd/system/oradb.service
    [Unit]
    Description=Oracle Instance
    After=remote-fs.target
    Before=shutdown.target reboot.target halt.target

    [Service]
    Environment='ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1'
    Type=forking
    ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl
    ExecStop=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbshut orcl
    User=oracle
    Group=dba

    [Install]
    WantedBy=multi-user.target
    [root@olinux73 ~]# systemctl status oradb
    oradb.service - Oracle Instance
    Loaded: loaded (/etc/systemd/system/oradb.service; enabled; vendor preset: disabled)
    Active: active (running) since Sat 2018-06-09 16:38:38 NZST; 24s ago
    Process: 7756 ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/oradb.service
    ├─7854 ora_pmon_orcl
    ├─7856 ora_psp0_orcl
    ├─7858 ora_vktm_orcl
    ├─7862 ora_gen0_orcl

However, when I created a systemd service that used the dbss script, I got a different result:

    [root@olinux73 ~]# cat /etc/systemd/system/dbss.service
    [Unit]
    Description=Oracle Instance and Listener
    After=remote-fs.target systemd-logind.service local-fs.target remote-fs.target
    Before=shutdown.target reboot.target halt.target

    [Service]
    Type=forking
    RemainAfterExit=yes
    ExecStart=/etc/init.d/dbss start
    ExecReload="/etc/init.d/dbss stop; /etc/init.d/dbss start"
    ExecStop=/etc/init.d/dbss stop
    SendSIGKILL=no
    TimeoutStopSec=0
    KillMode=none

    [Install]
    WantedBy=multi-user.target
    [root@olinux73 ~]# systemctl status dbss
    dbss.service - Oracle Instance and Listener
    Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
    Active: active (exited) since Sat 2018-06-09 16:42:11 NZST; 33s ago
    Process: 8336 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
    Main PID: 6319 (code=exited, status=0/SUCCESS)

If I added a PID file (PIDFile=) to the unit file, I got the status output below:

    [root@olinux73 ~]# systemctl status dbss
    dbss.service - Oracle Instance and Listener
    Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
    Active: active (running) since Sat 2018-06-09 16:29:40 NZST; 1min 10s ago
    Process: 6154 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
    Main PID: 6319 (ora_pmon_orcl)
    CGroup: /system.slice/dbss.service
    6319 ora_pmon_orcl

And in the alert log I found that the instance had been terminated abnormally:

    2018-06-09 01:52:35.780000 +12:00
    Performing implicit shutdown abort due to dead PMON
    Shutting down instance (abort)
    License high water mark = 12
    USER (ospid: 21060): terminating the instance
    Instance terminated by USER, pid = 21060
    Instance shutdown complete
    ORA-1092 : opitsk aborting process
    2018-06-09 01:58:19.646000 +12:00
    Starting ORACLE instance (normal) (OS id: 993)
    CLI notifier numLatches:3 maxDescs:519

I tested many systemd service options, but most of them failed, until I found this webpage:

SAP Instances failed stop on shutdown (PACEMAKER, SYSTEMD, SAP)

The key point was this part:

    In a Linux with systemd any application call with
    su - <somenameofsomeuser>
    will result in a move into a user slice. This is especially true in case of SAP Instances and Databases handled by the pacemaker cluster service.

I had already found that all the processes were under user.slice when I started the dbss systemd service:

    user.slice
    user-1000.slice
    session-c6.scope
    3084 ora_pmon_orcl
    3086 ora_psp0_orcl
    3088 ora_vktm_orcl
    3092 ora_gen0_orcl
    3096 ora_mman_orcl
    3098 ora_diag_orcl
    3100 ora_dbrm_orcl
    3102 ora_vkrm_orcl
    3104 ora_dia0_orcl
    3106 ora_dbw0_orcl
    3108 ora_lgwr_orcl

For the oradb systemd service, however, they were under system.slice:

    [root@olinux73 ~]# systemd-cgls
    1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    system.slice
    dbus.service
    671 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    firewalld.service
    698 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
    lvm2-lvmetad.service
    516 /usr/sbin/lvmetad -f
    postfix.service
    1785 /usr/libexec/postfix/master -w
    1794 pickup -l -t unix -u
    1795 qmgr -l -t unix -u
    crond.service
    700 /usr/sbin/crond -n
    oradb.service
    955 ora_pmon_orcl
    957 ora_psp0_orcl
    959 ora_vktm_orcl
    963 ora_gen0_orcl
    965 ora_mman_orcl
    969 ora_diag_orcl
    971 ora_dbrm_orcl
    973 ora_vkrm_orcl
    975 ora_dia0_orcl
    977 ora_dbw0_orcl
To start all the instances, dbss has to run as the root user and then su to other users to start their instances, so the dbss systemd service could not run as a single dedicated user such as oracle.

Following the above webpage, I modified the file /etc/pam.d/system-auth-ac (on RHEL7, /etc/pam.d/system-auth is normally a symlink to it):

    [root@olinux73 ~]# cat /etc/pam.d/system-auth
    #%PAM-1.0
    # This file is auto-generated.
    # User changes will be destroyed the next time authconfig is run.
    auth required pam_env.so
    auth sufficient pam_unix.so nullok try_first_pass
    auth requisite pam_succeed_if.so uid >= 1000 quiet_success
    auth required pam_deny.so

    account required pam_unix.so
    account sufficient pam_localuser.so
    account sufficient pam_succeed_if.so uid < 1000 quiet
    account required pam_permit.so

    password requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
    password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok
    password required pam_deny.so

    session optional pam_keyinit.so revoke
    session required pam_limits.so
    # Added: the pam_listfile.so line below, placed just before the pam_systemd.so line.
    session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/orausers
    -session optional pam_systemd.so
    session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
    session required pam_unix.so

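I also created the file /etc/orausers. With success=1, a pam_listfile match on a user listed in this file skips the next session module, pam_systemd, so an su to that user is no longer moved into user.slice: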

    [root@olinux73 ~]# cat /etc/orausers
    oracle

Then I restarted the dbss systemd service, and all the processes began to run under system.slice:

    [root@olinux73 ~]# systemd-cgls
    ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    ├─system.slice
    ├─dbus.service
    └─675 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    ├─firewalld.service
    └─695 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
    ├─lvm2-lvmetad.service
    └─515 /usr/sbin/lvmetad -f
    ├─postfix.service
    ├─2259 /usr/libexec/postfix/master -w
    ├─2267 pickup -l -t unix -u
    └─2268 qmgr -l -t unix -u
    ├─crond.service
    └─700 /usr/sbin/crond -n
    ├─systemd-journald.service
    └─496 /usr/lib/systemd/systemd-journald
    ├─auditd.service
    └─644 /sbin/auditd -n
    ├─gssproxy.service
    └─673 /usr/sbin/gssproxy -D
    ├─dbss.service
    ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
    ├─1021 ora_pmon_orcl
    ├─1023 ora_psp0_orcl
    ├─1065 ora_vktm_orcl
    ├─1069 ora_gen0_orcl
    ├─1071 ora_mman_orcl
    ├─1075 ora_diag_orcl
    [root@olinux73 ~]# systemctl status dbss
    dbss.service - Oracle Instance and Listener
    Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
    Active: active (running) since Sun 2018-06-10 00:07:38 NZST; 45min ago
    Process: 693 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/dbss.service
    ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
    ├─1021 ora_pmon_orcl
    ├─1023 ora_psp0_orcl
    ├─1065 ora_vktm_orcl
    ├─1069 ora_gen0_orcl
    ├─1071 ora_mman_orcl
    ├─1075 ora_diag_orcl

Then I rebooted the system and checked the alert log of the instance, which this time showed an orderly shutdown (immediate) instead of an abort:

    2018-06-10 00:06:40.594000 +12:00
    Shutting down instance (immediate)
    Stopping background process SMCO
    2018-06-10 00:06:41.600000 +12:00
    Shutting down instance: further logons disabled
    Stopping background process CJQ0
    Stopping background process MMNL
    2018-06-10 00:06:42.652000 +12:00
    Stopping background process MMON
    License high water mark = 2
    2018-06-10 00:06:44.923000 +12:00
    Waiting for dispatcher 'D000' to shutdown
    2018-06-10 00:06:48.213000 +12:00
    All dispatchers and shared servers shutdown
    ALTER DATABASE CLOSE NORMAL
    SMON: disabling tx recovery
    Stopping Emon pool
    Stopping Emon pool
    SMON: disabling cache recovery
    Shutting down archive processes
    Archiving is disabled
    Thread 1 closed at log sequence 33
    Successful close of redo thread 1
    Completed: ALTER DATABASE CLOSE NORMAL

The command below was also a big help:

    [root@olinux73 ~]# journalctl -b-1 -u dbss
    ..............................
    Jun 10 00:06:57 olinux73.dbcloudsvc.com su[12076]: pam_unix(su:session): session opened for user oracle by (uid=0)
    Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: (to oracle) root on none
    Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: pam_unix(su-l:session): session opened for user oracle by (uid=0)
    Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: (to oracle) root on none
    Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session opened for user oracle by (uid=0)
    Jun 10 00:07:00 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session closed for user oracle
    Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Stopping database orcl..........Done!
    Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Databae orcl (/u01/app/oracle/product/12.1.0/dbhome_1) has been stopped successfully!
    Jun 10 00:07:00 olinux73.dbcloudsvc.com systemd[1]: Stopped Oracle Instance and Listener.

Checking this log at the beginning is how I found that the issue was caused by pam_systemd.

Then I updated the dbss script to make these changes automatically.
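
The post does not show how dbss makes the change, so the following is only a sketch of one way a script could insert the pam_listfile rule in front of the pam_systemd.so line. add_pam_skip_rule is a hypothetical function name, and the insert-with-awk approach is an assumption rather than the actual dbss code.

```shell
#!/bin/sh
# Hypothetical helper (not the actual dbss code): insert the pam_listfile
# rule directly before the first pam_systemd.so session line of a PAM file.
# Usage: add_pam_skip_rule <pam_file> <users_file>
add_pam_skip_rule() {
    pam_file=$1
    users_file=$2
    rule="session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=$users_file"

    # Do nothing if a pam_listfile rule for this users file already exists,
    # so the function can be run repeatedly.
    if grep -q "pam_listfile\.so.*file=$users_file" "$pam_file"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Copy the file, printing the rule once before the pam_systemd.so line.
    awk -v rule="$rule" '
        !inserted && /^-?session[ \t].*pam_systemd\.so/ { print rule; inserted = 1 }
        { print }
    ' "$pam_file" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Write back through the existing file to keep its owner and mode.
    cat "$tmp" > "$pam_file" && rm -f "$tmp"
}
```

A call such as `add_pam_skip_rule /etc/pam.d/system-auth-ac /etc/orausers` would need to run as root. As the header of the PAM file itself warns, authconfig overwrites user changes the next time it runs, so a check like this may have to be repeated afterwards.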

Great day!

 
  • Posted by NeilZhang on 10/06/2018 01:02:23
  • Repost please keep this link: https://www.dbcloudsvc.com/blogs/linux/systemd-service-was-terminated-abnormally-during-reboot/