Last month, an engineer at one of our customers mistakenly changed the file permissions across an entire RAC installation, including the GI and DB homes.
As expected, the clusterware would no longer start, and I was called in to fix it.
On the Oracle support website, the following two notes deal with fixing GI file permission issues:
How to check and fix file permissions on Grid Infrastructure environment (Doc ID 1931142.1)
The first one applies to Oracle 12.1 and 12.2, and the second one applies only to 12.1.
Doc ID 1931142.1 describes three methods, and I ran all of them to make sure the file permissions were fixed thoroughly.
# Run as root (GRID_HOME is set just for convenience):
export GRID_HOME=/home/app/12.2.0.1/grid
export PATH=$GRID_HOME/perl/bin:$PATH:$GRID_HOME/OPatch
$GRID_HOME/bin/crsctl stop crs
$GRID_HOME/crs/install/rootcrs.sh -unlock
$GRID_HOME/crs/install/rootcrs.sh -lock
# 12.1 should run this command; 12.2 does not support this option:
# $GRID_HOME/crs/install/rootcrs.sh -patch

The two options above are mostly used when patching a RAC database, and their effect is similar to:
$GRID_HOME/crs/install/rootcrs.sh -init

For a single-instance GI (Oracle Restart):
$GRID_HOME/crs/install/roothas.sh -init

For 11.2, replace rootcrs.sh/roothas.sh with rootcrs.pl/roothas.pl. Because the system's default Perl may not be compatible with the Oracle Perl scripts, the recommended way to run them is:
$GRID_HOME/perl/bin/perl rootcrs.pl
Oracle 12.1 has a bug here, so the following part may have to be run separately:
Add execute permission to the following files under the GI ORACLE_HOME:
# chmod +x $GRID_HOME/bin/crs*
# chmod +x $GRID_HOME/crs/install/rootcrs.sh
Run rootcrs.sh:
# cd $GRID_HOME/crs/install
# ./rootcrs.sh -patch
After these fixes the clusterware could be started, but the cluvfy command still reported many errors:
# Run as the grid user:
cluvfy comp software -n $(hostname) -verbose
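When cluvfy reports many errors, it can help to see which message codes dominate before changing anything. The following is a minimal sketch, assuming the output contains codes of the form PRVG-nnnn (such as the PRVG-2033 seen in this case). The name summarize_prvg is a hypothetical helper, not part of cluvfy.

```shell
# Count how often each PRVG message code appears on standard input.
# summarize_prvg is a hypothetical helper name, not an Oracle tool.
summarize_prvg() {
    # -o prints only the matched code, one per line
    grep -oE 'PRVG-[0-9]+' | sort | uniq -c | sort -rn
}

# Usage (as the grid user):
#   cluvfy comp software -n "$(hostname)" -verbose | summarize_prvg
```

The most frequent code ends up on the first line, which gives a rough idea of how much work remains after each round of fixes.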
Next, I fixed the remaining issues using the permission files that are kept under $GRID_HOME/crs/utl/$(hostname):
# Work in the directory that holds the permission files:
cd $GRID_HOME/crs/utl/$(hostname)
cat crsconfig_dirs|grep -E '(^all|^unix)'|grep -v "$GRID_HOME/racg/usrco"|while read unused fname owner group permission; do
  chown $owner:$group $fname || echo failed on $fname
  chmod $permission $fname || echo failed on $fname
done
chmod 755 $GRID_HOME/racg/usrco
cat crsconfig_fileperms|grep -E '(^all|^unix)'|while read unused fname owner group permission; do
  chown $owner:$group $fname || echo failed on $fname
  chmod $permission $fname || echo failed on $fname
done
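Since these loops run as root against the whole Grid home, a dry run can be a useful first step. The sketch below only prints the commands that would be executed. It assumes the same five-column layout the loops above rely on (tag, path, owner, group, permission), and preview_perm_fixes is a hypothetical helper name.

```shell
# Print the chown/chmod commands implied by a crsconfig_dirs or
# crsconfig_fileperms style file, without executing any of them.
# preview_perm_fixes is a hypothetical helper name.
# Assumed line layout: tag path owner group permission
preview_perm_fixes() {
    grep -E '^(all|unix)' "$1" |
    while read -r _tag fname owner group permission; do
        # Skip lines that do not carry all five fields
        [ -n "$permission" ] || continue
        printf 'chown %s:%s %s\n' "$owner" "$group" "$fname"
        printf 'chmod %s %s\n' "$permission" "$fname"
    done
}

# Usage:
#   preview_perm_fixes crsconfig_dirs | less
```

Reviewing the printed commands first makes it easier to spot an unexpected path or owner before anything is changed.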
I then verified the permissions again and still got some errors, so I fixed the remaining files based on the cluvfy output:
# Run as the grid user:
cluvfy comp software -n $(hostname) -verbose |grep 'PRVG-2033.*did not match the expected'|awk -F\" '{print $2" "$6}' > /tmp/grid.perm
# Run as root:
cat /tmp/grid.perm|while read fname permission; do
  chmod $permission $fname || echo failed on $fname
done
This time the cluvfy command ran successfully.
I also double-checked one critical file:
ls -l $GRID_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 373913824 Dec 24 2019 /home/app/12.2.0.1/grid/bin/oracle
# If the result is different, correct it as root:
chown grid:oinstall $GRID_HOME/bin/oracle
chmod 6751 $GRID_HOME/bin/oracle
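The -rwsr-s--x string shown by ls corresponds to octal mode 6751 (setuid and setgid plus 751). Instead of reading the ls output by eye, the mode can be compared directly. A minimal sketch follows, where check_mode is a hypothetical helper name and stat -c is the GNU coreutils syntax.

```shell
# Compare a file's octal mode with an expected value.
# check_mode is a hypothetical helper name; stat -c is GNU coreutils syntax.
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK $1 $actual"
    else
        echo "MISMATCH $1 actual=$actual expected=$2"
        return 1
    fi
}

# Usage:
#   check_mode "$GRID_HOME/bin/oracle" 6751
```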
Next I turned to the Oracle database home, but I could not find a similar way to fix its permissions directly.
Oracle provides a Perl script that copies the permissions of a healthy Oracle home and applies them to the target directory.
The following article is also useful for reference:
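Oracle 11gR2 GI和DB安装目录权限属主被修改后的恢复方法 (How to recover after the permissions and ownership of the Oracle 11gR2 GI and DB installation directories have been changed)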
However, the Perl script did not work in the customer's environment, so I did the same thing with the commands below:
# On the healthy database server, run as root:
cd $ORACLE_HOME
find . -type d -exec stat -c "%n %U %G %a" {} \; > /tmp/orahome.dir
find . -type f -exec stat -c "%n %U %G %a" {} \; > /tmp/orahome.file
# On the target database server, run as root
# (copy /tmp/orahome.dir and /tmp/orahome.file over first if it is a different host):
cd $ORACLE_HOME
cat /tmp/orahome.dir|while read fname owner group permission; do
  [[ -d $fname ]] && { chown $owner:$group $fname || echo failed on $fname; }
  [[ -d $fname ]] && { chmod $permission $fname || echo failed on $fname; }
done
cat /tmp/orahome.file|while read fname owner group permission; do
  [[ -f $fname ]] && { chown $owner:$group $fname || echo failed on $fname; }
  [[ -f $fname ]] && { chmod $permission $fname || echo failed on $fname; }
done
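The same snapshot files can also be used read-only, either before applying the fix to see how far the target has drifted or afterwards to confirm the result. The sketch below reports differences without changing anything. Like the loops above, it assumes paths contain no whitespace and are relative to the current directory, and report_perm_drift is a hypothetical helper name.

```shell
# Report entries whose current owner, group, or mode differs from a
# snapshot written by: stat -c "%n %U %G %a". Nothing is modified.
# report_perm_drift is a hypothetical helper name.
# Assumption: paths contain no whitespace and are relative to the
# current directory, as in the snapshot commands.
report_perm_drift() {
    while read -r fname owner group permission; do
        [ -e "$fname" ] || { echo "MISSING $fname"; continue; }
        current=$(stat -c '%U %G %a' "$fname")
        [ "$current" = "$owner $group $permission" ] ||
            echo "DIFF $fname now=[$current] want=[$owner $group $permission]"
    done < "$1"
}

# Usage (on the target, from $ORACLE_HOME):
#   report_perm_drift /tmp/orahome.file
```

An empty result means every entry in the snapshot exists on the target with the same owner, group, and mode.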
I also double-checked the critical file below:
ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 409357968 May 27 07:18 /home/app/oracle/product/12.2.0.1/db1/bin/oracle
# If it is different, correct it as root:
chown oracle:asmadmin $ORACLE_HOME/bin/oracle
chmod 6751 $ORACLE_HOME/bin/oracle
Then the whole cluster worked well.