As I mentioned in Oracle Tablespace Monitoring, I worked for a company that operated a very large internet site, and our space consumption ran at a rate of 800GB every month. We were adding 4x400GB LUNs to our RAC cluster every two months, and this space had to be added not only to the primary production site but also to the snapshot DB, the reporting DB and the physical standby DB in a remote datacenter. Needless to say, I've seen a number of multipath/storage and ASM issues, and to make my life easier I developed an Oracle ASM Diagnostics Script that allowed me to perform some basic health checks on the state of ASM GROUPs and DISKs as seen from the database side.
The script is called lasmdsk.sh and it does the following:
Check if ASM is running and parse the ASM instance ID (ASM1, 2, 3, n, etc.), then parse the ASM ORACLE_HOME from /etc/oratab using that instance ID and set up the ORACLE_SID and ORACLE_HOME environment variables accordingly. Finally, run the asmcmd command and print the following attributes:
- State
- Type
- Rebal
- Sector
- Block
- AU
- Total_MB
- Free_MB
- Req_mir_free_MB
- Usable_file_MB
- Offline_disks
- Voting_files
- Name
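Those attributes are exactly what asmcmd lsdg reports, so in spirit this part of the script boils down to something like the sketch below (the real lasmdsk.sh is more thorough about error handling):
## sketch only: find the local ASM instance, point the environment at its home, list the disk groups
ASMID=`ps -ef | grep pmon | grep ASM | awk '{print $NF}' | sed 's/asm_pmon_//g'`   ## e.g. +ASM1
ORACLE_HOME=`grep ${ASMID} /etc/oratab | awk -F: '{print $2}'`; export ORACLE_HOME
PATH=$PATH:$ORACLE_HOME/bin; export PATH
ORACLE_SID=${ASMID}; export ORACLE_SID
asmcmd lsdg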
And finally, dig deep into the ASM data dictionary, joining gv$asm_disk and gv$asm_diskgroup to check for the most common issues we've seen in our shop while adding SAN-provisioned, multipathed LUNs to our databases:
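The exact query lives inside the script, but it is essentially a join along these lines (a sketch; the column list is assumed from the checks described later in this post):
select d.inst_id,
       g.name          group_name,
       g.state         group_state,
       d.name          disk_name,
       d.path,
       d.mount_status,
       d.header_status,
       d.state         disk_state,
       d.total_mb,
       d.free_mb
from   gv$asm_disk d,
       gv$asm_diskgroup g
where  g.inst_id (+) = d.inst_id
and    g.group_number (+) = d.group_number
order  by d.inst_id, g.name, d.name;
The outer join keeps freshly provisioned disks (group_number 0) in the output even though they don't belong to any disk group yet.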
However, the true utility of this script is in how quickly and easily it lets me filter the output of the above query. To really demonstrate this, let me give you a real example of how I add new LUNs under ASM and provision them into a real production disk group:
Let's say I just created four new ASM DISKs (DATA_105, DATA_106, DATA_107 and DATA_108) using the sudo /etc/init.d/oracleasm createdisk command and then ran scandisks and listdisks on all RAC nodes.
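For reference, that provisioning step looks roughly like this (the /dev/mapper device names below are placeholders for whatever multipath aliases your storage presents):
## on one node: stamp the new multipath partitions as ASM disks
sudo /etc/init.d/oracleasm createdisk DATA_105 /dev/mapper/mpath105p1
sudo /etc/init.d/oracleasm createdisk DATA_106 /dev/mapper/mpath106p1
sudo /etc/init.d/oracleasm createdisk DATA_107 /dev/mapper/mpath107p1
sudo /etc/init.d/oracleasm createdisk DATA_108 /dev/mapper/mpath108p1
## on every RAC node: pick up and list the new disks
sudo /etc/init.d/oracleasm scandisks
sudo /etc/init.d/oracleasm listdisks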
Now it's time to verify that gv$asm_disk.HEADER_STATUS = PROVISIONED. I could set up the ORACLE_HOME, ORACLE_SID and PATH variables to point to the ASM/Grid OH, log into sqlplus and run a query selecting HEADER_STATUS from gv$asm_disk where NAME ... hmm ... do you see the problem? I now have to use an IN or an OR operator to get all 4 DISKs, because there is no common pattern to give to a LIKE operator unless I use regex, and who is going to do that on the fly?
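Just to spell out the pain, the throwaway query would end up looking something like this:
select inst_id, name, path, header_status
from   gv$asm_disk
where  name in ('DATA_105', 'DATA_106', 'DATA_107', 'DATA_108')
order  by inst_id, name;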
Contrast this with my script instead:
wget https://s3.amazonaws.com/mve-shared/bin/lasmdsk.sh
./lasmdsk.sh "_105|_106|_107|_108"
Easy! And it works because inside the script I wrap the query in a shell function and then pipe its output to egrep, which does the filtering faster and more easily than is possible inside Oracle:
get_asm | egrep "INST_ID|^--|${ASMDISK}"
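For illustration, a stripped-down version of such a wrapper might look like the sketch below (the real get_asm in lasmdsk.sh runs the fuller join shown earlier, and ASMDISK would simply be the pattern passed to the script on the command line):
get_asm () {
sqlplus -S / as sysasm <<'EOF'
set lines 200 pages 1000
select inst_id, group_number, name, path, header_status, mount_status, state
from gv$asm_disk
order by inst_id, name;
EOF
}
The quoted 'EOF' keeps the shell from trying to expand the $ in gv$asm_disk inside the here-document.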
Now I simply run ./lasmdsk.sh "_105|_106|_107|_108", check that HEADER_STATUS = PROVISIONED, and move to the next step, which is creating a TEST disk group and adding the 4 new DISKs to it to make sure everything works as expected:
ASMID=`ps -ef | grep pmon | grep ASM | awk '{print $NF}' | sed 's/asm_pmon_//g'`
ORACLE_HOME=`grep ${ASMID} /etc/oratab | awk -F: '{print $2}'`
export ORACLE_HOME
PATH=$PATH:$ORACLE_HOME/bin
export PATH
ORACLE_SID=${ASMID}; export ORACLE_SID
sqlplus / as sysasm <<EOF
create diskgroup TESTGRP EXTERNAL REDUNDANCY
disk
'ORCL:DATA_105'
, 'ORCL:DATA_106'
, 'ORCL:DATA_107'
, 'ORCL:DATA_108';
exit;
EOF
Next I mount the new TESTGRP group on the remaining nodes of the RAC cluster using another script of mine called mntgrp.sh, which I previously installed on all nodes using the following method:
for x in {2..6}
do
scp mntgrp.sh racdb0${x}:/oracle/dba/bin/
done
which now allows me to call it with a mount|dismount option like so:
for x in {2..6}
do
ssh racdb0${x} /oracle/dba/bin/mntgrp.sh TESTGRP mount
done
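mntgrp.sh itself isn't shown in this post; conceptually it just points the environment at the local ASM instance and issues the requested ALTER DISKGROUP command, roughly like this sketch (not the original script):
#!/bin/sh
## mntgrp.sh <DISKGROUP> <mount|dismount>
GRP=$1
ACTION=$2
ASMID=`ps -ef | grep pmon | grep ASM | awk '{print $NF}' | sed 's/asm_pmon_//g'`
ORACLE_HOME=`grep ${ASMID} /etc/oratab | awk -F: '{print $2}'`; export ORACLE_HOME
PATH=$PATH:$ORACLE_HOME/bin; export PATH
ORACLE_SID=${ASMID}; export ORACLE_SID
sqlplus -S / as sysasm <<EOF
alter diskgroup ${GRP} ${ACTION};
exit;
EOF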
I then verify that the TESTGRP is mounted on all nodes in the cluster using the following:
./lasmdsk.sh TESTGRP
Next I proceed to check the ASM alert logs for any errors (actually I simply check our custom monitoring system that keeps an eye on these things), and if none are found I drop the TESTGRP and assign the 4 new DISKs to the real production DISK GROUP that they were destined for. Here's the remainder of the process:
Dismount the TESTGRP on all but first node (node where I am now):
for x in {2..6}
do
ssh racdb0${x} /oracle/dba/bin/mntgrp.sh TESTGRP dismount
done
Take a storage-level snapshot of the database (this is a scripted process, so I will spare you the details as they are very specific to our environment and storage).
Once the snapshot is taken, drop the TESTGRP:
## first make sure it’s dismounted on all but this node
./lasmdsk.sh TESTGRP
## then sign into sqlplus as sysasm and issue:
drop diskgroup TESTGRP;
Verify that the HEADER_STATUS = FORMER using lasmdsk.sh:
./lasmdsk.sh "_105|_106|_107|_108"
and finally add the 4 DISKs to the real production data group as sysasm:
ALTER DISKGROUP PROD_DATA1 ADD
DISK
'ORCL:DATA_105'
, 'ORCL:DATA_106'
, 'ORCL:DATA_107'
, 'ORCL:DATA_108';
alter diskgroup PROD_DATA1 rebalance power 10;
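The rebalance runs asynchronously, so a generic way to confirm it finished (not part of lasmdsk.sh) is to poll gv$asm_operation until the REBAL row for the group disappears:
select inst_id, operation, state, power, sofar, est_work, est_minutes
from gv$asm_operation;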
At this point I am almost done. I just do one final check on the state of DISKs and GROUPs using lasmdsk.sh:
./lasmdsk.sh "_105|_106|_107|_108"
I check the above output for some key things that we watch out for:
- gv$asm_disk.PATH doesn't match gv$asm_disk.NAME (something terribly wrong happened at the OS level; rescan or bounce to clear it up)
- DISK belongs to the wrong GROUP (ditto)
- GROUP's state is not what it is supposed to be (mount/dismount manually)
- DISK's state is HUNG (should be NORMAL; the solution is ALTER DISKGROUP name UNDROP DISKS)
- DISK's header_status is anything other than MEMBER (PROVISIONED is OK for newly formatted, unused/free disks)
For example, one issue this check caught for us was the CRS DISKs pointing to the wrong paths.
I hope this helps you with chasing down ASM/SAN-storage issues! And if you have any questions head over to our Oracle DBA Community and simply ask.
Vitaliy Mogilevskiy December 22, 2015