Help, my DB is feeling ill: Troubleshooting sequence…
Hello, my friends!
Many times I’d been asked for a reasons, due to which DB could stuck or even be aborted, so I decided to make a little overview. When a DB hangs, stops responding or even doesn’t startup, then a typical behaviour for a novice — running in circle and screaming.
First thing you must to do — just relax, all bad things, that could happen, already have happen. Take a breath, make a cup of coffee and relax: it’s time for diagnostics. We must find out a problem and allow users to work as fast as possible, all analysis we’ll make afterwards.
My troubleshooting sequence is:
- Hardware. In most cases this work is for system administrators: if server is down, DBA can’t even connect to shell. But nevertheless from time to time this part is assigned to DBA:
- Check for error messages at hardware monitoring screens, if there is any;
- Check OS logs.
- OS. The same — this also belongs to SA. But OS environment parameters must be set by DBA:
- Check if Oracle parameters ($ORACLE_HOME,$ORACLE_SID,NLS_LANG etc) are set correctly;
- Take a look at OS logs for any related errors;
- Check, if OS has enough resources for running and maintaining Oracle DB. Poor resources can make new connections impossible.
- Find out, if it is started and in which mode;
- Take a look at alert log and search for ORA errors that prevent instance to start up or open;
- Performance issue, that can stuck all activity (this topic I’ll describe in my next article);
- Don’t forget about resources: Memory parameters, Flash Recovery Area free space (a common error of young DBA).
- Database. If instance can’t open DB, then there could be a quite badly situation with DB files:
- Check alert log for ORA errors;
- Check for control file loss;
- Check for DB files corruptions or loss of log file groups.
- Check, if NIC is up and configured;
- Check, if server can reach its default router and vice versa;
- Check, if server can be reached by client PC.
- Try to tnsping server and analyze an error if any;
- Check, if Listener is up and configured;
- Check for a list of registered databases;
- Check, if naming method at client’s PC is configured correctly;
- Take a look at Oracle.Net log files (listener.ora, sqlnet.log).
As you can see, the most frequently mentioned file is alert log. So if server is up and OS is running, alert log is the first place to look at.