This article is about managing errors generally, but with a focus on the extra complexity of doing so with data.
There are essentially three strategies for managing circumstances when things go wrong. Of course, the first and best option is to recover. If your process fails because it cannot access data, then the best solution is to make the data accessible (for example, close the DVD drive so the disk becomes readable). The good news is that if you can recover, then you may not need to worry about any of the other error strategies; the bad news is that it is unlikely that you will be able to write code to recover from every potential problem. After all, your code needs to handle both “expected” failures — like a validation check that doesn’t pass — or “unexpected ones” — like becoming a deadlock victim.
One great difficulty is that most code consists of a sequence of calls to and from other procedures — some of which may be written by you, and some not. Consequently, your procedure may need to undo its own work, but it also needs to tell its “parent” procedure (the caller) to undo any prior work too. This is critically important, and it will certainly drive your error management architecture: you can’t just log the error and abort; you must notify the caller that you were not able to complete the process.
So, when an unrecoverable error happens, we are left with three critical tasks:
- Immediately abort further processing
- Inform the caller that the process failed and provide as much helpful information as you can.
- Get the system back to a consistent state, preferably one that allows the process to be retried. This is much more important in the case of data transactions.
Let’s start by inventing a system that allows us to accomplish these goals: we need the option to recover from known fixable problems, capture and report error information to the caller, and undo any work that has happened before the error.
Propagation and patterns
One common “code pattern” that I’ve encountered (generally from Visual Basic days) was to always have procedures return a true/false result, indicating success or failure. If the procedure returned true, the processing was successful; if it returned false, the processing was not. Of course, this approach did make it easy for the caller to know whether the callee had failed. However, there are some limitations to this approach. If this seems obvious, bear with me; itemizing the limitations of this approach is illustrative:
- Returning true/false (bool) does not indicate why something failed, making recovery impossible and diagnosis difficult. Returning a numeric value instead would allow the procedure to use 0 for success and a range of values to indicate the type of failure. By returning error numbers that correspond to known errors, the calling code can then test for specific numbers and possibly even attempt recovery.
- Useful error messages, like “You have been chosen as a deadlock victim” are lost, or at least not visible to the calling procedure.
- Diagnosis is difficult because you don’t know where the error occurred in a called procedure. In most well-written code, procedures call other procedures, but in this case this pattern makes it impossible for the parent to know what actual command was executing at the time of failure. (It might be possible to gather diagnostic information is all procedures are logged, but this presents other problems).
- The convention of returning true/false is not generally followed by coders outside of this team, so this “pattern” would have to be adapted when calling code that was not created with this convention. Furthermore, the general expectation that a function like, say, “GetInvoiceNumber” will return an invoice number is frustrated because the function really returns a Boolean and the invoice number has to be returned as an output argument. Experienced coders would find this, shall we say, non-intuitive.
- Every call must be tested for success and then repeatedly provide (roughly identical) error management code to handle failures.
Because we’re much smarter than the author of that strategy, we can clearly see that the solution is to have our called procedures return an object that contains multiple attributes, like error number, error message, the procedure that was the source of the failure, and even the failing line number. This doesn’t address resetting data, but it’s a good way to ensure that parent procedures know of failing calls.
Or is it?
It turns out that the .NET framework, and COM before that, works just like what I just described. It’s already built in. In fact, even years ago a COM call was always executed as a function that returned a HRESULT. Because Visual Basic 6 transparently converted its function results to output parameters (pointers), the convention used by this shop to return true/false results was ironically duplicating Visual Basic’s own use of HRESULT to do the same thing! When received an invalid HRESULT, that was how it received an “error”. How it handled the error depended upon error settings but by default it automatically stopped execution of the current procedure and passed the error to the caller. Pretty much what we’d want it to do. The .NET framework uses the same approach for error handling. When an error is encountered, execution in the current procedure is stopped and the procedure either jumps to an error handler (if one exists) or returns error information the calling procedure. The calling procedure that receives the error will stop further execution and raise the error, in turn, to its caller. This creates a series of errors from child to parent until there are no more parents. This is called “error propagation”. If the error is not handled at any point in this chain, the Windows process itself will be shut down.
In short, without specifying any error handling, COM (in its day), the .NET framework, and even SQL captures robust error information, returns it to the caller, automatically stops further execution, and allows the calling code to trap and explicitly handle the error. Compare this to our list of weaknesses in the other approach: in the .NET approach, the caller is automatically alerted to a failure, error information like the error message (and sometimes procedure name and line number) is collected, the pattern represents a near universal practice by other coders, and there is no need to test each line for success since control will automatically jump to an error handler or the caller. Without doing any explicit error handling, a SQL error on a remote server will automatically propagate all the way to the client that invoked the procedure through a .NET web service.
The only fly in this ointment is the fact that at some point, errors should be handled. If an error is not handled it will continue to propagate to callers until there are no callers left. If this happens, the application process will simply shut down — rudely. There definitely should be error handling in most event procedures.
Which brings us to a second common “pattern” that I’ve seen in naïve shops: those that regard code as incomplete without error handling. In most cases, a .NET procedure without a try…catch block renders precisely the error handling behavior you want. Those shops that require error handling in every procedure are almost certainly over-engineering.
So, within COM, the .NET framework, and now even SQL, you get largely correct error handling for free. In fact, without specifying any error handling at all, an error in a SQL command will be sent to the .NET middle tier, through the SOAP interface, and back to the calling client. Automatically.
With SQL particularly, however, the story is more complicated because of the need to undo data changes. This is not built into the error propagation mechanism.
SQL Error Handling
What complicates the error handling story in SQL is the need to ROLLBACK data to a consistent (i.e. retry-able) state. This is a critical concern when working with data.
Earlier, we said we wanted to invent a system that allows us to:
- Be able to recover from known, fixable problems
- Capture and report error information to the caller, and
- Undo any work that has happened in advance of the error.
Try…catch blocks allow us to trap and recover from errors (even if recovery is simply to abort processing), error propagation allows use to capture and report errors to the caller, and SQL’s transaction functionality manages the last concern.
Here's why things get complicated, or at least not automatic anymore: you have to call ROLLBACK TRANSACTION to clean up the data, but doing so means you can’t leverage the automatic error propagation. You need to CATCH errors in order to explicitly ROLLBACK a transaction.
Of course, the ROLLBACK of a transaction is the essential solution to the problem returning the system to a known and restartable state. However, putting the ROLLBACK command in the catch block (where is belongs) means that it is necessary re-THROW the error so that the caller knows of the failure.
In short, your procedure might look a bit like this:
CREATE PROCEDURE dbo.MySproc AS BEGIN --Do validation work that doesn't change data BEGIN TRY BEGIN TRANSACTION; --Do work that changes data COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH; END;
In practice, SQL TRY…CATCH blocks go hand-in-hand with transaction ROLLBACKs. In most cases, the built-in error propagation functional probably provides exactly the caller notification functionality you need (abort and report to caller). So the main reason to trap errors is to enable a reset of the data with a ROLLBACK.
Note that SQL allows you to call several BEGIN TRANSACTIONS, but these are not “nested”. In other words, you cannot roll back an inner transaction, then still commit or rollback the outer one. No matter how many times you call BEGIN TRANSACTION, as soon as you call ROLLBACK, the @@TRANSCOUNT variable is set to zero and everything will be reverted to the very first call to BEGIN TRANSACTION. In other words, if, in various procedures, you call BEGIN TRANSACTION three times, you will need to call COMMIT TRANSACTION three times for the data to be saved; however, a single call to ROLLBACK before the final COMMIT will undo everything beginning with the first BEGIN TRANSACTION. Consequently, if you try to “recover” from an error passed from a called procedure, be aware that a ROLLBACK in that called procedure may have already undone everything.
When dealing strictly with SELECT procedures (just reading data), because we haven’t changed anything so there’s no need to put anything back. This means that the built-in error propagation — which will abort processing and return error information to the caller — is almost certainly all we need. There’s no need to add transaction or TRY…CATCH blocks around code that only reads data.
THROW vs RAISERROR
TRY…CATCH was introduced with SQL Server 2005. Prior to that was the inelegant and depricated @@ERROR variable. THROW, however, was not introduced until SQL 2012. Prior to THROW, you had to use RAISERROR (yes, one “e”).
SQL Books online tries to explain the differences between THROW and RAISERROR, but doesn’t really capture it. The difference is the information the caller receives.
- THROW, called without arguments, simple re-throws the last error.
- RAISERROR requires arguments and will throw a new error.
To illustrate the difference, suppose I have dbo.ProcA which begins a transaction, then calls dbo.ProcB. When ProcB fails, the error is propagated to ProcA, which receives the error and moves to it’s CATCH block. Of course, the catch block in ProcA rolls back the transaction, and then needs to notify it’s caller by continuing to raise the error.
- If the error is raised using THROW, the error information will correctly indicate that the error was generated in ProcB.
- If the error is raised using RAISERROR, the error information will indicate that the error originated in ProcA (which is actually only where RAISERROR was called from).
For debugging a complex series for procedures, the absence of accurate procedure information can be really problematic.
In SQL, breaking complex processes into distinct stored procedures is actually a really good practice. If a single procedure contains lots of different query logic, the query plans that the engine creates, which is optimized for one dataset, may be terrible for another query. Instead, it is better to call various other procedures in your logic, each of these sub-procedures would have a more optimized query plan. If this complex processing through multiple procedures generates, say, a “duplicate value” error, it would be invaluable to know where it occurred. THROW provides this, but RAISERROR does not — unless you use the err.Handler pattern described below.
There are a couple of problems with THROW, the most important being that it is not available on SQL Server 2005 and SQL Server 2008. (The other problem is that it doesn’t return the correct line number information).
If your code needs to run on earlier SQL versions, then you need to use RAISERROR. The two problems with RAISERROR is that the error information cointains information about where RAISERROR was invoked, not where the original error occurred. The other error is that it requires a lot of SQL statements to set up and use.
The err.Handle stored procedure is designed to handle both problems. It encapsulates the all of the logic involved in capturing and throwing via RAISERROR; now it only takes one command. Further, what if we could detect that we are simply re-throwing an error? We could then try to preserve the data from the original error. That’s the other thing that procedure err.Handle does: if it is simply rethrowing an error, it preserves the error information from the original problem.
Here’s the err.Handle code:
CREATE SCHEMA err; GO
CREATE PROCEDURE err.Handler As BEGIN /* Recommended use: BEGIN TRY BEGIN TRANSACTION; ... COMMIT TRANSACTION END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; EXECUTE err.Handler; END CATCH; */ DECLARE @ErrorMessage nvarchar(2048), @ErrorSeverity int, @ErrorState int, @ErrorNumber int, @ErrorLine int, @ErrorProcedure nvarchar(126);
SET @ErrorNumber = ERROR_NUMBER(); SET @ErrorMessage = ERROR_MESSAGE(); SET @ErrorSeverity = ERROR_SEVERITY(); SET @ErrorState = ERROR_STATE(); SET @ErrorLine = ERROR_LINE(); SET @ErrorProcedure = ERROR_PROCEDURE();
IF @ErrorNumber <> 50000 --When msg_str is specified, RAISERROR raises an error message with an error number of 50000. BEGIN IF LEFT(@ErrorMessage, 16) = N'Database Error #' BEGIN RAISERROR (@ErrorMessage, @ErrorSeverity, @ErrorState); --Rethrow our own message END ELSE BEGIN RAISERROR (N'Database Error #%d raised in %s, Line %d: "%s".', @ErrorSeverity, @ErrorState, @ErrorNumber, @ErrorProcedure, @ErrorLine, @ErrorMessage) END END; ELSE BEGIN RAISERROR (N'Database Error #%d raised in %s, Line %d: "%s".', @ErrorSeverity, @ErrorState, @ErrorNumber, @ErrorProcedure, @ErrorLine, @ErrorMessage) END END;
The logic is not very complicated, but it allows us to use err.Handle just as we would use THROW. Your procedure might look a bit like this:
CREATE PROCEDURE dbo.MySproc AS BEGIN --Do validation work that doesn't change data BEGIN TRY BEGIN TRANSACTION; --Do work that changes data COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; EXECUTE err.Handle; END CATCH; END;
You may have noticed that the procedure is exactly the same as the previous example using THROW, except in this example we called err.Handle. Like THROW, the error information reports the originating procedure. Interestingly, the line number information is accurate in err.Handle, but not in THROW.
Proper error handling is critical to the stability and (especially) the supportability of your application. Although proper error handling does not seem as well understood as you might expect, the good news is that doing nothing, and letting the system just do what it does, isn’t a terrible solution. In fact, getting out of the way is generally the best approach. Except: when it comes to resetting your data changes, or when the error propagates to the event procedures. In both those two cases, you must catch and handle the error.
To revert any database changes generally requires an explicit transaction rollback. This requires catching, reverting, and re-throwing the error. This is slightly more painful than doing nothing, but the solutions presented here keep it from being hard.