Introduction

This document arose from a customer need. The DBAs needed to do a code review prior to a production deployment, but the development team didn’t want to find out at the last minute (after testing and signoffs were complete) that they would have to make substantial code changes. Consequently, my team and I created this document so that our customers would understand our concerns and expectations.

Although I created the initial draft, great MCM/MVP minds like Nic Cain, Robert Davis, and Argenis Fernandez all contributed. The mistakes are mine. This document does assume that certain best-practices SQL policies are in place.

The Approach

This process works by explicitly capturing how a senior DBA approaches a code review (or, at least, how I do). First, he or she determines the general SQL context (is it a SELECT? an UPDATE? an ALTER?), then looks for things that could be problematic in that context. An experienced DBA can find issues effectively because he or she has mental “flags” that indicate when further investigation is needed. The approach here works the same way.

In other words, given a particular context (a SELECT, say), do I see a pattern of concern (a HINT, for example)? In some cases, further investigation is warranted, and the descriptions provide some helpful criteria. The entire list may seem quite long, but when you focus on just one area, there is often just a handful of things to keep in mind.

This document is organized the same way, as a simple decision tree:

  • What section of code are you in? (i.e., CREATE, SELECT, etc.)

  • What are the “flags” that should cause you to look more closely at the SQL?

  • When you see a “flagged” item, what are the situations where this could be a problem?

For example, perhaps you would start by reviewing a “CREATE PROCEDURE” section, so there are a handful of “flags” specific to procedures that you should be looking for. If you don’t see any of these flags, you might move to the first statement. Since the first statement is a “SELECT”, you would check to see if any flags in the SELECT section appear in the query. If so, you then need to investigate the flagged practice and determine whether it could be a problem in this context.

The meaning of each flag type is as follows:

STOP: This finding must be fixed.

WARN: The reviewer needs to investigate; some judgment may be required.

FYI: A non-blocking concern that should be raised with the development team.

Not only did this disarm our customers, who could now anticipate our concerns; we also found that this approach was excellent training for less experienced DBAs learning to do quality reviews.

 

Of course, there are occasionally great reasons to violate this guidance. And no document can hope to capture all of the bad ideas that developers can concoct, so this one makes no claim to be comprehensive.

1. All SQL Scripts

STOP: No USE <DATABASE>

We require a USE <DATABASE> statement at the beginning of every SQL script file. This removes any ambiguity about what database should be the target of this script.

Deployments based on DACPAC-generated SQL may instead use substitution variables (in SQLCMD mode) for the database name.
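
A minimal sketch of the expected preamble (the database name is hypothetical):

USE MachineData;
GO
-- script body follows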



STOP: Multiple USE <DATABASE> statements in one script

SQL scripts will be executed as DBO in a specific database. Changing the context to another database will cause the script to fail. Instead, provide a separate script file for each database (each with its own distinct USE <DATABASE> statement).

For scripts that need to be executed in more than a single database, SQLCMD mode can be used so that the variable is customizable by the DBA at run time. In such cases, the directions should clearly call out setting the variable and to what it should be set (e.g., “Set $DBName to ‘MachineData’ and run script. Then change $DBName to ‘MachineData_IR’ and run script again.”).
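
A minimal SQLCMD-mode sketch of that pattern, reusing the MachineData example above:

:setvar DBName MachineData
USE [$(DBName)];
GO
-- script body follows; change the :setvar line to MachineData_IR and run again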



STOP: No rollback plan

Every deployment must have a rollback plan. Complicated deployments, where different parts may need different validation and rollback strategies, should probably be broken into separate deployments.



WARN: The rollback plan is RESTORE

All deployments should be performed with a minimum of customer impact. Ideally, deployments should be done without the database being unavailable (and for most small deployments this is entirely possible). Of course, sometimes a deployment will still require a brief outage while schema changes are implemented.

The reason we require a rollback script is because deployments need time for testing. During this testing the system needs to accept new data. Using a RESTORE command would reverse the deployment, but also remove any transactions added during the deployment window. Consequently, a rollback strategy using a RESTORE requires a much longer deployment window, during which the system is completely unavailable. Since our goal is to minimize customer impact, we are not generally supportive of this strategy.

The DBA team therefore expects a deployment rollback to explicitly reverse the changes implemented in the deployment. If the deployment is so complex that it can’t be safely managed through rollback scripts, the deployment is probably too big.

That said, it is understood that for certain kinds of deployments impacting batch-oriented systems with a manageable user base, a RESTORE rollback plan may in fact be agreed upon as the most practical and cost-effective solution.



STOP: SQL Agent Job Creation Script

The DBA team will create and schedule any job tasks in their shared job scheduling environment. The deployment information should define the steps (including commands or packages) and the schedule, and the DBA team will create the job on your behalf. They will not execute a SQL script that does so automatically.

However, if you prefer to provide a working set of SQL scripts that describe the changes you want made, this can be a great way to remove any ambiguity from the instructions.

The DBA Team utilizes dedicated job servers that allow us to provide custom reporting of job executions and to offload the performance impact of jobs onto a different server. We recommend creating your jobs so that they can be executed from a remote machine without a dependency on being on the same machine.



STOP: SQL Replication CREATE or ALTER Script

Likewise, the DBA team will use their judgment and experience to configure replication. To do this, they will need to know the source, destination, and schedule information. If you happen to have created scripts that define the replication changes you want, these can be a great way to describe the changes unambiguously. The scripts themselves, however, will not be executed.



STOP: Deprecated commands or features

Deprecated commands or features, or those features slated to be removed in a future release without going through the deprecation process, should never be used in new code as they simply create time-bombs that will have to be fixed later.

When modifying legacy code, deprecated code should be removed if it is practical and cost-effective. If it is not practical and cost-effective, then a bug should be filed so that it can be addressed in a future release and not forgotten.

2. GRANT or LOGIN

STOP: CREATE LOGIN in script

Logins are server-level objects and should not be included in scripts; because scripts are executed as DBO, a CREATE LOGIN statement will fail. If logins need to be created, include this in the documentation instead. New logins should be created only for domain groups or roles, never for domain users. SQL logins are not allowed in new projects, and should be refactored out where possible and cost-effective when required by legacy code.



WARN: No permissions script

A separate permissions script is not required for generated code (e.g., scripts generated by comparison tools), as separating out the permissions could require a large amount of effort and invite human error that causes the deployment to fail. Manually written code should provide the permissions commands in a separate script to ease validation of permissions.
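
A minimal sketch of such a script (the database, procedure, and role names are hypothetical):

USE MachineData;
GO
GRANT EXECUTE ON dbo.GetCustomerOrders TO [DOMAIN\AppServiceRole];
GRANT EXECUTE ON dbo.SaveCustomerOrder TO [DOMAIN\AppServiceRole];
GO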



WARN: GRANT permissions other than EXECUTE to a Service Account

In new projects, service accounts should access data exclusively through stored procedures. Granting SELECT, INSERT, UPDATE, or DELETE to the service account enables the application to run ad hoc (dynamic) SQL. Any amount of dynamic SQL makes it impossible to determine how schema elements are being used and also opens security problems with SQL injection.

One exception to this is a GRANT to a database user with a login mapped to a certificate. This configuration may be necessary to allow a single stored procedure to access data in a separate database.
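
A rough sketch of that configuration (all object names and the password are hypothetical, and the certificate must first be copied into the target database):

-- In the procedure’s database: create a certificate and sign the procedure
CREATE CERTIFICATE XDBCert
	ENCRYPTION BY PASSWORD = 'StrongP@ssw0rd!'
	WITH SUBJECT = 'Cross-database access for dbo.MyProc';

ADD SIGNATURE TO dbo.MyProc
	BY CERTIFICATE XDBCert WITH PASSWORD = 'StrongP@ssw0rd!';

-- In the target database (after importing the certificate’s public key):
CREATE USER XDBCertUser FROM CERTIFICATE XDBCert;
GRANT SELECT ON dbo.SomeTable TO XDBCertUser;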



WARN: WITH GRANT option

Users should not be allowed to grant permissions to other principals.



STOP: GRANT system object, service principal, server permission, etc.

User permissions should follow the principle of least privilege. Server-level permissions should never be granted to users or service accounts.



WARN: Membership in db_datareader

This role should not be granted to a service account. Certain security groups may need to be granted permission, but even this is discouraged. “Least privilege” access requires that the group be granted explicit permissions, and no more permissions than required.

Groups that expect “carte blanche” read access (like, say, developers) should instrument their applications or build reporting so that this access is no longer required.

Users with access to modify source code cannot have read-only access to PCI-compliant production databases.

Users in non-technical roles (e.g., business users) should not be granted access directly to production. Their access should be managed through the application interface or via reporting resources.



STOP: Membership in db_datawriter

Data writes should be executed via stored procedures. An exception is bulk loading of staging-type tables, which requires direct table access; even then, access should be granted via explicit permissions.
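
A minimal sketch of such an explicit grant (the table and account names are hypothetical):

GRANT INSERT ON dbo.Stage_Orders TO [DOMAIN\EtlService];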

3. CREATE/ALTER PROCEDURE

STOP: CREATE PROCEDURE dbo.sp_...

Stored procedures whose names begin with “sp_” cause the SQL engine to first look for them in the resource and master databases. This creates extra, unnecessary work every time the procedure is invoked. Additionally, if a procedure with the same name exists in more than one location, the wrong procedure could be executed.



WARN: XML or nvarchar(max) argument

Ensure that this type of procedure argument is not being used to pass a simple list of values or value pairs into the procedure. Earlier versions of SQL Server required these types of workarounds. Table-valued parameters address this issue directly and are the correct approach for handling this situation. For deeply nested relationships, on the other hand, XML is a great and economical way to represent complex data.
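
A minimal sketch of the table-valued parameter pattern (all object names are hypothetical):

CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetItems
	@Ids dbo.IdList READONLY
AS
BEGIN
	SET NOCOUNT ON;
	SELECT i.ItemId, i.Name
	FROM dbo.Items AS i
	JOIN @Ids AS ids
		ON ids.Id = i.ItemId;
END;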



FYI: Missing SET NOCOUNT ON

In nearly every case, enabling this has little downside, slightly improves stored procedure performance, and reduces network traffic. The exception, of course, is when the client actually expects interim execution results. Even with this statement, you can still use @@ROWCOUNT in your code.



WARN: No TRY ... CATCH blocks

A TRY … CATCH block is not required for procedures that contain only simple SELECT statements as long as that procedure is never invoked by another procedure. In new projects, procedures that invoke other procedures or are invoked by another procedure should have the TRY … CATCH block specified below, as this will provide reliable debugging information. Code which writes or deletes data must explicitly manage the transaction via a TRY … CATCH error handling block.

When modifying existing legacy code, a TRY…CATCH structure will be added only when it is agreed to be important, practical, and cost-effective.

Our recommended pattern for implementing error handling is as follows:

CREATE PROCEDURE dbo.MyProcName AS
BEGIN
BEGIN TRY
	--Do setup work here:
	--Validate args, create temp tables, calculate values, etc.
	BEGIN TRY
		BEGIN TRANSACTION;
		--Do DML work here:
		--Write data, read, update, etc.
		COMMIT TRANSACTION;
	END TRY
	BEGIN CATCH
		IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
		THROW;
	END CATCH;
END TRY
BEGIN CATCH
	--Other clean up (example):
	IF OBJECT_ID(N'tempdb..#MyTempTable') IS NOT NULL
		DROP TABLE #MyTempTable;
	THROW;
END CATCH;
END;

If you have questions about this, or if you are using an earlier version of SQL (prior to 2012), take a look at the SQL error handling page.



WARN: Use of GOTO

The use of GOTO statements is generally considered poor practice. It also makes code much more difficult to follow and maintain. For example, it dramatically increases the complexity of managing transaction counts and issuing commits or rollbacks. The Production DBAs’ interest in consistent transaction management makes this a concern to the team.

When modifying existing legacy code, GOTO statements will be eliminated only when it is agreed to be important, practical, and cost-effective.



WARN: Use of "IF @@ERROR ..."

In current versions of SQL Server, the correct way to implement error handling is through TRY … CATCH blocks. Evaluating @@ERROR is a legacy approach with two problems: unless the evaluation is repeated after every SQL statement, errors may occur that are not handled, and the approach requires implementing multiple redundant error-handling blocks. The Production DBAs’ interest in consistent transaction management makes this a concern to the team.

When modifying existing legacy code, @@ERROR statements will be eliminated only when it is agreed to be important, practical and cost-effective.



FYI: Indentation should increase between BEGIN and END blocks

We strongly encourage additional indentation of SQL between BEGIN and END statements. The principal example of this is with TRY … CATCH blocks, IF, WHILE, etc. Nested blocks should be further indented. This may not be technically necessary, but it is so helpful to understanding and managing transactions that the DBA team essentially requires it.



WARN: DECLARE @<tablename> TABLE...

Unless the use of a table variable is required (e.g., data needs to persist through a rollback, the table is a procedure argument, etc.), we strongly prefer temporary tables. The only other time table variables are allowable is when the data set is guaranteed to contain fewer than, say, 2,000 records. Contrary to myth, there is no performance benefit to table variables, and there can be substantial performance downside.

When a table variable could potentially hold more than 2,000 records, the code should not be deployed.



WARN: TRUNCATE TABLE

For performance reasons, TRUNCATE TABLE may be a wise choice. However, this operation requires ALTER TABLE permission, which the execution context may not have; indeed, under least-privilege access, it should not. Whether the performance benefit of TRUNCATE outweighs the risk of allowing the caller to modify the table is a judgment call.

Additionally, certain features like replication will block this command.



WARN: EXEC, EXECUTE, sp_ExecuteSQL '...'

It is perfectly fine to use these commands to invoke another stored procedure. However, if they are used to execute a string (especially if any component of the string comes from a string argument), the developer should find an alternative. There is no point in mitigating SQL injection by requiring stored procedures, then running dynamic SQL within those procedures.

Dynamic SQL is acceptable for one-time scripts, such as data updates and schema modifications, that would be overly complex without it. In these cases, the script should provide a @Debug flag that the DBA team can set to output the generated code rather than executing it, so they can validate the generated code before running it.
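
A minimal sketch of the @Debug pattern (the generated statement is hypothetical):

DECLARE @Debug bit = 1;	-- DBA sets to 1 to review, 0 to execute
DECLARE @sql nvarchar(max) =
	N'UPDATE dbo.SomeTable SET SomeFlag = 0 WHERE SomeFlag IS NULL;';

IF @Debug = 1
	PRINT @sql;	-- output the generated code for validation
ELSE
	EXECUTE sys.sp_executesql @sql;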



WARN: CREATE TABLE #<tablename> without DROP TABLE #<tablename>

A created table should have a corresponding drop statement, and the creation and drop should be handled correctly by the error handling logic (i.e., a table should not be left behind simply because the code escaped through a CATCH block).

An excellent approach is to test for the existence of the table before the drop, so that an error doesn’t occur on the cleanup:

IF OBJECT_ID(N'tempdb..#MyTempTable') IS NOT NULL
	DROP TABLE #MyTempTable;

 



WARN: Object reference without schema

When a database object is invoked without a schema prepended to its name, the database engine must resolve the schema itself. This is extra work that, while not onerous, the engine should not have to do if the developer simply follows good coding practices.

When the database contains multiple schemas, legacy code cannot be granted an exception to this rule.



FYI: IF ... BEGIN <query1>...END ELSE BEGIN <query2> ...END

This design pattern (different queries based on conditional logic) is prone to parameter-sniffing problems. Conditional logic is not evaluated at compile time, so the first time the procedure is executed, both sets of queries will be compiled using the parameters passed in.

IF @SomeTest = 1
BEGIN
	-- Perform some queries
END
ELSE
BEGIN
	-- Perform a very different query
END

If the parameters passed in are vastly different than those that suit the other set of queries, you could end up with a non-optimal plan for that set. A better approach is to invoke a separate procedure in each branch:

IF @SomeTest = 1
BEGIN
	EXECUTE dbo.OneProcedure
END
ELSE
BEGIN
	EXECUTE dbo.TheOtherProcedure
END

This allows each conditional procedure to be compiled only when it is actually executed.

You should also consider using the SQL err.Handler pattern, documented elsewhere. With one procedure calling the other, the error handler procedure will provide correct error information.

4. SELECT Queries

STOP: SELECT * FROM ...

A query is a type of “contract” with the data receiver, and it is critical to manage this contract explicitly; otherwise clients might break unexpectedly and unnecessarily when the schema changes. Using SELECT * can also prohibit the use of an index due to the cost of looking up the additional columns. Additionally, it hinders index analysis, as the query optimizer cannot suggest alternative indexes when all columns are returned.

Please do not simply replace SELECT * with the complete column list. For performance reasons, it is best to always select only the columns you require.



WARN: Inline SELECT in field list or filter

There are occasions where the most effective way to generate the necessary result is with an inline SELECT (subquery) in the field list. However, the SQL engine will execute this SELECT statement for each row in the main query. This is fine if the main query will contain only a handful of rows; if the main query could contain many rows, this will be disastrous.

You can often get better performance by moving the logic to a temporary table, which allows aggregation or other operations in a single pass.

Ensure that the WHERE or JOIN filters do not use an inline subquery, for the same reasons, only more so. In these cases, a CTE or a correlated EXISTS clause is a better option. Do not use non-correlated queries in an EXISTS clause.
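
For example, a minimal sketch of a correlated EXISTS in place of an inline subquery in the filter (the tables are hypothetical):

SELECT o.OrderID
FROM dbo.Orders AS o
WHERE EXISTS (SELECT 1
	FROM dbo.OrderDetails AS d
	WHERE d.OrderID = o.OrderID
		AND d.Quantity > 100);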



WARN: WITH (NOLOCK) or WITH (READUNCOMMITTED)

Hinting in general is not recommended; however, some organizations have even made this particular practice a required coding pattern! The best way to handle transactional consistency is by defining isolation levels. In other words, if you want to read dirty rows (which you really don’t), set the isolation level to READ UNCOMMITTED. If you are simply trying to avoid waiting for locks (which is probably what you are trying to do), set the isolation level to SNAPSHOT (after snapshot isolation has been enabled in the database) or enable the Read Committed Snapshot Isolation (RCSI) database option.
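
A minimal sketch of those alternatives (the database name is hypothetical):

-- Allow sessions to opt in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
ALTER DATABASE MachineData SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Or enable RCSI so READ COMMITTED readers no longer block on writers:
ALTER DATABASE MachineData SET READ_COMMITTED_SNAPSHOT ON;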



STOP: Index HINTs

Because data is dynamic, index hints should not be included even on the rare occasions when they are helpful in the current environment. As the shape of the data changes, these hints become less helpful and eventually problematic. In the end, SQL always knows more about your data than you do, and any short-term benefit from index hints is lost to their negative impact over time.



WARN: HINTs in general

The circumstances in which a query hint is a good idea are very rare. Even rarer are the occasions where a legacy query hint does no damage to SQL performance once the data profile that led to its creation changes. There is no chance that we know more about the data in a production SQL database than the SQL engine does.



WARN: sub SELECTs

Sub-selects are queries that use another query in the FROM clause. Keep in mind that the sub-query results will be stored in tempdb, as a heap, so that the outer query can operate on them. If the sub-select returns a small-ish result set, or if the optimizer is able to convert it to an INNER JOIN, this may not be a problem. However, if the sub-select is complex and returns a lot of rows, performance can be very poor.

In cases where this could be a problem, writing explicitly to a well-designed temp table may allow the engine to optimize the subsequent join to the outer query. There is no downside to doing this consistently.



WARN: Function in JOIN or WHERE statement

In many cases, using a function in a field comparison will preclude the use of an index. This may be acceptable if the query needs to compare a relatively small number of rows. Otherwise, performance may range from very poor to catastrophic.

In many cases, one can put the function on the correct side of the operator and get the same results with vastly improved performance. For example, the following query with a function on the datetime column causes a table scan in the NorthWind database, even though there is an index on the OrderDate column:

SELECT OrderID 
FROM NorthWind.dbo.Orders 
WHERE DATEADD(day, 15, OrderDate) = '07/23/2012';

However, by moving the function to the other side of the WHERE equation, an index can be used on the datetime column. This is shown in the following example:

SELECT OrderID
FROM NorthWind.dbo.Orders
WHERE OrderDate = DATEADD(day, -15, '07/23/2012');

The optimizer keeps getting smarter about these things; in simple queries it can make better decisions about built-in functions like ISNULL, and you may even see it internally rewrite a very simple query to work around the function. For very simple queries, a built-in function may be acceptable if a better way cannot be found. The optimizer cannot work around user-defined functions, however, and these will always be red-flagged.



WARN: Functions in SELECT

Using functions in the SELECT clause can be a performance killer. As with a function in the WHERE clause or JOIN criteria, the function can be invoked hundreds, thousands, or millions of times: a function in the SELECT list executes once for every row returned by the query.

Ensure that the function is optimized. It is particularly dangerous for such functions to do database lookups themselves.



FYI: BETWEEN operator

While the BETWEEN operator can improve code readability, it also creates ambiguity. BETWEEN is inclusive of both specified values. This is often not the desired behavior, which is more typically “greater than or equal to” the first value and “less than” the second. When BETWEEN is used, it is not clear whether the developer intended the behavior he or she specified. We prefer the use of explicit “>”, “>=”, “<”, or “<=” operators.
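
For example, a minimal sketch of capturing a single day explicitly (the table is hypothetical):

SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '2012-07-23'
	AND OrderDate < '2012-07-24';	-- BETWEEN would wrongly include midnight of the 24th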



WARN: No covering indexes

Queries should typically have covering indexes. There are lots of reasonable exceptions to this. However, the best time to ensure that indexing is appropriate is when a SELECT (or UPDATE) statement is being created or changed.



WARN: Data type mismatch to constant value

Ensure that constants are declared with the data type that matches how they will be used or the columns they will be compared against. For example, use a Unicode constant (i.e., N'value') when evaluating against a Unicode column (and an ANSI constant against an ANSI column). Otherwise, the engine may need to perform a data type conversion to do the evaluation. Failure to match types can result in data being silently truncated or converted to a different character than what was entered, and can hinder index usage.

In simple queries, the SQL engine has gotten smarter about converting the constant value instead of the column, but in complex queries, the optimizer may still choose to convert the column before comparing data, which could force an index scan instead of a seek or may even prevent use of an index.
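
A minimal sketch (the table is hypothetical; assume LastName is nvarchar and RegionCode is varchar):

SELECT CustomerId FROM dbo.Customer WHERE LastName = N'Larsen';	-- Unicode constant for a Unicode column
SELECT CustomerId FROM dbo.Customer WHERE RegionCode = 'NW';	-- ANSI constant for an ANSI column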



WARN: Nested Views, Subqueries or CTEs

Nesting views or subqueries adds complexity to a query, making it difficult for the query optimizer to develop a good plan. There is almost always a way to write a query without nesting views or subqueries.

Likewise, using several CTEs within the same query can cause major performance problems. In many scenarios, the optimizer treats CTEs like nested views inside of table variables. This can often lead to the query optimizer choosing a very complex plan that performs poorly. Simplify by breaking out intermediate levels into temp tables.



STOP: @@IDENTITY or SCOPE_IDENTITY()

The OUTPUT clause is a much safer method of returning an identity value than @@IDENTITY or SCOPE_IDENTITY(). SCOPE_IDENTITY() is safe in most cases, but there are known edge cases where it can return the wrong value. It is common for @@IDENTITY to return an incorrect value.
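
A minimal sketch of the OUTPUT approach (the table is hypothetical, with CustomerId as its identity column):

DECLARE @NewIds TABLE (CustomerId int);

INSERT dbo.Customer (LastName)
OUTPUT inserted.CustomerId INTO @NewIds
VALUES (N'Larsen');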



WARN: Linked Servers

Joins or queries across a linked server are discouraged. Linked-server queries perform very poorly unless the data set is extremely small, and there is little that can be done to optimize them. If there is no other way to include the data set, ensure that the amount of data transferred is minimized; for example, query the linked-server data into a temporary table rather than JOINing to it directly.



WARN: DECLARE CURSOR

Cursors are deadly for performance. The only justification for their use is that sometimes there is no other way to accomplish what you need to do. Inexperienced SQL authors often adopt cursors because they resemble how data is handled in other languages; these uses cannot be approved.

For cases where a CURSOR really seems to be required, sometimes a CLR aggregation function can be an alternative; it can do the same calculations much more efficiently.

In every case, avoid cursors unless they are unavoidable.

5. Database Design

WARN: Compound or large PRIMARY KEY

The contents of the primary key fields (as the default clustering key) must be included in every row of every nonclustered index; therefore, a large primary key will cause every index to balloon. This affects how much of the table can fit in memory and how efficiently SQL can search for data.

If a table requires a combination of several columns or large columns to create a unique Primary Key, consider using a surrogate key as the Primary Key and simply putting a unique constraint across the previous candidates.
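
A minimal sketch of the surrogate-key approach (the table is hypothetical):

CREATE TABLE dbo.OrderLine
(
	OrderLineId int IDENTITY(1,1) NOT NULL
		CONSTRAINT PK_OrderLine PRIMARY KEY,
	OrderId int NOT NULL,
	LineNumber smallint NOT NULL,
	SkuCode varchar(20) NOT NULL,
	CONSTRAINT UQ_OrderLine_Natural UNIQUE (OrderId, LineNumber, SkuCode)
);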



WARN: UNIQUEIDENTIFIER (Guid) data type in clustering key

The point of a Guid data type is that it is theoretically globally unique (assuming the originating machine has a unique MAC address). There are several critical advantages to this property; for example, a Transaction ID can be generated on a remote kiosk without a concern that the same Transaction ID might be generated elsewhere. UNIQUEIDENTIFIERs can also be useful in avoiding replication conflicts.

However, this random nature makes them likely to create large amounts of fragmentation when used as a clustering key. By default, a PRIMARY KEY is also the clustering key. Given the reasons people select GUIDs, the chance of such a table remaining small is very small, and the resulting fragmentation of this busy table will make its designers regret their choice.



STOP: Constraints without names

When creating a table, the SQL engine allows you to create constraints (primary keys, defaults, foreign keys, check constraints, etc.) without providing a name for them. Because the generated name is essentially random, the same constraint will have different names in dev, test, and production environments. When the need later arises to drop or change the object, this can be very difficult to manage or even test.

All PRIMARY KEY, DEFAULT, CHECK, and FOREIGN KEY constraints should have names.
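
A minimal sketch with every constraint named (the table and the referenced dbo.Region are hypothetical):

CREATE TABLE dbo.Customer
(
	CustomerId int NOT NULL
		CONSTRAINT PK_Customer PRIMARY KEY,
	CreatedDate datetime2(2) NOT NULL
		CONSTRAINT DF_Customer_CreatedDate DEFAULT (SYSUTCDATETIME()),
	RegionId int NOT NULL
		CONSTRAINT FK_Customer_Region REFERENCES dbo.Region (RegionId),
	CONSTRAINT CK_Customer_RegionId CHECK (RegionId > 0)
);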



STOP: NTEXT, TEXT, or IMAGE datatypes

TEXT, NTEXT, and IMAGE data types were deprecated in SQL 2005. They have been replaced with varchar(max), nvarchar(max) and varbinary(max), respectively.



STOP: SMALLDATETIME datatype

The SMALLDATETIME data type cannot support dates beyond the year 2079. The code may not last another 60 to 70 years, but there’s no reason to build in a time bomb.



WARN: XML data type (untyped)

SQL has two ways of storing XML, typed and untyped (i.e. without a corresponding schema). SQL handles typed XML efficiently, but there is a considerable skillset change required to query and access this data. If you’re going to go to the trouble of defining an XML schema collection, the overhead of just creating the corresponding columns may be less and the manageability of the data will likely be more.

SQL stores untyped XML as double-byte Unicode strings. This is often even less efficient than XML in the wild (which is typically UTF-8). SQL has no optimizations on untyped XML. What makes this data type dangerous is that clients may imagine that they can work with this XML type. These XML queries become extremely inefficient.

The recommended practice is to store untyped XML in the database as a UTF-8 blob stored in a varbinary(max) data type. With this approach, data conversions are lessened, transfers are smaller, and no one is confused about whether they should be accessing the XML data from within a stored procedure.



WARN: DATETIME data type

Due to legacy practices, the DATETIME data type is often used for dates when only the date portion (not the time) is used. This is much less efficient than the DATE data type, which requires only 3 bytes (vs 8 bytes). More importantly, when a time component is (unnecessarily) included, all queries must use a datetime range to capture the values of a single day, rather than a simple equality. Switching to DATE is more than just a simpler programming and reporting experience; it also improves the accuracy and reliability of the data when not every data query knows how to correctly handle DATETIMEs.



WARN: DATETIME2 data type

When a precision is not specified, this data type defaults to datetime2(7), the maximum size (8 bytes). Explicitly reducing this to datetime2(2) saves two bytes per record and still preserves hundredth-of-a-second (10 ms) resolution, which is enough for nearly every purpose.



WARN: GEOMETRY or GEOGRAPHY data type

These should have a corresponding spatial index. Spatial data without a corresponding index is inviting (performance) disaster.



WARN: CHAR or VARCHAR data type

As a general practice, SQL data types should correspond to the data types of their source or originating system. For example, data coming from an ANSI text file source should probably have a varchar (ANSI) data type. More importantly, data that will be joined against existing ANSI data should probably also be ANSI.

However, today most source systems are Unicode, including .NET Windows and Web applications. Consequently, we expect that most data tables should also use Unicode data types. Systems that allow Unicode inputs (like ASP.NET or WinForms applications) but store them as ANSI data types (char or varchar) must have code to explicitly handle any conversion issues. It is not acceptable for the UI to accept a Unicode character while no one knows what will happen when that value is saved and retrieved from the database.



STOP: “Reserved” columns

Fields that are “reserved” for future use waste space without offering any benefit. It is very easy to add additional table columns at the point they are needed (which also can be named at that time so as to identify their purpose). Columns “reserved” for future use are solving a problem that doesn’t exist.



WARN: No PRIMARY KEY constraint

The absence of a primary key raises two concerns: the SQL engine must “uniqueify” the table rows, and indexes become less efficient. Creating a table without a primary key is acceptable only when the data sets are small (fewer than, say, 20 records), the system does not need to access particular rows, and the order doesn’t matter.



WARN: No clustering key

Unless you specify otherwise, the primary key is also the clustering key. In cases where the primary key is explicitly nonclustered, an alternative clustering index should be selected. This makes sense when records must be routinely accessed or returned in an order that does not correspond to the primary key. Schemas where tables have no clustering key will need to justify this design choice.



WARN: FOREIGN KEY constraint without index

A field with a foreign key constraint should nearly always be indexed. Changes to the referenced columns are checked against the referencing table, which can be expensive if the referencing values are not indexed. More importantly, columns with foreign key relationships are often joined, and without an index, this will likely be quite slow and needlessly inefficient.

The fields referenced by a foreign key constraint should not be nullable.
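
A minimal sketch (the table and column are hypothetical):

CREATE INDEX IX_Order_CustomerId ON dbo.[Order] (CustomerId);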



FYI: FOREIGN KEY (or related) fields with mismatched names

Over the course of a development project, the names of data elements can evolve. For maintenance and manageability, it is critical to remove ambiguity by ensuring that matching data sets use matching names. For example, does a “CustId” correspond to a “CustomerId”? Someone initially approaching this data design should not have to wonder.



FYI: Column names that don’t correspond to purpose

One of the most important components of maintainability is clarity of design. The small cost of making sure that names are accurately descriptive is more than recouped in maintenance costs later. Column names should always accurately reflect the intended value set.



FYI: Misspelled column or table name

Misspelled table or column names cause aggravation and reduced productivity for everyone accessing the table until the problem is fixed. Since the whole point of data is to be used, this is a problem we should prevent.



WARN: Very small data in variable length data types

When physically writing a record to a page in the database, extra bytes of information are required to track the offsets of variable-length data. For very small bits of data, the overhead of tracking the offsets may require more space than the data would occupy as fixed length.

You should never use variable length data types with a length of 1. It is rare that you should even use variable length data types with a length of 2.

For example, a varchar(1) column requires 1 byte to store the character plus additional bytes in the row’s variable-length offset array, whereas a char(1) column requires no offset tracking and uses only 1 byte.

Furthermore, scanning through a set of variable-length columns has much more overhead than simply moving across fixed-length columns.



STOP: Variable-length data stored in fixed-length data types

If the data for a column is likely to vary in length a great deal, use a variable length data type. The extra padding that is used to make the physical record fixed length can greatly increase the storage required.



STOP: Storing data in very large data types as a “catch all”

Using very large data types, particularly MAX data types, instead of well-defined data lengths can cause unexpected results in how data is stored internally and can make it easier for bad data to get into the database. MAX data types are handled differently by the database engine in the execution stack and are less optimal than well-defined data lengths.



STOP: Denormalized data

When designing data structures, the design should be normalized to at least Third Normal Form (3NF). Generally, there is no need to go beyond 3NF, either.

Do not store data repetitively in the database. For example, state names should be broken out into a lookup table that contains 52 entries rather than spelling out the name or abbreviation for all 10,000,000+ customers.

Do not store data in repetitive columns. For example, don’t use multiple columns like Phone1, Phone2, etc. Instead, break phone numbers out into a related table with a single phone number column that can have multiple rows per parent row, as sketched below.

Do not store data that is not relevant to the record in the same table. For example, don’t store a customer’s account balance in the customer table as the balance is a property of their account, not of the customer.
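
A minimal sketch of that related phone table (all names are hypothetical):

CREATE TABLE dbo.CustomerPhone
(
	CustomerPhoneId int IDENTITY(1,1) NOT NULL
		CONSTRAINT PK_CustomerPhone PRIMARY KEY,
	CustomerId int NOT NULL
		CONSTRAINT FK_CustomerPhone_Customer REFERENCES dbo.Customer (CustomerId),
	PhoneNumber varchar(20) NOT NULL,
	PhoneType varchar(10) NOT NULL	-- e.g., 'Home', 'Mobile'
);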



STOP: Clustered or non-sequential GUIDs

Uniqueidentifier (GUID) is a much larger data type than is preferable for a clustering key, primary key, or otherwise. In addition, non-sequential GUIDs cause very high levels of fragmentation. If a GUID is needed for a primary key, it should be nonclustered, and a surrogate column, like an identity column, should be used as the clustering key instead.

If a GUID is used as a nonclustered primary key, and it is being populated by SQL Server, it should be sequential: use NEWSEQUENTIALID() instead of NEWID(). Nonsequential GUIDs are not allowed as clustering keys for any reason.



WARN: CREATE INDEX

Maintaining indexes has overhead: too many indexes can greatly slow down inserts, updates, and deletes. Adding a large number of indexes can hurt performance when that overhead outweighs the benefit, so an index should be added only after properly testing its cost and benefit. A table with a lot of indexes may be a sign that indexes are being added without proper investigation. New indexes will not be added to a table that already has a large number of them unless proper investigation has been performed, including investigating the usefulness of the existing indexes.

Duplicate indexes are not limited to indexes that are identical; they include indexes that are similar and could serve the same queries. Often an index gets added with slightly different included columns, when simply adding an included column to an existing index would carry much lower overhead and still provide the same benefit. An index that can be merged with an existing one will not be added as a new index.

6. Triggers

WARN: CREATE TRIGGER

All triggers require careful evaluation. If there is any alternative, it should always be considered.

Triggers must not perform extensive or long-running operations. Nor should they be used on large or frequently-modified tables. Triggers that update other tables should generally be avoided. Ideally, triggers should only operate on the table that generated the trigger. A trigger that updates other tables can cause unexpected performance issues.

Triggers should not be used on critical tables to perform non-critical updates, because a failure in the non-critical part will roll back the entire transaction, including the critical part. For example, a write to a table that holds customer or financial information should not be allowed to fail because we wanted to update a non-critical column such as ModifiedDate.

To decouple critical transactional activity from less critical subsequent work, consider using a Service Broker queue.

7. INSERT

WARN: INSERT ... FROM @<table> without ORDER BY

In most cases, the optimizer will automatically sort the records being SELECTed into clustering key order for an optimized INSERT. Occasionally, the optimizer may decide that the potential fragmentation/page-split impact is less than the sorting overhead. With table variables, which have no statistics to help the optimizer, it can guess very wrongly about the impact of not sorting, and the result can be massive fragmentation. Therefore, it is best to explicitly specify ordering by the clustering key, especially when inserting from table variables.
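
A minimal sketch (the tables are hypothetical; OrderId is assumed to be the clustering key of the target table):

DECLARE @StagedOrders TABLE (OrderId int NOT NULL, OrderDate date NOT NULL);

INSERT dbo.OrderArchive (OrderId, OrderDate)
SELECT OrderId, OrderDate
FROM @StagedOrders
ORDER BY OrderId;	-- clustering key of dbo.OrderArchive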

8. DML (during deployment)

WARN: No OUTPUT clause

If the DML code implementing a deployment does not use an OUTPUT clause, it may not be capturing the values needed for an online rollback. Whenever possible, unless it can be demonstrated that no data can be added or altered in the meantime, use an OUTPUT clause to persist the data affected by a deployment.
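
A minimal sketch of capturing pre-change values for rollback (all names and values are hypothetical):

DECLARE @OldRegionId int = 1, @NewRegionId int = 2;

UPDATE dbo.Customer
SET RegionId = @NewRegionId
OUTPUT deleted.CustomerId, deleted.RegionId
	INTO dbo.Rollback_CustomerRegion (CustomerId, OldRegionId)
WHERE RegionId = @OldRegionId;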



STOP: Values not persisted until rollback

Depending on the circumstances, the appropriate method for persisting data needed for a rollback may range from a temporary table to a .csv file.