SQL Server 2014 Cardinality Estimator


Two-part:  What is different in the new Cardinality Estimator, and how can I check to see which one I’m using?


The cardinality estimator (“CE”) was changed in SQL Server 2014.  New assumptions and logic were added based upon a modern workload (OLTP and Data Warehousing).  These new additions have shown, according to Microsoft, that most customers experience query performance enhancement, or at the least they remain the same; while a small amount of customers have seen regressions.

Microsoft doesn’t document much about the query optimizer, but their online documentation does explain some of what’s going on.  Their examples are shown in a “new” and “old” format of what happens now vs. what happened with the old estimator.

Microsoft has provided a couple ways to change the database configuration to enable the new cardinality estimator.  Trace Flags: 9481 and 2312 have been provided to disable/enable the new (2014) cardinality estimator.  You can apply these flags to an individual query using the OPTION (QUERYTRACEON  9481) statement.

SELECT * FROM Sales.SalesOrderDetail SOD
WHERE SOD.SalesOrderID = 3124
OPTION (QUERYTRACEON 9481); — use legacy CE

Alternatively, to set the entire database to use the new CE, you use the ALTER DATABASE statement.


You could also used DBCC TRACEON with a -1 option to make it global, or use the –T option in the startup configuration.  This allows you to use the features of SQL Server 2014, while keeping the legacy CE.

So getting back to what changes have actually been made, Microsoft examples show us the following:

  1. Ascending Key Issue: where statistics may not contain a specific value since they were last updated.  For instance, data being inserted on a daily basis, but the statisitics may not have been updated to reflect the latest data.  A Where predicate based upon a date for instance where the predicate contains a date later than what is reflected in the current statistics.
  2. Correlation between filtered predicates on the same table: in a query where you “AND” columns in the WHERE predicate that contain a correlation – such as City and ZipCode, the old CE would not draw a correlation between the two.  The new CE makes an assumption that there is a correlation between the two columns and uses a different algorithm to estimate the data distribution.
  3. Filtered predicates on different tables: rather than making the assumption that filtered columns from different tables “AND’ed” together are correlated, the new CE assumes they are not correlated.  This is based on the modern workloads tested.

Prior to migrating older databases (pre-2014) to the new CE in a production system, Microsoft suggests testing the database against the new CE.  This can be done in a testing environment using the two methods I mentioned earlier in this blog – Alter Database and Trace Flags.  This allows you to identify any potential regressions on a test box, and fix them prior to a production migration.  Any databases migrated from an older version to a 2014 instance will retain the legacy CE by default.

To view the compatibility level of databases on an instance, use the sys.databases DMV, and check the compatibility_level column:

SELECT DB.compatibility_level AS [Compatibility Level]
             , DB.name AS [Database]
FROM sys.databases DB;

To validate the CE being used within a query, you have a couple options.  One is to include the Actual Query Execution plan.  Consider the query:

FROM SALES.SalesOrderDetail SOD;

SSMS QEP capture 1

Right-click the “Select” operator and choose “Properties”:

Select Properties 1

Under “CardinalityEstimationModeVersion”,  you’ll see either 70, for the old CE, or 120 for the new CE.

*NOTE:  70 does not mean that the database is using the CE from SQL Server 7.0, 70 is used for all the legacy versions.  70 = old, 120 = new CE.

The other option is to view the XML from the Query Execution Plan:

SSMS QEP capture 1



Select Properties XML

Right-click in the result window and select “Show Execution Plan XML

Once you have the XML screen, find the tag:  StmtSimple.

You should see:

<StmtSimple … CardinalityEstimationModelVersion=“70”

To turn on / off the new CE on an individual query, you can use the QUERYTRACEON option.  Here is an example of turning “off” the new CE:

FROM HR.Employees
OPTION (QUERYTRACEON 9481); — 9481 old CE

Here is an interesting scenario using the DBCC TRACEON command to set the CE to the old CE:

DBCC TRACEON (9481, -1) — old CE, make it global

Indeed, if I perform a query of any of my user databases on the instance, I’ll see that the legacy CE is
being used. However, if I use the QUERYTRACEON option show above, with the 2312 value, that query
execution plan will overwrite the global setting and use the new (120) CE. Here is the code:

DBCC TRACEON (9481, -1) — set to old CE globally
USE TSQLFundamentals2008
FROM HR.Employees
OPTION (QUERYTRACEON 2312); — option to use new CE for query

Oh, and while all this is happening, the Compatibility Level stays the same. Here is the query to prove it:

SELECT DB.compatibility_level AS [Compatibility Level]
            , DB.name AS [Database]
FROM SYS.databases DB
WHERE DB.name IN (‘AdventureWorks2014’, ‘TSQLFundamentals2008’)
ORDER BY [Compatibility Level] DESC;

Compatibility Level query output


*NOTE:  AdventureWorks2014 is showing a 110 Level because I changed it manually earlier.

So, TSQLFundementals database will use the legacy CE, but keep the Compatibility Level of 120 (2014), unless the querytraceon option is used.

Isn’t science fun!


In this blog I’ve shown you how to enable and disable the new cardinality estimator.  I’ve discussed how to set this at the instance level, and at the query level using querytraceon.  Finally, I’ve shown how to validate the version of the cardinality estimator being used by including the QEP and looking at the properties, or the XML version of the plan.

There is a lot more to discuss about the new cardinality estimator:  how it acts with filtered predicates; associations derived between related columns; and how joins are computed using the new estimator.  I’ll post another blog with some examples of these.

You can review a white paper written by Joseph Sack that goes into detail on the items I’ve mentioned above.  Check out the references for the link.

Knowledge is Power!


Microsoft SQL Server – Optimize your Query Plans with the SQL Server 2014 Cardinality Estimator,  written by Joseph Sack:



MSDN blog on the new Cardinality Estimator (Two parts):



SQL Server: Parameter Sniffing


What is parameter sniffing, and is it a bad thing?


Parameter sniffing is actually a very good thing, most of the time.  When it does go bad, it’s usually extremely bad.  I’ll define, demonstrate, and provide some ways to avoid the bad examples of parameter sniffing.

So what is parameter sniffing?  The optimizer uses statistics histograms to estimate the cardinality of a given query.  One of the things the optimizer does is evaluate the values of the parameters being passed in.  This is “parameter sniffing.”

When the query or stored procedure containing a parameter is executed, the value initially supplied for the parameter is used when creating the execution plan.  Subsequent calls to this query/SP will use this execution plan stored in cache.  So, if the parameter value is one with high selectivity and represents the data as a whole, this is good.  However, if a value is passed that is not representative of the data, the query may suffer in performance.

The best way to understand this is by creating an example.   Using the AdventureWorks20XX database, a good example table is the Sales.SalesOrderDetail.  I’ll perform a query to list the ProductID’s distribution, create a stored procedure with one input parameter, and then execute the procedure to demonstrate good, and bad, parameter sniffing.  Finally, I’ll discuss some of the most popular ways to avoid bad parameter sniffing scenarios.

Query the Sales.SalesOrderDetail table to get a count distribution for the ProductID values:

            , COUNT(*) as [Count]
FROM   SALES.SalesOrderDetail SOD

This returns:

ProductID           Count

870                       4688
712                       3382
873                       3354
many more rows
943                       6
942                       5
897                       2

So, from this query, we can see that ProductID 897 has only two entries, while ProductID 870 has 4,688 entries!  That’s quite a difference, let’s see if it matters.

Let’s create a stored procedure to query the Sales.SalesOrderDetail table on ProductID:

FROM [Sales].[SalesOrderDetail] SOD
WHERE SOD.ProductID = @prod_id;

Now, let’s execute it and pass the parameter value of 897:

EXEC DBO.TEST @prod_id = 897;  — 2 records

key-lookup plan

The execution plan reveals a key lookup.  For this query, the plan is good.  Remember though, key lookups require a lot of IO, so this may not be the best plan in all cases.

Looking at the Select operator properties, we see that the parameter – @prod_id was set to 897 when compiled as we would expect.

Select operator properties

Now, let’s take a look at an execution with a different parameter value – 870, and for grins, lets also turn Statistics IO on:

EXEC DBO.TEST @prod_id = 870;  — 4688 records

Whoa!  While the optimizer used the existing execution plan in cache (key lookup), take a look at the logical reads it performed – 14,300!

(4688 row(s) affected)
Table ‘SalesOrderDetail’. Scan count 1, logical reads 14380, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The Select operator properties shows us that the compiled value for the plan generation was 897, but the runtime value was, of course, 870.

Select operator properties - high logical reads

This would be an example of parameter sniffing gone bad.  Executing the stored procedure with a value that has a high selectivity (2 records in our case) will cause the optimizer to choose an execution plan that may not perform well when other parameter values are entered.

In this example the compiled value of 897 produced the key lookup plan.  When the parameter value of 870 was entered, the optimizer said, “Oh, here’s an existing plan that uses a key lookup, cool, I’ll use it!”  And boom!  Instead of doing a key lookup of 2 records, we are now doing a key lookup for 4688 records!  The Statistics IO values returned shows the high IO due to the use of the same plan.

So, before we discuss what can be done to avoid this type of issue, let’s ask one more question:  What if I recompile the plan?  Okay, let’s do that and see what happens.

To cause a manual recompile of the stored procedure, we alter the stored procedure:

ALTER PROCEDURE [dbo].[TEST] (@prod_id int)
FROM [Sales].[SalesOrderDetail] SOD
WHERE SOD.ProductID = @prod_id

Okay, so now when dbo.TEST is executed, it will recompile and use the newly passed parameter value to generate an execution plan.  Let’s execute it and check this functionality.

EXEC DBO.TEST @PROD_ID = 870  — 4688 records

Clustered Index Scan - option recompile

Now we see the newly generated execution plan based upon the parameter value of 897.  This plan uses a Clustered Index Scan operation.  Let’s take another look at the stored procedure execution using Statistics IO:

(4688 row(s) affected)

Table ‘SalesOrderDetail’. Scan count 1, logical reads 1242, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As you might expect, the logical reads has dropped dramatically, lowing the IO thus improving performance for this parameter value.

The downfall of this type of operation, where we recompile each time the stored procedure is called, is the fact that recompiles have overhead, CPU cycles, etc.  So is this the only way to avoid the case of bad parameter sniffing?  Nope.


Options are available to assist with the parameter sniffing issue.  Of course, we could use the example above where we recompile the statement for each execution, but there are other alternatives.

  • OPTION (RECOMPILE) – already presented
  • Use local variables

The OPTION (OPTIMIZE FOR (@VARIABLE = VALUE)) tells the optimizer to compile the statement using the “VALUE” vs. using whatever is passed as a parameter.  This option may be useful when calling the stored procedure over-and-over with a particular value.

Here is an example of Optimize for VALUE:

       SELECT *
       FROM [Sales].[SalesOrderDetail] SOD
       WHERE SOD.ProductID = @prod_id
OPTION (OPTIMIZE for (@prod_id = 897));

If we execute the SP and pass value 870:

EXEC DBO.TEST @prod_id = 870;  — 4688 records

We see the following plan:

Key Lookup Plan Option Value

The execution of the SP essentially ignores the value of 870 and uses the plan for 897 instead.  In this case, we wouldn’t want to use the value of 870 often.

The OPTION (OPTIMIZE FOR (@VARIABLE = UNKNOWN)) allows the optimizer to use statistical densities rather than statistical histograms.  All parameters use the same estimated record count.  Some query plans may not be optimal because densities are not as accurate as histograms.  When this option is used, parameter sniffing is turned off.

Here is an example of Optimize for Unknown:

       SELECT *
       FROM [Sales].[SalesOrderDetail] SOD
       WHERE SOD.ProductID = @prod_id
OPTION (OPTIMIZE for (@prod_id = UNKNOWN));

If we execute this SP, passing both the 870 and 897 values, we see something interesting:

Clustered Index Scan - option recompile

Both values use the same plan – Clustered Index Scan.  Looking at the XML output of the plan, we see the StatementEstRows under tag: StmtSimple:

<StmtSimple StatementCompId=”3″ StatementEstRows=”456.079″

This estimate rowcount is the same for both parameter values passed.  As the definition states:  “the same estimated row count” is used for all parameters.

Use Local Variables – when you assign the parameter passed to a local variable in the stored procedure code to a local variable, the optimizer uses the same number of estimated rows for each parameter.

When we use this option, we are effectively “turning off” parameter sniffing.

Here is an example of local variable usage:

       DECLARE @PRODUCT_ID INT = @prod_id  — assign param to local variable
       SELECT *
       FROM [Sales].[SalesOrderDetail] SOD
       WHERE SOD.ProductID = @PRODUCT_ID


In this post I’ve described parameter sniffing, and shown that it is a normal operation performed by the optimizer.  Using the knowledge gained in these examples, and one or more of the solutions presented, you should be able to identify and remedy any suspect stored procedures.  The takeaway from this is to have alternatives when troubleshooting stored procedures where certain parameter values give less-than-optimal results.


Microsoft MSDN – SQL Server:

Microsoft MSDN – Stored Procedures:

Microsoft MSDN – Statistics and Plan Cache:

Microsoft SQL Server 2014: Query Tuning & Optimization, Benjamin Nevarez, McGraw-Hill, 2015.

PowerShell: Finding Unused Indexes in a Database


How can I use PowerShell to find unused indexes on databases over multiple servers?


Using the SMO namespace. Specifically, the server class of the Microsoft.SQLServer.management.SMO namespace.

In this example we create the SQL statement that will query the DMV’s to return indexes that are not being used.

We build the SQL query as a “here” document, and then use the INVOKE-SQLCMD Cmdlet to execute the query against the specific server[s] and database.

function Get-UnusedIndexes
{[CmdLetBinding()] Param
([Parameter (
Mandatory = $true,
ValueFromPipeline = $true)] [string[]] $computername,[Parameter (
Mandatory = $true,
ValueFromPipeline = $true)] [string] $database,[switch] $ShowSQL
Import-Module SQLPS -DisableNameChecking | Out-Null
# Here document – SQL query definition
$SQL = @”
SELECT OBJECT_NAME(i.[object_id]) AS [Table Name] , i.name AS [Unused Index Name] , i.type_desc AS [Index Type] , [Index Enabled] =
WHEN i.is_disabled = 1 THEN ‘FALSE’
WHEN I.is_disabled = 0 THEN ‘TRUE’
FROM sys.indexes AS i
INNER JOIN sys.objects AS o ON
i.[object_id] = o.[object_id] WHERE i.index_id NOT IN ( SELECT ddius.index_id
FROM sys.dm_db_index_usage_stats AS ddius
WHERE ddius.[object_id] = i.[object_id] AND   i.index_id = ddius.index_id
AND   database_id = DB_ID()
AND o.[type] = ‘U’
ORDER BY OBJECT_NAME(i.[object_id]) ASC,[Index Type];

} # End Begin block

foreach ( $computer in $computername )
$result = $null

$computer = $computer.ToUpper()

$test = Test-Connection -ComputerName $computer -Count 1 -Quiet  -ErrorAction Stop

if ( $test -eq $false )
throw “Unable to Ping Server: $computer”

$server = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Server -ArgumentList $computer -ErrorAction Stop

if ( $ShowSQL )
Write-Output “SQL Commands:”
Write-Output $SQL

Write-Output “”
Write-Output “Working Server: $computer”

if ( $server.Databases[$database].IsAccessible )
$result = Invoke-Sqlcmd -Query $SQL -ServerInstance $server -Database $database -ErrorAction Stop
throw “Database: $database, Is Not Accessible”

if ($result -ne $null)
$result | ft -AutoSize
} else {
Write-Output “No Unused Indexes Found on Server: $computername, Database: $database”

} catch {
Write-Output “An Error Has Occurred:”
Write-Output “”
} # End Process block

END {}
} # End Function Get-UnusedIndexes

Execute the function:

PS> Get-UnusedIndexes –computername MyServerName  -database MyTestDB

Working Server:  MyServerName

Table Name Unused Index Name          Index Type                Index Enabled
———-        —————–                       ———-                      ————-
TBL1            PK__TBL1                         CLUSTERED            TRUE        
TBL1            IDX_TBL1_COLC              NONCLUSTERED    TRUE        
TBL2            IDX_TBL2_COLC              NONCLUSTERED    FALSE

The query used identifies any index located in the sys.indexes view. If the index exists in one (sys.indexes) and not the other (sys.dm_db_index_usage_stats) then it’s not being used.  An entry is placed in sys.dm_db_index_usage_stats as soon as the index is used.


If no unused indexes are found, the following message is displayed:

Working Server: MyServerName
No Unused Indexes Found on Server: MyServerName, Database: MyTestDB

The function has two mandatory parameters: computername and database.  Computername is a string array that can accommodate several comma separated server names.  The database is singular, but the code could be modified to check one, or all, databases on the server.  Two try/catch blocks are used, one to ping the server, and the other to test if the database is available.

The switch, ShowSQL, simply writes out the SQL statement for the user.


In this example, I’ve shown how to use the SMO namespace, along with a SQL qeury to find unused indexes on your database. This can be helpful when troubleshooting a query that is performing poorly and the optimizer has chosen not to use an [existing] index within the execution plan.


Find out more information here:

SMO Namespace:


SMO/server class:


DMV: sys.dm_db_index_usage_stats:


PowerShell: Finding Database Sizes in SQL Server using SMO

Question: How can I use PowerShell to find out my database sizes on multiple servers?
Answer:  We can use the SQLPS module and the SMO -> Server class to find out the sizes of all databases on a server instance.
Here is the code:

function Get-DatabaseSizes

[string[]] $computername       #an array of computer names
Import-Module sqlps -DisableNameChecking | Out-Null
$srv = New-Object -TypeName microsoft.sqlserver.management.smo.server -ArgumentList $computername -ErrorAction Stop
# Where … gets rid of system db’s and makes sure db’s are available
$databases = $srv.Databases.Where({ -not $_.IsSystemObject -and $_.IsAccessible })
$databases |
select Name,
@{N = “Size(MB)”; E={$_.size}},
@{N = “DataSpaceUsage(MB)”;  E={$_.dataspaceusage/1024}},
@{N = “IndexSpaceUsage(MB)”; E={$_.indexspaceusage/1024}},
@{N = “SpaceAvailable(MB)”;  E={$_.Spaceavailable/1024}} |
ft -AutoSize


Load the function into your environment, or module file, and then execute it:

PS> Get-DatabaseSizes  -computername ServerOne, ServerTwo, … Server_N

Here is the output  (click on the image for an enlarged view):

SQL Server: Query Optimizer


How does the query optimizer choose a plan for my SQL query?


First, a disclaimer: Don’t perform these commands on any production or unit test server – EVER. These scripts are used for training; so DO NOT package them up and run them in production!

The optimizer in SQL Server is a cost based optimizer. That is, the optimizer assigns a cost to all the candidate plans generated within a period of time, and choses the lowest cost plan for execution.  There are many steps, and bits of information that the optimizer uses to do this.

Thinking like the optimizer:

    • Does the query qualify for a Trivial plan or is Full optimization necessary?
    • Does the underlying table have a Clustered index? Or is it a Heap?
    • Are the columns contained in the Where predicate indexed?
    • Is the predicate sargable?
    • Do the underlying Index[es] have current statistics?
    • Are there joins involved in the query?
    • Is there an Order By clause in the query?
    • Are aggregates being used in the WHERE predicate?
    • others

The query optimizers job is to find the best overall plan in a given amount of time.  Since queries with multiple joins could have thousands of potential candidate plans, generating a “perfect” plan would be time, and memory, prohibitive.  Many times the optimizer will stop the optimization process with a “good enough plan found.”

Your SQL statement goes through four stages on its way to being executed:

Parse -> Bind -> Optimize -> Execute

Parsing: the SQL statement is checked for correct syntax, spelling, and keyword usage.  A parse tree is returned as  a result of this operation.

Binding: the parsed tree is then checked against database metadata to see if columns exist, and if the query is semantically correct.

Optimize: once the query tree is validated, it is passed to the optimizer which will create either a trivial plan, or perform full optimization where several alternate plans are evaluated. This involves phases: Simplification, Trivial Plan, Auto-Stats, Exploration (three stages), Convert to executable plan.

Execution: the optimizer chooses the plan for execution.

Let’s take a look at each bullet point above, and explain a few basic actions of the optimizer with examples.

Trivial plan or Full optimization?

One of the first operations of the optimization process, is to determine if the query qualifies for a Trivial plan.  A trivial plan is one in which there really isn’t any other way to execute the query – like:


The optimizer will go out and scan the table and return the results to the client. Even a query like this, against a Heap (table without a clustered index), results in a trivial plan:

WHERE CustomerID = 10

As the query becomes more complex, with table joins, or complex Where predicates, the optimizer may choose a Full optimization plan. Defining each step of full optimization is beyond the scope of this blog.  I will, however, touch on a couple points involved.

Does the underlying table have a Clustered Index or is it a Heap?

When a table is created, it is created as a Heap structure by default. In a nutshell, a heap is an unorganized structure where records are not stored in any particular order.  By comparison, creating a table and assigning a clustered index implements a B+ tree structure to the data.  Heaps are great if you’re doing massive inserts, or using them as log files.  Heaps are not so good for running OLTP queries though.  When you query a heap table, it has to scan the table to find a particular value identified in the Where predicate.  Imagine that on a table that contains > a million rows!  Here is something to remember:  Scan = BAD, Seek = Good!  At least 99% of the time this is true.

Heaps are also fragmentation magnets, it has to do with the page allocation and deallocation mechanism. So, it is best not to use a heap; unless you have a special need, and have tested their use extensively (that is, before you go to production!).

Creating a table and assigning a carefully chosen clustered index is usually the best alternative.  The clustered index stores data in the order of the clustering key.  Additionally, in a clustered index, the leaf page is the data.  Conversely, a non-clustered index has a Key or Row ID that points to the data in a clustered index or a heap, respectively.  This is known as a “bookmark lookup” and it is an expensive operation.

A final note on clustered indexes, a primary key constraint is created as a clustered index by default. You can also explicitly create the primary key as non-clustered, making it a unique constraint as well if desired.  You may experience this scenario when a surrogate key is created as the clustered index and a primary key is created using a non-clustered, unique constraint on a column, or set of columns, that identifies the business rule being applied.

T-SQL code for finding heaps:

USE AdventureWorks2014        — enter your database
SELECT T.name AS [TableName],
                I.name AS [IndexName],
                I.type_desc AS [IndexType]
FROM SYS.indexes I INNER JOIN SYS.tables T ON 
                I.object_id = T.object_id
—  WHERE I.type_desc = ‘HEAP’  — uncomment to view HEAPS only
ORDER BY TableName, IndexType

Are the columns contained in the Where predicate indexed?

If not, they probably should be. When building the index on the predicate columns, place the equality predicates first, that is: Where Column_a = b and Column_b > c … Column_a would be place as the first, or left-most, column in the index.  If you have a predicate with several columns, SQL Server can use a mechanism called index intersection.  This works when there are other indexes present and they are joined together to satisfy the query.

A “covering index” can also be used. The covering index is created using the columns of the predicate and including those columns also present in the Select statement.


FROM MyTable
WHERE COL4 = value and COL5 = value;

Create index IDX_Covering on MyTable (col4, col5)   — Where predicate columns
Include (col1, col2, col3);                                              — Select statement columns

Since the covering index contains all the data columns needed to satisfy the query, it will only access the index. There is no need to access the [base] table thereby reducing I/O.

Is the predicate Sargable?

When a predicate is [S]earch [arg]ument able, it is able to use an existing index to satify the query.  Some predicates are non-sargable due to data type conversion, or other TSQL functions within the predicate.  This would render any existing index unusable by the optimizer.  No index = no seek operations.  Remember: seek = good, scan = bad!

Generally, these operators are Sargable: =, >, <, >=, <=, BETWEEN and LIKE ‘xyz%’

These aren’t: !=, !<, !>, NOT IN, NOT EXISTS, NOT LIKE, and LIKE ‘%xyz%’

Do the underlying indexes have current statistics?

Since SQL Server uses a cost-based optimizer, it relies heavily on current statistics when analyzing a SQL statement.  Statistics are created automatically, and SQL Server can auto-generate new statistics if they qualify as being out of date.  One of the steps the optimizer performs during full optimization is to load statistics.  The optimizer will then have access to values such as:  the table size, record distribution, cardinality, etc. which it uses to derive alternative plans during the Exploration phase of optimization.

Statistics are created automatically by setting the AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS options.  These are both set “on” by default, and in most cases this is fine.  The DBA may have a particular reason for turning one, or both, of these options off.  They can also be ignored during index creation – using a WITH statement.

ON Person.Person (ModifiedDate)

It is best to check with your DBA concerning statistics questions for your database.

Are there joins present in the query?

If so, are there indexes placed on the columns used in the join predicate? If not, you may see significant performance issues, specifically with I/O.  Indexes built on the foreign key columns will enhance performance similarly to creating an index on a predicate column on a regular table.

Is there an Order By used in the query?

Sort operations can be expensive. One way to avoid a sort operation is to use an index with the prescribed sort order already in place.

Consider the query:

SELECT [OrderDate]
FROM [JProCo].[dbo].[SalesInvoice]
WHERE CustomerID = 20

Note that there is a primary key and a non-clustered index present on the table – SalesInvoice. No index exists on CustomerID, nor the order by column – PaidDate.  The plan below shows a sort (60% of the cost)  and a Key lookup into the clustered index.  If we were to build an index on the Where predicate column, and the order by column, in the sort order (decending) specified, maybe we could eliminate the sort and the key lookup.

sql without index

Here is the create index statement:


Here is the new execution plan:

sql with index noncovering

Well, the sort operator is gone, but we still have the key lookup. Let’s see if we can improve this by creating a covering index, which will be the same as the previous index definition, but include the column: OrderDate.  First, we will delete the index: IDX_TEST_PAIDDATE, and recreate it as a covering index.

Here is the create index statement:


Here is the new execution plan using the covering index:

sql with covering index

That’s better! Now we have an index [only] seek with no key lookup into the base table, and no sort operation.  Eureka!

Please note that the optimizer still chooses what it considers the best plan. You still have to do the work of testing, and re-testing scenarios to see if an index might help your query.

Are aggregates being used in the WHERE predicate?

Aggregate functions Sum(), AVG(), Date() are all great tools, but may also cause lackluster performance when used under certain circumstances. If you are writing a report, or pulling data for a chart, these functions can be very helpful.  However, if you are running an OLTP application, with very tight SLA’s on your transactions, using these functions can decrease performance.

Consider the following queries:

— #1
             , SOH.OrderDate
FROM Sales.SalesOrderHeader SOH
WHERE  OrderDate >= ‘2011-01-01 00:00:00’
               OrderDate <  ‘2012-01-01 00:00:00’;

— #2
             , SOH.OrderDate
FROM Sales.SalesOrderHeader SOH
WHERE YEAR(OrderDate) = 2011;

Both return the same values, but query #2 uses the YEAR() function to retrieve records for the year 2011. In contrast, query #1 uses a range predicate to do the same thing.  The execution plans are nearly identical – they both use clustered index scan operations.

The optimizer will recommend a non-clustered index for query #1, using OrderDate and including SalesOrderID columns. This new index will result in a new execution plan using an index seek operation vs. an index scan.  Query #2 will receive no such recommendation, as an index on query #2 won’t help.


I’ve presented some of the processing that the query optimizer uses to select an execution plan. Query tuning is a very broad topic, and I have mearly scraped the tip of the iceberg here.  Hopefully, this will give you some insite into your own queries, and help eliminate some low-hanging fruit you may have right now.

Many things can affect performance: network issues; disk latency; waits – locking and latching; memory pressure, etc. Knowing how to diagnose the problem quickly is the key, so you must know what to rule out during the troubleshooting process.

I’ll be writing more blogs on the optimization, wait stats, and locks and blocking, with corresponding TSQL to assist you in finding these performance bottlenecks.

Knowledge is Power!

PowerShell: Create a Database Inventory


How can I use Powershell to create an inventory of my databases?


You can use SQL Server’s SMO structures to create an inventory of database objects. Once generated, you can output the result to the terminal in grid format, or others such as CSV or text files.

Specifically, using the name space: Microsoft.SqlServer.Management.Smo.  The Server class within this name space provides a wealth of information for the user to query.

I prefer building functions vs. simple scripts since you can import them into a module (function library) as you build your repertoire.   Start by declaring the function:

function Get-DBInventory
[Parameter (Mandatory = $True,
ValueFromPipeline = $True )]
param ([string] $computerName,
[switch] $ToScreen)

I’ve defined our first parameter as a mandatory parameter as well as being able to pipe the value in from another function on the command line. The second parameter is a switch type, used as a Boolean.  The $ToScreen variable can be set to $true or $false depending on where the user prefers the output to be generated.  By choosing the argument -ToScreen at the command line, the variable will be set to $true.

Next, we need to import the SQLPS module:

Import-Module sqlps -DisableNameChecking | Out-Null

Next, we enter the Process block to do actual work. I’ll explain the snippets as we go.

$server = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Server -ArgumentList $computername -ErrorAction stop
$folder = “c:\temp”
$filename = “DB Inventory.csv”
$fullpath = Join-Path $folder $filename
$result = @()

In the example above, I’ve created a new object using the Smo.Server and assigned it to a variable $server. The –ArgumentList $computername tells PowerShell what server to use, and the –ErrorAction stop is what I want to happen if there is an error creating the object.  This is necessary when using a try/catch block.

I’ve also set up a hard-coded output file name and path (c:\temp) to place an output file should the user select an output file. $result is defined as an empty array to hold the output of our next command.

foreach ($db in $server.Databases)
Write-Verbose “Working DB: $db”
$item = $null
$hash = @{ “DBName”              =     $db.name
“CreateDate”      =   $db.CreateDate
“Owner”    =    $db.owner
“RecoveryModel”     =    $db.recoverymodel
“Size/MB”       =    $db.size
“DataSpaceUsage/KB” =    $db.dataspaceusage
“IndexSpaceUsage/KB” =    $db.indexspaceusage
“Collation”   =    $db.collation
“UserCount”   =    $db.users.count
“TableCount”       =    $db.tables.count
“SPCount”       =    $db.storedprocedures.count
“UDFCount”   =    $db.userdefinedfunctions.count
“ViewCount”           =    $db.Views.Count
“LastBUPDate”      =    $db.lastbackupdate
“LastDiffBUPDate”     =    $db.lastdifferentialbackupdate
“LastLogBUPDate”     =    $db.lastlogbackupdate
$item = New-Object PSObject -Property $hash
$result += $item

The for-each loop above goes through all the databases found in the $server object variable. I’ve defined a hash (key-value pair) to assign the output heading (key) and the information (value) found in the database variable – $db, using the properties found in the Database class.  *NOTE:  you can go to MSDN and look up these class[es] and specify your own properties to query.

An $item variable is created as a PSObject and assigned the values of the $hash. The $item variable is then appended to the array – $result.  The array will contain all the defined values for the specific database being queried.

Next is the output generation code.

if ($ToScreen)
$result |
Select DBName, createdate, owner,
recoverymodel, size/mb, dataspaceusage/kb,
indexspaceusage/kb, collation, usercount,
tablecount, spcount, udfcount, viewcount, lastbupdate,
lastdiffbupdate, lastlogbupdate |
Out-GridView -Title “SQL Server Inventory: $computername”
$result |
Select DBName, createdate, owner,
recoverymodel, size/mb, dataspaceusage/kb,
indexspaceusage/kb, collation, usercount,
tablecount, spcount, udfcount, viewcount, lastbupdate,
lastdiffbupdate, lastlogbupdate |
Export-Csv -Path $fullpath -NoTypeInformation

The Boolean $ToScreen is checked for $true or $false, if $true, then the output is sent to the screen. If $false, then the output is sent to a .CSV formatted file.  In any case, the array variable, $result, is piped to the Select-Object cmdlet (Select) where the output columns (the key) are selected.  These columns are then piped to the appropriate output cmdlet: Out-GridView or Export-CSV for viewing on the screen or saving to a .CSV file, respectively.

Finally, we close out our function definition:

} # End PROCESS block
END {}
} # End Function

Here is the grid output of the DB inventory on my laptop. *Note:  I’ve removed the owner and createdate columns.  You can use the Add Criteria to filter the returned data, or the filter text box at the very top left corner.  You can also select which columns to display by right-clicking any of the column headers.  (click on the image below to enlarge)