Python – Dictionaries

What are they?

A dictionary is an unordered collection of key:value pairs, where the values are mutable and the
keys are unique and hashable.  A dictionary is defined within a set of braces {}.
Mutable objects such as dictionaries, lists, or other mutable structures may not be used as keys.
Immutable objects such as integers, strings, or other non-mutable types may be used as keys.

How are they created?

Dictionaries can be created using various techniques.  A variable name with an assignment operator
and an empty pair of braces will create an empty dictionary.
d = {}

You can also use the dict() function:

d = dict()
d = dict(one=1, two=2, three=3) or …
d = dict([('name', 'Ted'), ('age', 27), ('height', 74)])   # note: dict('name': 'Ted', …) is not valid syntax

Other methods are shown in the example code to follow.
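
Since the notebook images don't carry over to text, here is a sketch of a few such techniques (the names and values are placeholders of my own):

# A dict comprehension builds a dictionary from any iterable
squares = {n: n * n for n in range(5)}    # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# zip() pairs two sequences into keys and values
keys = ['name', 'age', 'height']
values = ['Ted', 27, 74]
d = dict(zip(keys, values))    # {'name': 'Ted', 'age': 27, 'height': 74}

# fromkeys() gives every key the same default value
flags = dict.fromkeys(['debug', 'verbose'], False)    # {'debug': False, 'verbose': False}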

What are they used for?

Dictionaries are very fast at finding a value based upon its key (similar to an index lookup in a database).
As such, they can be used for searching large amounts of data very quickly.  If we need to store, retrieve,
and modify data values associated with a unique key, the dictionary data structure provides these features.

Examples:

I've included several examples of creating and using dictionaries with the Jupyter Notebook.
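
The images don't reproduce here, so the sketch below shows the kind of operations they covered (the data is made up for illustration):

person = {'name': 'Ted', 'age': 27, 'height': 74}

print(person['name'])      # retrieve by key -> Ted
person['age'] = 28         # modify an existing value
person['city'] = 'Austin'  # add a new key:value pair
del person['height']       # remove a pair by key

# keys must be hashable -- a mutable key raises an error
# person[[1, 2]] = 'oops'  # TypeError: unhashable type: 'list'

for key, value in person.items():    # iterate over the pairs
    print(key, value)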


The DictWriter class will write rows to the csv file.  Fieldnames are used as the column names; they are
mandatory.  I've used the DictWriter class below to insert an unusual car into the csv file.
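
The original screenshot isn't reproduced, so here is a minimal sketch of the same idea.  The file name, fieldnames, and the "unusual car" row are placeholders of my own:

import csv

fieldnames = ['make', 'model', 'year']    # mandatory: these become the column names

with open('cars.csv', 'a', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    # writer.writeheader()    # uncomment when creating the file for the first time
    writer.writerow({'make': 'DeLorean', 'model': 'DMC-12', 'year': 1981})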

Summary:

In this blog, I’ve explained the basics of Dictionary data structures.  I’ve also given examples of their
usage, creation, and methods.  To find further information, and examples, review the references listed below.

Knowledge is Power!


References:

Python Dictionaries:  https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

Python – SQL Server & PyODBC Connectivity

Python and SQL Server using PyODBC

Connecting to SQL Server with Python is easier than you might expect.  With the PyODBC module, you can create
an ODBC connection to your SQL Server instance.

First you need access to the PyODBC module, which comes with certain implementations of Python.  If you use
the Anaconda Core package, you should have it by default. https://docs.continuum.io/anaconda/pkg-docs.html
You can also access it from the creator: Michael Kleehammer on GitHub:  https://github.com/mkleehammer/pyodbc

For this example code, I'm using an Intel-based computer running Windows 10 OS, and SQL Server 2016 Developer Edition.

The steps are straightforward enough:

1. import the pyodbc module
2. create a connection object – with a connection string
3. create a cursor object based on the connection
4. create a string to form a SQL DML command:  (Select, Insert, Update, Delete)
5. use the cursor object execute method to execute the command string
6. use one of the cursor fetch statements to pull the data:  fetchone( ), fetchall( ), or fetchmany( )
7. iterate through the data using a loop – for or while
8. close the cursor and the connection

*Noteworthy:  This is a simple example highlighting the connection and querying of a SQL Server database.  You would normally enclose this code in try…except logic with explicit error handling in a "real" work environment.

I’ve coded a basic example of the steps outlined above.  Again, as in other Python examples, I’ve copied
screen output from my Jupyter notebook.  I’ve created a TestDB database on SQL Server, with a test table
to use with this demo.  I’ll be using Windows Authentication to access SQL Server.

SQL table creation:

PyODBC import, connection build, query execution, and displaying results:
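
Since the screenshot didn't survive the move to text, here is a minimal sketch of steps 1 through 8.  The server name, driver, and table name (TestTable) are placeholders from my setup; adjust them for yours:

import pyodbc

# Step 2: build the connection -- Windows Authentication (Trusted_Connection)
conn_str = ('DRIVER={SQL Server};'    # or a newer ODBC driver installed on your machine
            'SERVER=localhost;'
            'DATABASE=TestDB;'
            'Trusted_Connection=yes;')
conn = pyodbc.connect(conn_str)

# Step 3: create a cursor based on the connection
cursor = conn.cursor()

# Steps 4-5: form a DML command string and execute it
sql = 'SELECT * FROM dbo.TestTable'
cursor.execute(sql)

# Steps 6-7: fetch the rows and loop through them
for row in cursor.fetchall():
    print(row)

# Step 8: close the cursor and the connection
cursor.close()
conn.close()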

SQL Statistics (keys, tables, etc.):

SQL Stats II:
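
The metadata screenshots are missing as well.  PyODBC exposes this information through catalog methods on the cursor; here is a sketch, run while the cursor from the previous example is still open (TestTable is again a placeholder):

# List the user tables in the database
for row in cursor.tables(tableType='TABLE'):
    print(row.table_schem, row.table_name)

# Column metadata for one table
for row in cursor.columns(table='TestTable'):
    print(row.column_name, row.type_name)

# Index and key statistics for the table
for row in cursor.statistics(table='TestTable'):
    print(row.index_name, row.column_name)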

Summary

In this blog I've demonstrated how you can import the PyODBC module to create a connection, execute a SQL command,
and retrieve data from your SQL Server instance.  I’ve also included examples for retrieving metadata from your
SQL Server instance.  PyODBC is a feature rich module for accessing a SQL Server database.

See the references below for more information on the topic.


References

Websites:
PyODBC on GitHub:  https://github.com/mkleehammer/pyodbc
PyODBC Documentation:  https://github.com/mkleehammer/pyodbc
MS Channel 9 on PyODBC:  https://channel9.msdn.com/Events/Connect/2016/189

Python – Sets and Tuples


What are they?

A Set is an unordered, mutable collection that will not retain duplicate values.

A Tuple is an ordered, immutable collection that can accept duplicates.

How are they created?

Set: 

There are three basic ways –

Using the set( ) function: s = set( )

Converting a list to a set: s = set(list_name) or s = set( [obj1, obj2, obj3, …] )

Braces notation: s = {'a', 'b', 1, 2, 2}

Note: You cannot have nested lists when converting a list to a set, because lists are not hashable and so cannot be set elements

Tuple:

Using parentheses: t = ( )

By assignment using parentheses: t = (1,2,3,4,5), or without: t = 1,2,3,4,5

Note: to create a tuple with a single entry, you must include a comma. Here is an example: t2 = ( 1, ). This is a syntactic quirk of Python: the parentheses alone only group an expression; it is the comma that makes the tuple. If you do this: t2 = ( 1 ), and use the type function: type(t2), it will return an 'int' object type, not the expected tuple.

When to use them?

Sets: Whenever you need to create a collection of distinct values, and/or apply mathematical set theory to those values (intersection, union, difference, etc.). Or, if you need to de-duplicate an existing list, you can pass it to the set( ) function.

Tuples: Used mostly for values that need to remain constant.

Methods to use:

You can use the dir( ) function to list all methods and 'dunder' attributes of a type.

dir(set) | dir(tuple)

Since Tuples are immutable, they do not have many of the common methods to append, clear, add, etc. In fact, only two methods exist: index( ) and count( )

Example Code:

Here are some code examples from my Jupyter Notebook.  I've included some basic examples for both Sets and Tuples.
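
Since the images don't reproduce here, a minimal sketch covering the points above:

# Sets drop duplicates and support set theory operations
s = {'a', 'b', 1, 2, 2}          # the duplicate 2 is dropped
evens = set([2, 4, 6, 8])
odds = set([1, 3, 5, 7])
print(evens | odds)              # union
print(evens & {2, 3, 4})         # intersection
print(evens - {2})               # difference

# Converting a list with duplicates yields the distinct values
print(set([1, 1, 2, 3, 3]))      # {1, 2, 3}

# Tuples are ordered and immutable; only index() and count() are available
t = (1, 2, 3, 2, 2)
print(t.index(3))                # 2 -- position of the first 3
print(t.count(2))                # 3 -- number of 2s

# The single-entry quirk: the comma makes the tuple
t2 = (1,)
print(type(t2))                  # <class 'tuple'>
print(type((1)))                 # <class 'int'>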


Summary:

In this blog entry I've included the definitions of a Set and a Tuple.  I have reviewed their creation and usage, and shown some example code using both.

My next blog will focus on using Python with PyODBC to connect to an instance of SQL Server 2016 and perform a SELECT statement on an existing table.  I'll then use a for loop to review the data returned.  It's very simple to do, and only requires a minimum of coding.

Knowledge is Power!


References:

Python Tuples:  https://docs.python.org/3/library/stdtypes.html#tuple

Python Sets:  https://docs.python.org/3/library/stdtypes.html#set

Jupyter Notebook:  http://jupyter.org/

Python – Lists

Lists are similar to arrays in C, but more flexible: a list can accept any Python object, including other lists.

You can create a list in several ways:

a = list()    # creates an empty list (a)

a = [ ]        # using brackets to create an empty list (a)

a = [1,2,3,4,5]    # creates a list of integers 1-5

a = [1, 'SQL', 3.45, [1,2,3] ]    # Creates a list of various data types: integer, string, float, and another list

a = [ (1,2), (101,102) ]    # Creates a list of Tuples (later)

a = [ {'City': 'Seattle', 'State': 'Washington', 'Population': '2.3M'} ]   # Creates a list of Dictionaries (later)

Range:

You can create a list of integers using the range() function:

x = list(range(9))

x = list(range(1,10))

x = list(range(1,10,2))   # 1 through 9, by 2 (the third argument is called the 'step')

prints:

[0, 1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 3, 5, 7, 9]   # note: all odd values


You can convert another object to a list:

x = 'this is a string'

y = list(x)

Python lists keep insertion order; that is, when you access the list contents, they will be in the order they were inserted.  Some other collections do not preserve this order.
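
A quick illustration:

items = []
items.append('first')
items.append('second')
items.append('third')
print(items)    # ['first', 'second', 'third'] -- the same order they were appended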

You can print the list:

Python 3.x:

print(a)

Python 2.x:

print a

Lists are mutable; that is, you can change them after creation.

Append to the end of a list:

a.append('Server')

Insert to a specific location within the list:

list_name.insert(index, value)

a.insert(2, 'Server')

Clear the list of all its contents:

a.clear()

You can also just rebind the name to a different object entirely (note that the first assignment below actually creates a tuple, and the second a string):

a = 1,2,3,4,5

a = 'Python is Cool!'

A common and often desirable operation to perform on lists is to sort them:

a = [1,2,33,5,7,22,101,3]

a.sort()

print(a)

prints:

[1, 2, 3, 5, 7, 22, 33, 101]

This is an in-place sort, meaning it does not return a new sorted list; it sorts the existing list values (and returns None).

The sorted() function returns a new sorted list:

old = [1, -1, 22, 6]

print(old)

new = sorted(old)

print(new)

prints:

[1, -1, 22, 6]
[-1, 1, 6, 22]


References:

Python Website:  www.python.org

Python lists:  list_URL

New Python Programming and Data Science Blog!

With my recent (Oct 2016) layoff from Dell Technologies, I've had some time to work on additional IT topics.  According to several industry groups, Data Science is one of the top fields in the IT industry right now.  So, I decided to learn what I could about it, and perhaps move into the field.  I set up some goals and thoughts on what it might take to get me up to speed on the topic, short of going back to college and getting a master's degree.

Some of my goals and thoughts included:

  1. Find any online schooling (free preferred) that might be of use.
  2. Find a local “bootcamp” school that I could attend.  This would get me integrated into the field quickly.
  3. Read, read, read…  Buy books on Data Science and any supporting languages.
  4. Find a mentor or a Meetup group in the local area.
  5. Build a portfolio of programs, examples, etc. as a resume item.
  6. Put code and findings on my website:  SQLSmarty.com

To date, I've done most of the six items listed above.  I did enroll in online training through Coursera, an online site set up by a couple of Stanford University professors.  I looked into bootcamps, and found a local one, but the tuition was cost prohibitive.  I ordered a whole library of books on Data Science [that I could understand] and the Python programming language.  I will attend a Meetup group here in Austin this month.  I have some code examples using Python, NumPy, and Pandas, but have not compiled and edited them to post on this site yet.

What I found through my reading and poking around on the Internet was that Python had some of the best APIs for Data Science [DS], and that it's fairly easy to learn.  Armed with that information, I logged on to Amazon and started buying books on DS and Python at an alarming rate.  I also enrolled in a Python course through Coursera that is sponsored and taught by professors at the University of Michigan.  These courses are certificate only, but they show that you've done the work at least.

I installed Python (Anaconda) on my MacBook Pro, and off I went.  First off, Python is a very cool language.  Developed in the 1980s, it is very much industrial strength now.  It is an interpreted language, but it's easy to use and very powerful, since it is optimized to do specific operations very quickly.  It is Object-Oriented, and everything in Python is an object – no primitives.  I started using Python when the first [of many] books arrived from Amazon.  Within a couple of days, I was enrolled in a Python course at the University of Michigan (Coursera) and was banging out code – writing programs in about 10 lines of Python that would take two pages of C# code.

So, as I've moved into the DS portion of Python, I'll be adding coding examples that will include a look at NumPy and Pandas.  Pandas is one of the most used data science APIs, and it's free.  Just import it and start using it!  Plotting and charting the findings is also an important feature of data science.  I'll share some examples of Matplotlib to illustrate how easy it is to create a graphic of the results of your work.  Again, with very few, but intuitive, lines of Python code.

So, check back soon as I will begin posting code examples to the new Python blog!

Knowledge is Power!

Ted.

SQL Server: TempDB

Question:

What is TempDB?

Answer:

This is a short blog post on TempDB.  TempDB is essentially a scratch database for SQL Server.  It's also one of, if not the, busiest databases on an instance.  Learning what it does, and does not do, is important for both the DBA and the developer.  Setting up TempDB is also an important task – sizing, creating multiple TempDB files, and configuring autogrowth correctly will help make your DBA life simpler.

I'll start off with a list of facts on TempDB, followed by some DMVs to look at TempDB usage and performance, and finally a word on troubleshooting.

Some TempDB facts:

  1. Only one TempDB database can be present on an instance
  2. Essentially the scratch database for SQL Server
  3. TempDB contains a data file and a log file
  4. You cannot back up TempDB
  5. TempDB uses the Simple recovery model
  6. TempDB is cleared upon instance restart
  7. TempDB uses the Primary filegroup; you cannot add other filegroups
  8. You can have multiple TempDB files within the filegroup
  9. Multiple TempDB datafiles should be sized the same, and their Autogrowth size should be the same. Don’t use percentages for autogrowth
  10. Don’t accept the default AutoGrowth configuration for TempDB
  11. TempDB holds three(3) things: user data, internal data, and the Version Store
  12. TempDB doesn't log redo information, but it does still log undo information. This allows better performance when performing large changes vs. a user database
  13. Table Variables: defined with the “at” sign – @
    • Declare @TempVar Table …
    • Not created “in memory” they live in TempDB, just like temporary tables
    • Cannot be assigned indexes, but you can assign constraints, including a primary key or a unique constraint
    • Table Vars do not use statistics, even if a key or constraint is assigned. The optimizer uses a fixed value (1) for their cardinality
  14. Temporary Tables: defined with a hash sign – #; global Temp Tables are defined using a double hash sign – ##
    • Create Table #TempTable (or ##TempTable) …
    • Unlike Table Variables, Temp Tables contain statistics
    • They can be indexed (clustered, non-clustered)
    • They can be defined for global access, where any session connected to the database can access them
    • Temporary tables are fully transactional aware, table variables are not
    • Temp Tables are limited in scope to the session that created them; once the session is gone, the temp table is dropped (illustrated in the sketch after this list)
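
As a quick illustration of that last point, here is a sketch using Python and PyODBC from the earlier blog (the connection string and table contents are placeholders of my own; any T-SQL client would show the same behavior):

import pyodbc

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;'
                      'DATABASE=TestDB;Trusted_Connection=yes;')
cursor = conn.cursor()

# The temp table is created in TempDB and is visible only to this session
cursor.execute("CREATE TABLE #Cars (CarID INT PRIMARY KEY, Make VARCHAR(50))")
cursor.execute("INSERT INTO #Cars VALUES (1, 'Ford'), (2, 'Toyota')")
print(cursor.execute("SELECT COUNT(*) FROM #Cars").fetchone()[0])    # 2

# Closing the connection ends the session, and the temp table is dropped with it
conn.close()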

# 3 & # 8.  TempDB contains a data file and a log file:

TempDB_Files_SSMS

# 4.  You cannot backup TempDB:

Backup_TempDB_SSMS

# 5.  TempDB uses Simple recovery model:

TempDB_SimpleRecoveryMode

# 6.  TempDB clears upon SQL Server restart:

Before:

TempDB_TemporaryTables_Before

After:

TempDB_TemporaryTables_After

# 7.  TempDB uses the Primary filegroup, you cannot add additional filegroups:

Secodary_FG

# 13 (d).  Table Variables do not contain statistics:

TableVar_Cardinality

# 14 (b).  Temporary Tables contains statistics:

TempTable_Cardinality

# 14 (c).  Temporary Tables can be indexed (Clustered and Non-clustered):

TempTable_Index

One additional item worth mentioning is the Version Store.  TempDB is used for Read Committed Snapshot Isolation (RCSI) and Snapshot Isolation, where the row versions are stored.  You can read more about TempDB and the version store here:

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2008/12/30/managing-tempdb-in-sql-server-tempdb-basics-version-store-simple-example/

Impacts on TempDB

Each TempDB data file contains allocation metadata pages that must be updated whenever requests come in to create or drop objects in TempDB.  To enforce data integrity, SQL Server uses a lightweight lock called a latch.  This latch makes sure only one request thread is accessing the page at a time; updates to any one page are therefore single threaded, and this can cause contention when many requests are pending, with multiple requests queued waiting for their turn to update the page.

Using more than one TempDB data file helps reduce this contention because each file brings its own set of allocation pages.  While the update of any single page is still single threaded, the presence of two, or more, data files spreads out the requests.

You can view wait contention against TempDB using the DMV sys.dm_os_waiting_tasks; the wait types we are concerned with are PAGELATCH_EX and PAGELATCH_UP.  You can dive further into the history of waits by using the DMV sys.dm_os_wait_stats.
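
A sketch of what that monitoring query might look like, again using PyODBC (the connection string is a placeholder; the query itself runs in any T-SQL client):

import pyodbc

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;'
                      'DATABASE=TestDB;Trusted_Connection=yes;')
sql = """
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type IN ('PAGELATCH_EX', 'PAGELATCH_UP')
"""
# Each row is a task currently waiting on a page latch
for row in conn.cursor().execute(sql):
    print(row.session_id, row.wait_type, row.wait_duration_ms, row.resource_description)
conn.close()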

Size the TempDB data files the same.  TempDB uses a proportional fill algorithm to populate the data files evenly; sizing them equally allows this algorithm to distribute data properly.

Large data files may impede SQL Server startup after a shutdown has occurred, since the TempDB files must be re-created.  Using IFI (Instant File Initialization) can help this process.  Instant File Initialization is set up at the server level.

Memory Spills to TempDB

There are a few ways that memory requests can "spill" over into TempDB.  One way is a cardinality mismatch between the table and what the optimizer sees.  Memory grants are created by the optimizer based upon the cardinality (the number of rows that satisfy the query), which depends directly on the presence of up-to-date statistics.  If the optimizer (cardinality estimator) uses the wrong value for cardinality, it may not grant enough memory to perform the work.  When this happens, the operation spills to TempDB.  Since reading from disk is far slower than reading from RAM, this can become an issue depending on the query and its execution frequency.

Best Practices

  1. Place TempDB on either a fast disk, RAID, or SSD. This should be a dedicated device, containing only TempDB
  2. Use multiple data files within TempDB
  3. Size the data files the same, along with their associated autogrowth values
  4. Don’t use percentages in data file autogrowth
  5. Configure Instant File Initialization (IFI) – for data files, log files cannot use IFI
  6. Watch for latch contention with DMVs:
    • dm_os_waiting_tasks
    • dm_os_wait_stats
  7. Latch types:
    • PAGELATCH_EX
    • PAGELATCH_UP
  8. Use Table Variables and Temporary Tables sparingly
  9. Table Variables do not use statistics, whereas Temporary Tables do. This can have a direct impact on performance
  10. Keep statistics current
  11. Eliminate unnecessary sorting operations – large sorts might spill into TempDB

Summary:

In this post I've explained what you can, and cannot, do when setting up and maintaining TempDB.  I've also shown some best practices and described table variables and temp tables.  TempDB is used more and more by the SQL Server engine to perform many functions.  Therefore, setting up TempDB correctly, on the correct drive type, can dramatically increase its performance.

Resources:

Books:

Microsoft SQL Server 2014:  Query Tuning & Optimization, Benjamin Nevarez, McGraw-Hill, 2015.

SQL Server DMVs In Action, Ian W. Stirk, Manning Publications Company, 2011.

MSDN:

Best Practices:

https://blogs.msdn.microsoft.com/cindygross/2009/11/20/compilation-of-sql-server-tempdb-io-best-practices/

General:

https://msdn.microsoft.com/en-us/library/ms190768(v=sql.120).aspx