Wednesday, March 29, 2006

GUIDs and Databases

For those of you who don't know what a GUID is it is a 128 bit value that is "Unique across space and time". A GUID looks like this: B2658C9D-A76G-4D72-B0E4-B732332408D6. There garanteed uniqueness has been used by the com+ system in windows for years. This is why two com components with the same name can be installed without conflicting (they each have a GUID that keeps them seperate).

Developers seem to be pessimistic about GUID collissions in their systems. I just have to say that it will not happen. As I said com+ has been using this forever and I have never heard of a GUID collission ever. GUID itself stands for Globally Unique IDentifier. i.e. accrosss the globe no one else has the same GUID.

Now onto why these are usefull in databases:

1. Merging of data
If you have 2 databases with the same tables but are currently seperated (i.e. an identical sales database in 2 seperate offices). If you want to combine these into a data warehouse you will have issues with numerical identies colliding. If you used a GUID then there would be no conflict.
If you are using replication then GUIIDS are your friends for this same reason

2. Security
If you look at this query string
I would guess that chaning that userId up or down one would allow me to be another person on the system. Granted it is bad design to expose this information to the user (hide it in session state instead so the client never sees it). But as I always say security should be layered. If we used a GUID for our userIds we would have a query string like this:
By its very deffinition it would be impossibly for anyone to brute force this (they could never generate another GUID that matches one in our system because they are globally unique).
3. Disconnected Data
Typically when creating data we create the records, insert it, then read back the numerical key that sql has created. By using a GUID we can set our key in data to be that GUID and fire it off to the database knowing for sure that no other ID like it will exist in the datbase.
4. Integer Data Space
You can run out of integers in a database (I can't remember the limit) for IDs but by using GUIDs you can have limitless number of IDs without fear of exhaustinga limit.

The Cons
Speed: It takes time to generate a number that is that unique. This takes CPU cycles
Readability: It is easier to read and type a query like select * from table where ID=30 than select * from table where ID='B2658C9D-A76G-4D72-B0E4-B732332408D6'
Space: GUIDs are larger than integers (4 times larger) and take up more space. I am of the adage that storage is cheap now so unless you are going to have a massive massive massive database this does not concern me that much
Page Splits: Due to the size of GUIDs they can contribute to page splits in SQL
Searching and Comparison: It takes a bit longer to compare two guids when searching that it does an integer


Post a Comment

<< Home