by Sarang Nagmote
Category - Databases
More Information & Updates Available at: http://http://insightanalytics.co.in
Saying that you should use the correct data type seems like something that should be very straight forward. Unfortunately, it’s very easy for things to get confusing. Let’s take a simple example from AdventureWorks. If I run this query:
SELECT a.ModifiedDateFROM Person.Address AS aWHERE a.AddressID = 42;
The output looks like this:
2009-01-20 00:00:00.000
Normal right? You see the year, the month, and the day followed by the time in hours, minutes, and seconds as a decimal. Ah, but there is an issue. This query is supposed to be for the reporting system, and the business only cares about the date that the values in the Person. Address table has been modified, so they don’t want to see the time. Also, the person in charge is kind of picky. They really don’t like seeing the date formatted that way. They prefer to see “January 20, 2009.”
Far too often then, the easy answer, just change the column to varchar. You can trim the time and output in exactly the format needed by the business. Problem solved and it was easy…
Well, until someone inputs “Janry 20, 2009” slightly mangling the spelling and suddenly your report looks all messed up. Or, they ask you to start filtering just the last two weeks, regardless of when the report was run and you now can’t easily do date math on the column. Even after you get over that problem with a little formatting using CAST (along with ISDATE to try to catch all those other typos that are in the system now) you notice that the performance is really slow so you go to put an index on the column and now you have an index key that is 50 bytes wide instead of the 3 bytes that the DATE data type would have been, making the index less efficient (not to mention, sorting the data is going to put February ahead of January, more formatting).
The list goes on and on. You’re going to hit issue after issue and all you tried to do was a little formatting, make the data look pretty by using the VARCHAR data type in place of DATETIME or DATE. This kind of thing happens all the time, especially to people just getting started. The rules should be very simple, use the correct data type for the data you’re collecting. However, it’s easy to get distracted. It’s easy to get pushed. The business says “put it in this format” and the only way you can do that is by changing the data type so…
Don’t do it. Take the time to understand the implications of mangling the data types. It’s not just date and time that cause issues either. Lots and lots of stuff can be easily stuffed into a VARCHAR field that more properly should be an INT or a DECIMAL or some other data type. Be able to communicate with the business so that they understand that you’re not simply being difficult, you’re trying to protect the functionality and integrity of the data. Show how formatting can be done using client tools. Explain all the downstream issues that this decision will have. Then, use the correct data type for the information being stored.
No comments:
Post a Comment