This week was Canadian Thanksgiving so this post on metadata stuffing is timely.
Today Thomas LaRock (@sqlrockstar | blog ) posted a rant about our Database Design Throwdown topic on naming standards for tables and other database objects. Tom is a fan of what I call “metadata stuffing” in object names. That’s basically shoving as much additional information as one can into object names so that one does not have to go find out that information from its rightful place.
My reason for wanting to use prefixes is simple enough: I want to know if I am looking at a table or a view when reviewing code.
Karen doesn’t believe that anyone should be using object names as a place to store meta data about the objects themselves. I would like to agree with her but then we’d both be wrong.
He even created the nice graphic of my quote I use here. Thanks, Tom.
Object Confusion Abounds
That quote is indeed one I use during our debate. It’s a snarky introduction to what I think the real problem is: our tools aren’t doing enough to help us with this potential confusion of tables and views. When you are writing or looking at a query, the syntax for referencing a table or a view is exactly the same.
When a developer writes:
USE AdventureWorks;
GO
SELECT Name, ProductNumber, ListPrice AS Price
FROM Production.Product
ORDER BY Name ASC;
GO
He cannot know for certain whether Product is a table or a view. The syntax is the same. And if the developer is working in a text editor of some sort, or with some native tools, there’s no tooltip or other help that he can use to check what type of object he is querying.
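Here is a minimal sketch of that point, using Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere. The ProductList view and the object_type helper are made up for illustration. The query text gives no hint about the object type, but the catalog (sqlite_master here) does. In SQL Server the analogous lookup would go against something like INFORMATION_SCHEMA.TABLES, whose TABLE_TYPE column distinguishes base tables from views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (Name TEXT, ProductNumber TEXT, ListPrice REAL);
    -- Hypothetical view, invented for this illustration
    CREATE VIEW ProductList AS
        SELECT Name, ProductNumber, ListPrice FROM Product;
    INSERT INTO Product VALUES ('Chain', 'CH-0234', 20.24);
""")

# The SELECT syntax is identical whether the target is a table or a view.
from_table = conn.execute(
    "SELECT Name, ListPrice AS Price FROM Product ORDER BY Name"
).fetchall()
from_view = conn.execute(
    "SELECT Name, ListPrice AS Price FROM ProductList ORDER BY Name"
).fetchall()


def object_type(connection, name):
    """Ask the catalog, not the object name, what kind of object this is."""
    row = connection.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(object_type(conn, "Product"))
print(object_type(conn, "ProductList"))
```

The object type lives in exactly one place, the catalog, and a tool only has to look it up.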
Why does this matter? Tom has a great presentation where he shows the impacts of trying to make stuff work and you don’t know what you are looking at. That’s why Tom wants to do this with his objects:
So that the object type is injected into the object name. That seems so innocuous, doesn’t it? What could go wrong?
I’m here to tell you that this is a slippery slope. One of the most egregious examples of this sort of meta data stuffing I’ve run across is one that required all this meta data to be prefixed in front of every table name:
- tbl_ prefix
- Primary Systems that managed the table
- Primary subject area that the table belongs to in the data model
- Classification of the role the table plays in the database (Associative Entity, Domain Entity, Master Data, Reference Data, Log Data, etc.)
- Three letter login of the DBA responsible for administering this table (I KID YOU NOT).
So in this wonderful naming scheme, we’d get:
I have found that once an organization starts thinking of stuffing, their designs become turkeys really fast. It’s ugly. Think about the tools you use, with all those nifty object lists on the left side. To find a table you need to know all that great metadata as you scroll through the list, hoping that PRODUCT is buried in there, somewhere. And what the heck was the name of that guy that dressed funny who did all that data stuff for the company before he won the lottery?
All that metadata should have been managed elsewhere, not prefixed in front of the “real” name of the table. In fact, it was: in the data model and in the system catalog. Every time any of that data changed (DBA assignment changes, DBA wins the lottery, whatever), we had to rename the table and change all the code and reports that referenced it. Sure, we could have isolated systems from this change by using views and/or aliases, but that is additional complexity for no performance gain, either. Refactoring might have helped, but eventually we’d still have to change all the code and queries.
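As a rough sketch of what “managed elsewhere” can look like, here is a small Python/sqlite3 example. Everything in it is hypothetical: the table_metadata table, its columns, the sample values and the reassign_dba helper are invented for illustration and are not the scheme that organization used. The idea is only that when the DBA assignment is a row of data instead of part of the table name, changing it is a single UPDATE and no query has to be touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (Name TEXT, ListPrice REAL);
    INSERT INTO PRODUCT VALUES ('Chain', 20.24);

    -- Hypothetical metadata store: one row per table, not one prefix per fact
    CREATE TABLE table_metadata (
        table_name     TEXT PRIMARY KEY,
        source_system  TEXT,
        subject_area   TEXT,
        classification TEXT,
        dba_login      TEXT
    );
    INSERT INTO table_metadata
    VALUES ('PRODUCT', 'ERP', 'Sales', 'Master Data', 'abc');
""")


def reassign_dba(connection, table_name, new_login):
    """Change the responsible DBA without renaming anything."""
    connection.execute(
        "UPDATE table_metadata SET dba_login = ? WHERE table_name = ?",
        (new_login, table_name),
    )


query = "SELECT Name, ListPrice FROM PRODUCT"  # stands in for existing code and reports
before = conn.execute(query).fetchall()
reassign_dba(conn, "PRODUCT", "xyz")           # the DBA wins the lottery
after = conn.execute(query).fetchall()

current_dba = conn.execute(
    "SELECT dba_login FROM table_metadata WHERE table_name = 'PRODUCT'"
).fetchone()[0]
```

The query string never changes, which is the whole point: the metadata moved, the name did not.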
Why I Hate Metadata Stuffing
- It’s redundant data. Just like with business data, the reason we want to minimize redundant data is because we then have to worry about updating the data in multiple places. There’s cost and added risk for that.
- It changes. I don’t know about you, but I don’t have the luxury of taking down a production system just to update a change in the name of the DBA’s favourite TV show or whatever lame naming scheme someone thought up. Sure, tables can’t change into views or indexes, but all the other types of stuffing will change.
- It takes up real estate. I get all kinds of flak from developers and DBAs for the length of object names when I want the names to be meaningful. It’s funny how spelling out CUSTOMER is unacceptable, but adding the exact same characters in front of every object of its type is A-OKAY. What’s up with that? Somehow optimizing names for developers is more important than loving your data? Show me where it says that in the Project Charter.
- tbl_ is a tell for bad database design. I don’t know where this particular naming scheme originated, but when I do a database design review and I see this naming scheme, I know that the designer learned design in a one hour webinar “training course” and has not really mastered the complexities of enterprise database design and maintenance. The design will be less than best practice 90% of the time. This naming scheme is prominent in programming books, introduction to database books, presentations by non-database people, and uninformed blog posts, by far. It’s not popular with people who do professional database design. Sure, some products use this, too, but do you really want to take database design best practices from vendors? How many professional data architects do you think they have on staff? I will most likely see a database design that is highly optimized to make development go faster. Not for data integrity or loving data.
- It’s not needed “for consistency”. One of Tom’s points is that if we are going to prefix views, we have to prefix tables to be consistent. Actually, no, we don’t. If we have to bite the bullet and prefix views because our tools let us down, we can choose not to clutter up table names just to punish those objects, too. I’m assuming that since Tom prefixes tables and views, he prefixes columns, too, right? It’s just being consistent. <snort>
- It gets in the way of using the data. Tables and Column names are the most user-facing parts of a database design. When we in IT insist on munging up these names with a bunch of systemese, we make it more difficult for business users to get at their data. It shows that we have optimized the database design to help a relatively small number of technical users (developers, DBAs, ETL folks) over the needs of the business. Ultimately, we build databases to manage data. For the business.
What’s the Cost, Benefit and Risk?
One of my Splendid Truths is that all design decisions should assess cost, benefit and risks. In the overall scheme of things, just prefixing “tbl_” in front of a table name isn’t that costly and it isn’t that risky. Tom assesses his designs based only on potential for performance harm according to his post. He “laughs” at my position. I’m happy that my stance on metadata stuffing brings happiness to his day. But performance is only one data point out of many for making a design decision. Usability, clarity and business goal support are other factors that a database architect needs to consider when assigning a name to an object. If we optimize something for a subsystem, we do it at the expense of other subsystems.
Our Tools Should Help Us More
Having said that, I feel the pain of people having to work with sub-standard tools or having to use tools that just refuse to help. Tom showed how SQL Server Management Studio tooltips can help. But all those command line “I don’t need any stinking help” aficionados are left on their own to know what they are looking at.
Oh, and Tom:
ProperNoun_Tom, Pronoun_you Verb_made Pronoun_me Verb_laugh Preposition_with Possessive_your Noun_post.
(See how all this stuffing gets in the way?)
One of my other Splendid Truths about database design is:
Your tools will impact your data models and database designs more than you can imagine.
We shouldn’t sit back and let that happen. Stuffing is great with Tofurky, not with databases.
I’ll be doing two sessions at SQL Saturday San Diego this weekend:
The first is with co-presenter, Tom LaRock (@sqlrockstar | blog), where we debate, whine (Tom) and win (me) several database design approaches and methods in front of a live audience (you!). This is a warm up for our PASS Summit spotlight presentation.
Database Design Throw Down
Karen and Tom debate about the options and best practices of common and advanced design issues, such as:
- Natural vs. Surrogate keys
- NULL vs NOT NULL
- Datatypes
- Agile Database Design
- Database Refactoring
- Identity Crisis?
…and others. Bring your votes, your debates, and your opinions. Help us figure out who’s right and who’s wrong…or less right.
Session Level: Intermediate
My second presentation is on career management.
Career Management for Data Professionals
Career Success in Data Management during Turbulent Times: A workshop on issues and ideas that today’s data professionals can do to build their careers and networking skills with other data management professionals. Workshop topics will include:
- Demonstrating your expertise
- Building a portfolio of your success stories
- Getting others to sell your skills and business value
- Building & extending your data management skill set
- 10 Steps to highlighting you and your work
Bring your thoughts, ideas, and experiences.
Session Level: Beginner
There are many great speakers at this event and it’s FREE for a full day of learning. Registration is still open, but it is common for these events to sell out before the event. Register now!
On Thursday, 2 August I’ll be debating with Tom LaRock (@sqlrockstar), giving a preview of our SQL Saturday presentation of Database Design Throwdown: The Trailer. In this wonderful smackdown, I’ll be talking about the importance of data quality, integrity and data governance while preparing database designs. I’m pretty sure Tom will be spouting wildly crazy, kooky ideas about performance, optimizing design to make life easier for DBAs and … I have no idea what else. Probably bacon. We won’t be giving the same presentation as on Saturday — it will be more of a trailer version of that. Oh, wait…that doesn’t quite sound right. It will be a teaser. Yeah. Something like that. A teaser.
Kansas City SQL Server User Group
Details about the SQL Server User Group meeting:
- 3:45 – 3:50 Greeting and Housekeeping
- 3:50 – 5:00 Database Design Throwdown: The Trailer
- 5:00 – 5:15 Door prizes and wrap up
Overview: If a man is alone in the forest and there is no woman around to watch him design a database is he still wrong? Join us in this highly interactive debate regarding the options and best practices of common and advanced design issues such as natural versus surrogate keys, NULL versus NOT NULL, data quality versus performance, and others. Bring your opinions and experience and join the discussion.
Thomas LaRock is a seasoned IT professional with over a decade of technical and management experience. Currently serving as a senior database administrator with Confio Software, Thomas has progressed through several roles including programmer, analyst, and DBA. Prior to that, he worked at several software and consulting companies, working at customer sites in the United States and abroad. Thomas holds a MS degree in Mathematics from Washington State University and is a member of the Usability Professional’s Association. Thomas also currently serves on the Board of Directors for the Professional Association for SQL Server (PASS), is a SQL Server MVP, and can also be found blogging at http://thomaslarock.com and is the author of DBA Survivor: Become a Rock Star DBA (http://dbasurvivor.com).
Karen Lopez is Sr. Project Manager and Architect at InfoAdvisors, Inc. Karen is a frequent speaker at conferences and user groups. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. Karen is also the ListMistress and moderator of the InfoAdvisors Discussion Groups at www.infoadvisors.com.
8700 State Line Road
Leawood, KS 66206 (map)
Data Model Driven Database Design
On Saturday, 4 August I will be presenting at the Kansas City SQL Saturday (aka #SQLSat159 — don’t get me started on why they use a surrogate key as their names for these…) on Model Driven Database Design
Model-Driven Database Design
Model-Driven Database Development: Myths, Magic and Methods. In this presentation, Karen discusses data model-driven database development from the point of view of the Data Architect, the DBA, and the Developer. She will cover topics such as "Who does what?", "Why are we doing this?", "Do I have to Use a GUI?" and "Just who do you think you are?". Demos, too. Finally, 10 tips for making model-driven database development successful in your organization’s culture and environment.
Session Level: Beginner
Location: Cerner Corporation’s Riverport Campus, 6711 NE Birmingham Rd, Kansas City, MO, 64117
And Tom and I will be doing our full debate on Database Design: The Throwdown, as described above. Registration is required for the SQL Saturday, but it’s totally free – you get swag, prizes and access to some of the best speakers in the SQL Server community. I attended this last year in Kansas City and they did a fabulous job. You want to be there, too.
A developer, Justin Reese, just shared his own story about using the wrong datatype for phone numbers, over at DailyWTF. He did this in an XML document, but I see the same mistake being made over and over again in data models and database designs. In fact, this is a key part of my Data Modeling / Database Design Blunders presentation.
My client reported that was a strange bug on a certain page in an app I built for them. Where the contact information for a series of offices was being displayed, all the information was correct except for one piece: the phone number. For multiple locations, the phone number displayed was the same: 214-748-3647.
I love reading about his quest to track this problem down and what the issue turned out to be. I also love that he wrote a DailyWTF about himself. We all should be doing that: sharing our mistakes so that others can learn from them. I call that "Free Advice That’s Paid For" in my blog posts.
In my presentation my first blunder is using numeric datatypes for data values that aren’t actually numbers. Telephone numbers are one of them. They may have leading zeros. We don’t do math on them, usually. ZIPCodes are another example. Store them as INTEGER and you’ll lose leading zeros. And many Postal Codes have letters. Think you have only US customers? You might. But customers, people who may owe you money, have a way of moving around. Of course, every design decision comes down to cost, benefit and risk. So some designs may make a good case for using numeric datatypes for storing values that aren’t actually numbers. But all the protections for data quality and correct retrieval need to be designed in, too. That’s the trade-off. Also in my presentation I give a rule of thumb:
If business users call it a number, it ain’t a number.
Customer Number. Account Number. Vehicle Identification Number. Social Insurance Number. Social Security Number (yeah, it’s all numbers now, but nothing would stop the powers that be from changing that). This is especially true for numbers that are managed by people outside your organization. You just don’t know when they might decide to add letters or special characters.
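Here is a small Python sketch of both failure modes. The function names and the sample phone numbers are made up for illustration, and the clamping behaviour is an assumption: some systems reject out-of-range values outright instead of capping them. It is worth noticing, though, that the digits of 214-748-3647 are exactly the largest value a signed 32-bit integer can hold, 2147483647, which is at least consistent with ten-digit phone numbers being pushed into an integer that was too small for them.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def format_us_phone(digits: str) -> str:
    """Format a 10-digit string as NNN-NNN-NNNN."""
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"


def store_as_int32(value: str) -> int:
    """Simulate (as an assumption) a system that clamps oversized values
    to the 32-bit maximum instead of rejecting them."""
    return min(int(value), INT32_MAX)


# Leading zeros vanish as soon as a postal code becomes a number.
zip_text = "02134"
zip_as_int = int(zip_text)
print(zip_text, "->", zip_as_int)

# Two different (fictional) office phone numbers, both larger than INT32_MAX,
# come back out as the same value.
for phone in ("8005550100", "9725550100"):
    stored = store_as_int32(phone)
    print(phone, "->", format_us_phone(str(stored)))
```

Storing these values as character data avoids both problems, at the cost of designing in the validation that a numeric type would have pretended to give you.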
I get feedback from at least one person at each presentation that my blunders are way too obvious or that they aren’t serious mistakes. As often as I see poor or inaccurate datatype selection, I have to politely disagree. These are the number one mistakes I see. They compromise data quality and can even lead to tragic data errors. Storing values that in fact aren’t numbers as INTEGER or other numeric datatypes is error prone and leads to nasty slow queries due to all the casting and table scans that may happen. Eventually, those incorrect data values are going to come looking for you. Usually after work hours, in production. If you’ve never seen them in the wild, then either you don’t get out enough or you’ve been blessed by working with highly competent data modelers and database designers. And we all know how rare those are.
…and, yes, this is a quote from one of my former DBAs.