Slides from my frequent DAMA / Enterprise Data World presentation on Data Modeling mistakes. You can click on the stopwatch in the player to auto-advance the slides.
There’s no sound; these are just the slides. If you’d like to attend a presentation on this topic, ask your local user group (DAMA, ERwin, PASS, etc.) to invite me.
Sure, data modeling is taught in many training classes as a linear process for building software. It usually goes something like this:
- Build a Conceptual Data Model.
- Review that with users.
- Build a Logical Data Model.
- Review that with users.
- Build a Physical Data Model.
- Give it to the DBA.
- GOTO step one on another project.
And most team members think it looks like this:
Training classes work this way because it’s a good way to learn notations, tools and methods. But that’s not how data modeling works when the professionals do it on a real project.
Data modeling is an iterative effort. Those iterations can be sprints (typical for my projects) or have longer intervals. Sometimes the iterations exist just between efforts to complete the data models, prior to generating a database. But it’s highly iterative, just like the software development part of the project.
In reality, data modeling looks more like this:
This is Data Model-Driven Development. The high-level steps work like:
- Discuss requirements.
- Develop data models (all of them, some of them, one of them).
- Generate Databases, XML schemas, file structures, whatever you might want to physically build. Or nothing physical, if that’s not what the team is ready for.
These, again, are small intervals, not the waterfall steps of an entire project. In fact, I might do this several times even in the same sprint. Not all modeling efforts lead to databases or physical implementations. That’s okay. We still follow an iterative approach. And while the steps here look like the same waterfall list, they aren’t the same.
- There isn’t really a first step. For instance, I could start with an in-production database and move around the circle from there.
- We could start with existing data models. In fact, that’s the ideal starting point in a well-managed data model-driven development shop.
- The data models add value because they are kept in sync with what’s happening elsewhere – as a natural part of the process, not as a separate deliverable.
- The modeling doesn’t stop. We don’t do a logical model, then derive a physical model, throwing away the logical model.
- Data modelers are involved in the project throughout its lifecycle, not just some arbitrary phase.
- Modeling responsibilities may be shared among more roles. In a strong data model-driven process, it is easier for DBAs and BAs to be hands-on with the data models. Sometimes even users. Really.
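To make the "kept in sync" point concrete, here is a minimal sketch of a drift check between a data model and a running database. Everything in it is an assumption for illustration: the `MODEL` dictionary stands in for whatever your modeling tool exports, the table and column names are made up, and SQLite is used only because it ships with Python. A real shop would compare against its own tool's model export and its own DBMS catalog.

```python
import sqlite3

# Hypothetical model export: table name -> set of column names.
MODEL = {
    "customer": {"customer_id", "name", "email"},
    "orders": {"order_id", "customer_id", "order_date"},
}


def database_columns(conn):
    """Read table and column names from a SQLite database catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {
        table: {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        for table in tables
    }


def model_drift(model, actual):
    """Return, per table, the columns that differ between model and database."""
    drift = {}
    for table in sorted(set(model) | set(actual)):
        missing = model.get(table, set()) - actual.get(table, set())
        extra = actual.get(table, set()) - model.get(table, set())
        if missing or extra:
            drift[table] = {
                "missing_in_db": sorted(missing),
                "not_in_model": sorted(extra),
            }
    return drift


def demo():
    """Build a small in-memory database that has drifted from MODEL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER, order_date TEXT, status TEXT)")
    return model_drift(MODEL, database_columns(conn))


if __name__ == "__main__":
    print(demo())
```

Run in each iteration, a check like this can start from either side of the circle: a non-empty result means either the model or the database needs to catch up, and the team decides which.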
By the way, this iterative modeling approach isn’t unique to data models. All the models we might work on for a project should follow this process. Class diagrams, sequence diagrams, use cases, flow charts, etc. should all follow it to deliver the value that has been invested in them. That’s what Agile means in “the right amount of [modeling] documentation”. Data model-driven development means that models are “alive”.
If you are a modeler and you are reinforcing the wrong perception that data modeling needs a waterfall-like approach, you are doing it wrong. You might be causing more pain for yourself than anyone else on your project.
Data Models aren’t just documentation checklist items. They model the reality of living, breathing systems at all points in their lives. They deliver value because they are accurate, not because they are “done”.
Slides from my frequent DAMA and Enterprise Data World presentation on data management career success.
I introduce to you, NoSQLKitty.
With this dialog, I had to do this. I had no choice, really. Trust me. Just ask Biznis Kitty.
Emmet: I’m just gonna come right out, I have no idea what’s going on or what this place is at all.
Unikitty: Hi! I am Princess Unikitty, and I welcome you all to Cloud Cuckoo Land!
Emmet: So there are no signs or anything. How does anyone know what not to do?
Unikitty: Here in Cloud Cuckoo Land, there are no rules. There’s no government, no baby sitters, no bedtimes, no frowny faces, no bushy moustaches, and no negativity of any kind.
Lucy: You just said the word "no" like a thousand times.
Unikitty: And there’s also no consistency.
Batman: [the clown and the lizard man are dancing around him] I hate this place.
Every single line in that scene had me choking on my popcorn. There’s a blog post in each one. No rules? Nope, not in schemaless. No signs? Nope. No bedtimes? Nope, none. As a matter of fact, I want to make up t-shirts with each of these lines. Everything is Awesome about them.
I’m not anti-cloud, at all. Nor am I anti-NoSQL (Hey, I know that’s a double negative. Don’t blame me that the name NoSQL seriously needs rebranding.) Plus, with a Starbucks name of Kitty, this is *so* my character. Cloud Cuckoo Land and all.
*And I really do get what eventual consistency is all about. I know it means there is consistency. I know when it’s perfect for solving a problem. I’m just quoting Unikitty. Blame her. But watch out for Angry Kitty if you do that.
I don’t usually blog about politics here, but when bad data management and bad people mix, it’s time for a post…
Toronto Star reporter Robyn Doolittle has reported that my world-famous (infamous?) mayor, Rob Ford, may have lost all the data from his previous election campaign.
Councillor Doug Ford claims the mayor’s former campaign manager, Nick Kouvalis, is refusing to turn over valuable 2010 voter database information.
Kouvalis, who also served for a time as Ford’s chief of staff, is now working for the John Tory campaign. The man who actually ran the database aspect of Rob Ford’s first mayoral campaign says the Fords were given everything right after the election.
“I made two DVDs with all of the data from the campaign — entire voters’ list with contact info, supporters, non-supporters, signs, volunteers, all voter contact records, etc. — and gave them both to Doug Ford,” said Conservative data expert Mitch Wexler.
If it is in fact gone, it would be a serious blow to the mayor’s re-election hopes. Numerous political strategists involved in the 2010 race say what helped set Ford apart was that voter intelligence, much of it collected by Ford himself over his 10 years as a councillor in Etobicoke.
I’ll try not to comment on the use of the term “voter intelligence.” Just in case you’ve been hiding under a rock (not a crack rock, I presume), our mayor has been in a heap of trouble (NSFW) since he was elected. Actually, even before he was elected. This isn’t a partisan thing when I say I’m not a fan of my mayor. This is all about not respecting his behaviour. But back to the data thing….
Where Rob Ford’s Data Management Went Wrong
Well, pretty much every single thing he has done has been wrong. At least it feels that way. And sounds and views that way. But if we focus on today’s issue of his reported data loss, I’m thinking he messed up by:
- Giving source data to an external party without a backup. When Ford handed over those record boxes full of 10 years of handwritten notes, he lost his source data. All data deserves protection, even handwritten notes. We in IT sometimes ignore paper data, but we shouldn’t. It’s still data.
- Storing personally identifiable and sensitive data insecurely. I’m betting those file boxes were sitting next to his desk. Sure, his desk is in city hall and I’m betting they have decent physical security. But file boxes aren’t exactly locked cabinets. They also have a way of getting disposed of incorrectly.
- Outsourcing data and database management without getting copies of data on a regular basis. It’s sort of crazy to hand over critical data to a third party for management and not insist that you get copies of it on a regular basis. Even if your relationship is strong, people leave companies or they stop working for you (as we see in Rob Ford’s case). Have you been getting data, models, code, documents from your vendors on a regular basis? You should.
- Using data collected for a specific reason for another reason. Allegedly this data was collected by Ford in fulfilling his duties as city councillor. I’m not sure whether that means it can be used for fundraising and vote elicitation. Sounds off to me. I wonder if all those people who called Ford asking for help with their trash collection and dead raccoon needs knew they were being added to a campaign database.
- Waiting until he needed the data to ask for it. It appears that the Ford brothers waited until it was time to campaign to play “who has the data”. It would be entirely possible (maybe even legally or ethically required) for the outsourcer to destroy all copies of the data when their work ended and the data was given back to Ford.
- Getting copies of data and losing them. It’s reported that the data was provided to Rob Ford’s brother, Councillor Doug Ford. But it appears he lost the data. That’s not good. Where are those DVDs now? Again, this indicates that private and sensitive data probably wasn’t treated with the respect it deserves.
As data professionals, I believe it’s our job to ensure that all data is properly managed and protected. That means monitoring paper and digital data, ensuring that good data management practices are followed, and ensuring that these practices are followed even when we outsource these activities. Please, go find out if anyone in your organization is doing a better job than Rob Ford is. You might be shocked at what you find.
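If you do receive regular copies from a vendor, one simple way to know those copies are still intact later is to record checksums when they arrive. What follows is only a sketch under my own assumptions: the function names and the `voters.csv` file are hypothetical, and it uses nothing beyond the Python standard library. It is not a substitute for a real backup and retention process.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path under folder to its checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, folder):
    """List files from the manifest that are now missing or changed."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )


def demo():
    """Record a delivery, then detect a changed file and a lost file."""
    with tempfile.TemporaryDirectory() as folder:
        data_file = Path(folder) / "voters.csv"  # hypothetical delivery
        data_file.write_text("id,name\n1,Example Voter\n")
        manifest = build_manifest(folder)
        intact = verify_copy(manifest, folder)
        data_file.write_text("id,name\n")
        changed = verify_copy(manifest, folder)
        data_file.unlink()
        lost = verify_copy(manifest, folder)
    return intact, changed, lost


if __name__ == "__main__":
    print(demo())
```

Keep the manifest somewhere other than the media holding the data itself. A pair of DVDs in a drawer fails this check the moment nobody can find them.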