Browsing articles from "February, 2012"

Musings on Data, Analytics and Guessing

Feb 29, 2012   //   by Karen Lopez   //   Blog, Data, Data Modeling  //  2 Comments


It is a capital mistake to theorize before one has data.
-Sir Arthur Conan Doyle

Experts often possess more data than judgment.
-Colin Powell

Sir Arthur and General Powell seem to have conflicting points of view about data, but I’m not sure they do. I love my data and yet data alone won’t solve many problems.  I have to figure out which data to use, how current the data needs to be, and how to use that data with other data and my own experiences and biases to get to a decision that’s right for right now.  

We in the data profession pretty much spend our days trying to get quality data to the right people as quickly as we can. We provide analytic services in hopes that management can turn that data into good decisions. We add biases, we filter out biases, we support a lot of guessing.  None of that makes Sir Arthur or Gen. Powell wrong.

What if we provide data and analytics to organizations, but mostly management just makes guesses? I recently sat in a meeting where we were asked to keep adjusting the data rules until the analytics would show exactly what values the end users wanted to see.  Of course this is a fine balance: end users need to set the requirements around how data should be processed to produce the analytical solutions, but at some point we data pros can’t get sucked into using decision systems to justify bad decisions.

I’ll leave you with a quote from Mr. Heinlein:

To get anywhere, or even live a long time, a man has to guess, and guess right, over and over again, without enough data for a logical answer.
-Robert A. Heinlein

That might be how a man should live, but organizations need to ensure they are working with good data and great analytics.  At some point your competitors will stop flying by the seat of their pants via SWAGs.  And they will most likely be making better decisions than your organization.  If your analytics are there only to make end users feel better about their guesses, you’re doing it wrong.

A New Era for #NASATweetup: The NASA Fiscal Year 2013 Budget Briefing

Feb 21, 2012   //   by Karen Lopez   //   Blog, Professional Development, Social Networking, Space  //  No Comments

Maj Gen Charles Bolden / CFO Elizabeth Robinson  by Karen Lopez

I was invited to attend a NASA Budget Briefing as part of a recent NASATweetup held at NASA Headquarters on 13 February 2012.  I’ve been to other NASA Tweetups, but this was a new type of event for both attendees and NASA.  First, the topic was more administrative than any others.  No fire or sound waves. No Florida hair.  Heck, one of the people I hadn’t seen for a while said "You look different".  My response: "You’ve never seen me in work clothes". 

The first two NASATweetups I attended were launches (STS-134 and Juno).  Both of these had 150 attendees with a two-day program of speakers and presentations, then a launch.  This meeting was part of an existing event, a media briefing about the 2013 Fiscal Year Budget.  Yes, this was PowerPoint and spreadsheets, for the most part.  However, the content of those presentation materials was going to show us which programs were moving forward and which ones were going to have to change or be dropped completely.  Being a data professional, this was my type of event.  I wanted the data and the budget wasn’t going to be released until one hour before the event.  That’s a fast read of a set of slides and some large documents.  I went for the slides.

The second thing that was different: this tweetup was much smaller. The original registration limited attendees to 20 and I think we had just under that.  The most important difference was that we were going to be part of the media, able to ask questions along with the traditional media.  This is a first for NASATweetups and I’m not sure how many other US Federal media briefings have involved a mix of traditional and social media. I was excited that I could be part of this new approach to media, especially because it brought together two of my passions: space and social media. More on that mixing later.

The NASATweetup

The first thing that was different from other NASATweetups: We received no badges or swag bags…because traditional media don’t get those, either.  If I do one of these again, I’ll bring my own badge or credentials.

In the opening statements, Bob Jacobs announced this new era and took our photo, which was posted to Twitter.

Photo of Media and Social Media Attendees.  By Bob Jacobs

You can see him pause to take the photo in the video below.   I think that was our second sign that this press briefing was going to be different. 

This year, we are trying something a little different. As well as traditional media representatives, for the first time we have invited members of the social media community to be a part of today’s presentation, and we will be taking questions via Twitter using the #AskNASA. So we thank everyone for joining us for today’s presentation.

We will go over some of the ground rules first, but well, wait a second. I want to make sure I capture this. If we are going to be social media, I need to do it from here too.

[Laughter.]

MR. JACOBS: Okay. Got a Photo.

The briefing was given by NASA Administrator Major General Charles Bolden and Chief Financial Officer Elizabeth Robinson.  It started with a video about NASA’s plans.

NASA Video from FY2013 Budget Briefing

I’ve listed some links to analyses of the new budget’s impact in the related section below, but these are the items that were of note to me:

  • STEM education and outreach was cut from $138 million in 2012 to $100 million.  That’s a significant cutback to this program, but only a tiny portion of a tiny portion of the overall US Federal budget.  This is going to make it more difficult to find and retain qualified people in the future.  I’m also guessing that other organizations are having their STEM budgets cut as well.
  • The ExoMars program will need to be re-programmed, meaning that we will not be collaborating with the European agencies on these Mars exploration programs. This has left ESA scrambling to find other countries to help with these programs, most likely Roscosmos.

DataChick’s Question on Open Government and Open Data

I was fortunate to be called upon to ask a question:

Let’s take one more question over here, and then we will take a couple from Twitter, and then we will go to the field centers.

QUESTIONER (Karen Lopez): Hi. I am Karen Lopez. I am Datachick on Twitter.

One of the ways that the public, the rest of us, can benefit from all these NASA missions is via access to open government transparency and open data initiatives, like at data.NASA.gov. Have budget pressures made any changes to those programs? Will they continue to expand?

ADMINISTRATOR BOLDEN: Do you want to take that?

DR. ROBINSON: Okay. So NASA couple things. One is you know the administration has a very vigorous Open Government Initiative, and NASA is a participant in that. And it recently went international, and we have an international event coming up in April April, thank you April, where we will be gathering together folks from around the world, virtually, of course, to work on things. So we have very vigorous programs.

And a large part of what we do in Open Government is, as you said, we leverage off of things that the programs do already, make their data available, make it accessible, Open Government a little bit more just to point them in the right direction. So it’s really Open Government is really a philosophy at NASA that we try to put as much as we can out into the public in the most understandable way possible, and so we are doing that.

The Open Government Initiative has taken us in a few different directions, and we will continue that. We plan to keep going forward, but it is always when you talk about Open Government, it is really it is hard to predict, because we are going to do so much, right? We are going to have so much data coming in and all of that. NASA is a very exciting place to work, because now we have apps on our iPhones from NASA and a whole bunch of things, so we are already out there in terms of Open Government

QUESTIONER (Karen Lopez): [Speaking off mic.]

[Here I followed up with "So no immediate changes?" ]

DR. ROBINSON: Well, not in the near future. We’re going to assess I am looking at my partner here. I am the senior accountable official for Open Government, and then our CIO over there

ADMINISTRATOR BOLDEN: We are both looking at the CIO.

DR. ROBINSON: Yeah, we are both looking at the CIO, and it is her folks mainly who do it. And so I think we are really going to assess up to this international event, how to keep those kind of things going or not.

And with that question I was able to add my third passion: Data.  As in, Love Your Data.  The terms data or information were mentioned 21 times during the briefing, twice in NASA Administrator Charles Bolden’s opening remarks.

This budget supports more than 80 science missions, 56 currently in operation and 28 now under development, that cover the vital data we need to understand our own planet, diverse missions reaching farther into our solar system, and the next generation of observatories peering beyond the reaches of our neighborhood to other galaxies and their solar systems and undiscovered phenomena

The missions currently at Mars; the Mars Science Laboratory, on its way; and MAVEN, well into development, will provide many years of data to help us understand the Red Planet and our needs in future years to meet the President’s challenge to send humans to Mars in the mid-2030s.

Citizen Journalists?

No, we weren’t. In some of the descriptions of the event, including the announcement of the Tweetup, we were described as "Twitter Fans" of NASA.  One of the issues I can see with trying to mix fans and journalism is that…they shouldn’t mix.  Sure, it’s not unheard of for a journalist to be excited about interviewing someone, but in theory they aren’t supposed to be fans.  I don’t think my role there was as a citizen journalist.  However, I think we Tweetup attendees did a good job not gushing all over Bolden and Robinson in our questions.  In fact, I was impressed by the lack of fanboi attitude in any of our questions.

You can really tell the difference when you see this still taken from This Week at NASA coverage:

Screen capture from NASA TV This Week at NASA

Three laptops, all running Tweetdeck in that photo.  That’s me tweeting in the upper center of the frame. Most of the traditional media attendees brought digital recorders and paper.  So while they were taking notes, we were sharing live.  That’s not necessarily better.  It’s different.  Mixing social media and traditional media can work.  They don’t have to compete.

Some of the traditional media people from major media organizations even retweeted my question and told me afterwards that our questions were good.  I think that means the new era of mixing traditional and social media may continue.  I look forward to future NASATweetups for these types of events.

Overall….

In talking to people after the event I think this experiment was a success.  The Tweetup crowd came up with some great questions, as did the Twitterverse via the #AskNASA hashtag.  I am happy that I was selected to be part of this new era of social media, NASA…and Data.

NASATweetup Video from C-SPAN

The entire event was just over an hour.  You can watch the whole thing via this C-SPAN feed.

 

Or if you prefer the shorter briefing of the briefing, you can watch the one minute version on TW@N at the very beginning of this video.

 

This Week at NASA

 

How Safe is Your Medical Data? You Don’t Want to Know…

Feb 18, 2012   //   by Karen Lopez   //   Blog, Data, Data Breach  //  1 Comment

So you live in a country that has legislation requiring your health data to be protected and you believe it’s all safe.  If you live in the US, think again.


According to a study by Ponemon Institute sponsored by MegaPath:

  • 91% of small healthcare organizations (think your local doctor, dentist, optometrist or clinic) had experienced a breach of protected health information (PHI) in the previous 12 months; of those, 29% resulted in medical identity theft
  • 52% of small healthcare providers rated their security technology plans as ineffective
  • 43% had experienced medical identity theft in their organizations
  • 55% of respondents had to notify patients of a data breach in the previous 12 months
  • On average, less than 10% of the respondents’ IT budgets are spent on security

You can register and download the entire paper at http://www.megapath.com/solutions/industry/healthcare/study/

I found this table the most discouraging:

[Image: table from the Ponemon Institute study]

From a data governance and data protection point of view, I’d really expect to see ALL of those be 100%.   My doctor recently moved to mostly electronic health records (as have most in my province), but I’m wondering what his answers to all of these questions would be.  When I think about the 91% data breach numbers, I see this table as one of the key reasons that number is so high.

Even if you aren’t in a health-related organization, I’d expect your numbers to be higher.  63% backup and disaster recovery plans? How can we call ourselves professionals when this is life-critical information?  Ultimately it is organization leadership who are responsible for protecting data.  But I’ve always been concerned about how far we data professionals should go in ensuring that the public is protected from harm when data policies and practices are not sufficient.  Should we not move to other projects? Report bad practices?  To whom?

This is a US-based study and I’m curious about similar numbers in other countries with and without health data privacy legislation.  If you have links to other sources, please provide them in the comments.

New in SQL Server 2012 SEQUENCEs: Why They Aren’t Just for Surrogate Keys

Feb 16, 2012   //   by Karen Lopez   //   Blog, Data, Data Modeling, Database, SQL Server  //  3 Comments

Starting with SQL Server 2012, database designers will have the ability to use SEQUENCEs to generate numbers, such as for populating surrogate keys (primary keys with no meaning). Sounds like the IDENTITY property, right?  It’s not quite the same.  In SQL Server (and other DBMSs), the IDENTITY property is a characteristic of a column in a table.  A database designer can set this property up to automatically generate a value in that column and specify certain features such as what number to start with and what number to increment by.  These properties are set for a specific column and only one column in a table can have this identity property.  That’s why IDENTITY is almost always used as a surrogate primary key on a table.

To create a column with the IDENTITY property in SQL Server 2012:

CREATE TABLE 
    [ database_name . [ schema_name ] . | schema_name . ] table_name 
    [ AS FileTable ]
    ( { <column_definition> | <computed_column_definition> 
        | <column_set_definition> | [ <table_constraint> ] [ ,...n ] } )
    [ ON { partition_scheme_name ( partition_column_name ) | filegroup 
        | "default" } ] 
    [ TEXTIMAGE_ON { filegroup | "default" } ]
    [ FILESTREAM_ON { partition_scheme_name | filegroup 
        | "default" } ]
    [ WITH ( <table_option> [ ,...n ] ) ]
[ ; ]
 
<column_definition> ::=
column_name <data_type>
    [ FILESTREAM ]
    [ COLLATE collation_name ] 
    [ NULL | NOT NULL ]
    [ 
        [ [ CONSTRAINT constraint_name ] DEFAULT constant_expression ] 
      | [ IDENTITY [ ( seed , increment ) ] [ NOT FOR REPLICATION ] ]
    ]
    [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ] 
    [ SPARSE ] 

http://msdn.microsoft.com/en-us/library/ms174979(v=sql.110).aspx

The IDENTITY clause inside the column definition is the syntax that applies the IDENTITY property to a column.  SQL Server will know which column has the identity property (and you can only have one of these columns in the same table).  Notice that there are only three options for an IDENTITY: the seed, the increment, and NOT FOR REPLICATION.
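
To make that concrete, here is a minimal sketch of a table that uses IDENTITY for its surrogate key.  The table and column names (dbo.Customer, CustomerID, CustomerName) are hypothetical and chosen only for illustration; they don't come from any real system.

```sql
-- Hypothetical table: CustomerID is generated by SQL Server,
-- starting at 1 (seed) and increasing by 1 (increment) per row.
CREATE TABLE dbo.Customer
(
    CustomerID   INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL
);

-- The IDENTITY column is left out of the column list;
-- the value is supplied by the table itself.
INSERT INTO dbo.Customer (CustomerName) VALUES (N'First customer');
INSERT INTO dbo.Customer (CustomerName) VALUES (N'Second customer');

SELECT CustomerID, CustomerName
FROM dbo.Customer;
```

The point to notice is that the number generation is declared as part of the column, so it lives and dies with that one table.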

To create a SEQUENCE in SQL Server 2012:

CREATE SEQUENCE [schema_name . ] sequence_name
    [ AS [ built_in_integer_type | user-defined_integer_type ] ]
    [ START WITH <constant> ]
    [ INCREMENT BY <constant> ]
    [ { MINVALUE [ <constant> ] } | { NO MINVALUE } ]
    [ { MAXVALUE [ <constant> ] } | { NO MAXVALUE } ]
    [ CYCLE | { NO CYCLE } ]
    [ { CACHE [ <constant> ] } | { NO CACHE } ]
    [ ; ]

http://msdn.microsoft.com/en-us/library/ff878091(v=sql.110).aspx

Notice that there is no TABLE or COLUMN referenced in that TSQL.  You are just creating a SEQUENCE.  SQL Server will have no idea what you are going to use it for or where it will be used.  In fact, you could create a SEQUENCE and never use it. Notice there are several more options for a SEQUENCE.
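
As a rough sketch of what that looks like in practice, the statements below create a standalone SEQUENCE and then ask it for a value.  The name dbo.DocumentNumber and the specific option values (start at 1000, cache 50) are assumptions made for this example only.

```sql
-- Hypothetical sequence: no table or column is mentioned anywhere.
CREATE SEQUENCE dbo.DocumentNumber
    AS INT
    START WITH 1000
    INCREMENT BY 1
    MINVALUE 1000
    NO MAXVALUE
    NO CYCLE
    CACHE 50;

-- Ask the sequence for its next number. Each call advances the
-- sequence, whether or not the value is ever stored in a table.
SELECT NEXT VALUE FOR dbo.DocumentNumber AS NextDocumentNumber;
```

NEXT VALUE FOR is the T-SQL function that retrieves a number from the sequence; what you do with that number afterwards is entirely up to you.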

SEQUENCEs are separate objects in a database and can be used to populate more than one column even across several tables.  Because they aren’t bound to a table in any way, they can be used in ways that IDENTITY columns can’t be used.  For  instance:

  • One SEQUENCE can be used across more than one table. For instance, a Control Number or Document Number on several tables.  I’ve seen this used in other DBMSs to generate document numbers across subtypes, say Purchase Orders, Shipping Notices, Return to Vendor Numbers, etc.  There’s a requirement that they be unique across all types of documents.  This could not be done with IDENTITY.
  • Using a SEQUENCE allows tables to have more than one auto-generated number in multiple columns. Think of Order Number and Cancellation Number, both on the Order table. You can’t do that with IDENTITY.
  • Since sequences are generated completely outside a table, you can choose to store the resulting value in a different format, say VARCHAR or CHAR. Or, heaven forbid, you might want to append data to it or do other kinds of evil to it before storing it.
  • Since sequences are generated completely outside a table, you can store the results in a NULLable column.
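
Here is one way the points above might look in T-SQL.  This is only a sketch: every object name (dbo.SharedDocNumber, dbo.CancellationNumber, dbo.PurchaseOrder, dbo.ShippingNotice) is hypothetical, and a real design would need to decide how strictly to enforce uniqueness, since a SEQUENCE by itself does not stop someone from inserting a duplicate value by hand.

```sql
-- Two hypothetical sequences, independent of any table.
CREATE SEQUENCE dbo.SharedDocNumber    AS INT START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE dbo.CancellationNumber AS INT START WITH 1 INCREMENT BY 1;

-- One table with an IDENTITY key plus two sequence-fed columns.
CREATE TABLE dbo.PurchaseOrder
(
    PurchaseOrderID    INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_PurchaseOrder PRIMARY KEY,
    DocumentNumber     INT NOT NULL
        CONSTRAINT DF_PurchaseOrder_DocumentNumber
        DEFAULT (NEXT VALUE FOR dbo.SharedDocNumber),
    CancellationNumber INT NULL   -- NULLable: set only if the order is cancelled
);

-- A second table drawing document numbers from the same sequence.
CREATE TABLE dbo.ShippingNotice
(
    ShippingNoticeID INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_ShippingNotice PRIMARY KEY,
    DocumentNumber   INT NOT NULL
        CONSTRAINT DF_ShippingNotice_DocumentNumber
        DEFAULT (NEXT VALUE FOR dbo.SharedDocNumber)
);

-- Both inserts pull from dbo.SharedDocNumber, so the two rows
-- receive different document numbers even though they are in different tables.
INSERT INTO dbo.PurchaseOrder  DEFAULT VALUES;
INSERT INTO dbo.ShippingNotice DEFAULT VALUES;

-- Later, a cancellation gets its own number from a different sequence.
UPDATE dbo.PurchaseOrder
SET CancellationNumber = NEXT VALUE FOR dbo.CancellationNumber
WHERE PurchaseOrderID = 1;
```

I used DEFAULT constraints here so the tables pick up a number automatically, but that is a design choice; the application could just as easily call NEXT VALUE FOR itself and pass the value in.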

I think the confusion sets in because we tend to think of auto-generated numbers as useful only as surrogate keys. But we have real data needs for generating numbers other than just row identifiers, even if that’s the primary use for this. In fact, one could have a table with a PK that uses IDENTITY and any number of other columns that use a SEQUENCE to set their values.  Of course, SEQUENCEs can be used for surrogate primary keys, too.  The existence of SEQUENCEs gives us another option over IDENTITY.

So SEQUENCE is a feature that just generates numbers based on some characteristics. A designer has many ways to use the number once it is generated and tables are not dependent on that method to get that number, nor do they even “know” about how that value was provided.

This separation of the COLUMN from how the data is populated is a feature of using SEQUENCEs. How one uses this separation depends on the technical and business requirements for the data.

Discovery versus Exploration

Feb 14, 2012   //   by Karen Lopez   //   Blog, Space  //  No Comments
Librarian of Congress Daniel Boorstin

Image via Wikipedia

The American experience stirred mankind from discovery to exploration. From the cautious quest for what they knew (or thought they knew) was out there, into an enthusiastic reaching to the unknown. These are two substantially different kinds of human enterprise.
-Daniel J. Boorstin

Which kind should we be?

Use #askNASA Hashtag to Send Your Questions About the NASA FY2013 Budget

Feb 13, 2012   //   by Karen Lopez   //   Blog, Space  //  No Comments

Photo by Bob Jacobs (@bnjacobs)

On Monday, 13 February I’ll be part of another NASATweetup, this one at NASA Headquarters.  Administrator Charlie Bolden will hold a briefing on the 2013 NASA Budget.  There have been many reports that the 2013 budget will remain about the same as it was in prior budgets.  However, this means that NASA will most likely have to pull out of agreements with other space agencies such as the European Space Agency (ESA) on collaborative efforts for future Mars missions.

I believe this is the first time that NASATweetup attendees will be attending a formal briefing and the first time we will be able to ask questions.  In addition, NASA will be taking questions via Twitter from tweets using the #askNASA hashtag.   My interest will most likely focus on the impact on NASA’s successful open government (http://open.nasa.gov) and open data (http://data.nasa.gov) programs.  I’ll also be interested in hearing what these budget restrictions mean to ongoing collaboration with other space agencies such as the Canadian Space Agency, Roscosmos, JAXA and ESA.

You can watch the budget briefing live at NASA TV on Monday, 13 February at 2 PM EST.  This is available in many formats; make sure you take advantage of the formats offered for your device. 

NASA prepared a video last year about their quest to win the future.  It looks like NASA will be scaling back on those plans for 2013.  

NASA 2012 vision – STEM

Briefing photo by Bob Jacobs
