Monday, February 13, 2012

Choosing the Right Data Repository

All you mental freaks out there likely saw "repository" and thought "suppository", but that's ok.  Settle down.  It's not the weekend yet.

This article cooked up in my thimble-sized brain after a string of events at my office.  The issues all circled around having islands of data sitting around in various places, inside of various "containers" and in various formats and structures.  The data in one particular case involves inventory.  But inventory is a big, fat, drippy, soggy bag of goo that has different meanings and different values to a lot of people circled around it like kids at a campfire.  Some want numbers.  Some want names and labels.  Some want locations.  Some want state values (indicators of current status or changes of state).  Some want derivatives, like contract data, support metrics, and other secondary and tertiary cost drivers.

If you put a microphone up to each of these people and ask them where all this data should reside, you will very likely hear a wide variety of answers:
  • Spreadsheets
  • Database Tables
  • XML files
  • Visio Drawings
  • AutoCAD Drawings
  • Compound Documents (Word documents with embedded spreadsheets, etc.)
  • Web-enabled reports (web services providers)
  • Paper print-outs
And the list goes on and on.

So... who is correct?

As it turns out, when you're in the midst of sifting through what bucket fits the most needs of the most people, it's difficult to see the end of the path in the fog of new requirements.  Until you've met with everyone involved, and discussed what they (a) think they need and (b) what they REALLY need, it's near impossible to avoid choosing the wrong container.  I've been in this business for over twenty-five years and I've seen the smartest MBA, PhD, and senior brains screw it up.  It's hard not to.  The reasons are a-plenty...
  1. You can't predict future needs
  2. You can't predict future project priorties
  3. Sizing the storage requirements is challenging
  4. Placing the storage provider is challenging
  5. Avoiding proprietary "lock-in" can be challenging
In general, here's what I usually try to follow...
  1. Strive for version independence.  Avoid storing in a product where version changes often cause upgrade issues for consumer applications and services.  ADO and ADO.NET (ODBC, OLEDB) are fairly version-agnostic.  Meaning that if you move the tables from SQL 2008 R2 to SQL 2012 there's much less chance of having to recode applications to change the connection endpoint compared with storing in Microsoft Access, Excel or even AutoCAD drawings.
  2. Find a "platform-neutral" provider medium.  Client applications are always off the table unless there isn't a choice.  It has to be a "server" application or service.  It has to provide robust access to everyone that needs it, with the least overhead, and the least performance drag.  If every consumer of the data has to have a proprietary application installed in order to access it, you're taking a risky, and often costly direction.  This is why databases are most often preferred, since they offer up access in the most neutral ways and the widest range of applications and uses.
  3. Take your time!  Don't let anxious users pressure you into making a quick decision rather than a correct decision.  Just because the floor plan folks want everything in AutoCAD in time to meet their looming deadline doesn't mean that a month later, five times as many users will scream for access to the data via a web services provider medium, or an XML table.
  4. Try to leverage what you already have.  If you have Oracle, use it.  If you have SQL Server, use it.  If you have an intranet server with web services enabled, use it.  Try to not build new bridges when others exist unless you can't possibly make do with what you have. Reinventing the wheel is dumb.
  5. Enlist an objective team.   Gather folks from your database group, your applications development group, management group, finance group, any group that might even be remotely impacted by what you're trying to accomplish.  You'd be amazed what you can learn from people you didn't think had any interest or involvement in what you'r doing.  Many times they've been down the same road before and can offer valuable insight and advice.  Take it!  Learn from it.  It's ok to meet with the direct consumers, but if you surround yourself with only them, you will be going forward semi-blind.
  6. Get metrics!  Don't ever base a purchase decision or a project direction on input from test users that don't back it up with hard numbers.  It's not that you can't trust them (sometimes you can't), but it's that without numbers, there's no way to objectively compare and analysis what's going on.  For example:  I was involved with a project to establish a central data repository for a variety of applications over assorted quality WAN links.  Three of the locations complained about performance problems and pushed HARD to go with another alternative.  It turned out to be a network link configuration problem at the router, and once it was addressed, the performance was perfectly fine.  Had this not been investigated, the alternative project direction would have delayed the entire project for four more months and added 50% more cost.  Metrics are king.
Even if the majority of the consumer-end users are clamoring for going with a platform that fits squarely within their operational environment, try to avoid a knee-jerk response.  I've seen too many projects that followed that direction and everyone realized later that they should have started with something more generic or "external" to their own needs.  The reasons are often tied to extending logical links to other data sources to leverage more powerful automation uses.  

For example, the sales department chose Excel spreadsheets to store all their static tables of rates and tiers for various services and products.  The engineering team would like to link to that in order to improve material and cost estimation, but the spreadsheets are on a server that doesn't allow access to the engineering department.  Furthermore, when they decide to move their spreadsheets to a SharePoint intranet, they realize the format and layout of the data is difficult to use for external linking.  Had they chosen an XML file posted on an intranet server, or used a robust Database host, the data would have been format and layout agnostic and more easily accessible to users without distributed and varied layers of security (NTFS, shares, groups, application-specific formatting, etc.)

All I'm saying is that you need to be cautious and take your time making a decision on where to store data that will be used by a lot of different roles throughout your organization.  It's kind of like placing a watering hole in a vast desert.  Those that are too far will suffer before they can reach it.  Those that show up with buckets that won't fit into the Well are going to suffer too.  Choose the location and configuration wisely.

No comments: