Data Management Infrastructure

I hate coining buzzwords.  And maybe I didn’t even coin this one.  But we need some phrase to describe the following problem:

Data now comes in two processable flavors: structured and unstructured.  And stacks now exist for processing either flavor.  But each world is undergoing transformation.  And how the two worlds will be combined is up in the air.

Unstructured data: whether or not the Hadoop ecosystem is The Answer, there is vigorous experimentation with how to handle unstructured data at massive volume and velocity, and some norms are emerging.  Other NoSQL approaches remain as alternatives, and there will probably be use cases for almost all of them.  We are not even near the end of the beginning when it comes to defining how unstructured data systems will interface with applications (where is the NoSQL SQL?), and we are IMHO still at the very beginning of understanding which storage systems are best suited to these workloads.

Structured data: with NewSQL databases, it is clear how to interface them with applications but far from clear how they should work with storage systems, particularly SSD-based ones.  The jury is also out on how to combine several different databases within a single use case.

I call all of this “data management infrastructure”, and it seems to me to be an emerging big design problem.

Thoughts?  Who’s working on this?  Where should we invest?
