Wednesday, February 6, 2013

Introducing Data Fabric Design for Commodity SQL Databases

by Robert Hodges (CEO, Continuent)

Data management is undergoing a revolution. Many businesses now depend on data sets that vastly exceed the capacity of DBMS servers. Applications operate 24x7 in complex cloud environments using small and relatively unreliable VMs. Managers need to act on new information from those systems in real time. Users want constant and speedy access to their data in locations across the planet.

It is tempting to think popular SQL databases like MySQL and PostgreSQL have no place in this new world. They manage small quantities of data, lack scalability features like parallel query, and have weak availability models. One reaction is to discard them and adopt alternatives like Cassandra or MongoDB. Yet open source SQL databases have tremendous strengths: simplicity, robust transaction support, lightning-fast operation, flexible APIs, and broad communities of users familiar with their operation. The question is how to design SQL systems that can meet the new requirements for data management.

This article introduces an answer to that question: data fabric design. Data fabrics arrange off-the-shelf DBMS servers so that applications can connect to them as if they were a single database server. Under the covers a data fabric consists of a network of servers linked by specialized connectivity and data replication. Connectivity routes queries transparently from applications to DBMS servers. Replication creates replicas to ensure fault tolerance, distribute data across locations, and move data into and out of other DBMS types. The resulting lattice of servers can handle very large data sets and meet many other requirements as well. 
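To make the two building blocks concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the design developed in this series: the `Server` and `Fabric` classes, the rule that writes go to a primary while reads rotate across replicas, and the log-shipping `replicate` step are all hypothetical simplifications. Real fabrics use actual DBMS servers and replication products.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Server:
    """Stand-in for one off-the-shelf DBMS server (hypothetical, in-memory)."""
    name: str
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered record of applied writes


class Fabric:
    """Presents a primary plus replicas to the application as one database."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = cycle(replicas)  # round-robin read routing
        self._applied = {r.name: 0 for r in replicas}  # log position per replica

    def write(self, key, value):
        # Connectivity (assumed policy): every write is routed to the primary.
        self.primary.rows[key] = value
        self.primary.log.append((key, value))

    def replicate(self):
        # Replication: apply the log entries each replica has not yet seen.
        for replica in self.replicas:
            start = self._applied[replica.name]
            for key, value in self.primary.log[start:]:
                replica.rows[key] = value
            self._applied[replica.name] = len(self.primary.log)

    def read(self, key):
        # Connectivity (assumed policy): reads are spread across the replicas.
        replica = next(self._next_replica)
        return replica.name, replica.rows.get(key)


# The application talks only to the fabric, never to individual servers.
fabric = Fabric(Server("db1"), [Server("db2"), Server("db3")])
fabric.write("user:1", "alice")
fabric.replicate()
first_read = fabric.read("user:1")
second_read = fabric.read("user:1")
```

The application code never names a server; the fabric decides where each operation goes. The sketch also shows a trade-off of asynchronous replication: a read issued before `replicate()` runs can miss a recent write.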

Data fabric design is a big topic, so I am going to spread the discussion over several articles. This first article provides a definition of data fabric architecture and introduces a set of design patterns to create successful data fabrics. In the follow-on articles I will explore each design pattern in detail. The goal is to make it possible for anyone with a background in database and application construction to design data management systems that operate not only today but far into the future. At the very least you should understand the issues behind building these systems. 

Some readers may see data fabric design as just another reaction to NoSQL. This would be a mistake. Building large systems out of small, reliable parts is a robust engineering approach that derives from ground-breaking work by Jim Gray, Pat Helland, and others dating back to the 1970s. Data fabrics consist of DBMS servers that you can look at and touch, whereas NoSQL systems tend to build storage, replication, and access into a single distributed system. It is an open question which approach is more complex or difficult to use. There are trade-offs, and many systems actually require both approaches. You can read this article and those that follow it, then make up your own mind about the proper balance.

Read the entire article at
