The Transient Data Imperative

Distributed computing applications often rely on transient, in-flight data moving through the system to automate important business processes. Unlike traditional system-of-record data used to drive the logic of conventional applications, process flow information exists only for the life span of a process and is typically discarded when the process completes. However, such in-flight data often contains critical information used for risk analysis and problem resolution, and it frequently represents a more accurate picture of the overall state of the system.


The ability to store, analyze, and process transient data in real time as it flows through the system is becoming increasingly important within the enterprise. IT organizations are under constant pressure to reduce reliance on expensive, transactional data management systems while simultaneously providing the vital data needed to make accurate, real-time decisions. In response to this challenge, IT data management strategy is evolving to include in-flight process data, which frequently outpaces conventional application content in its relevance. Managing reliable data flow across business processes is becoming a critical aspect of enterprise architecture, and providing visibility into a data flow allows organizations to turn the changing state of data across an enterprise into actionable information, thereby mitigating risk and providing operational transparency.

As demand for real-time analytics drives transaction rates beyond the capability of conventional databases, distributed shared memory is becoming the preferred alternative for storing and processing vast streams of volatile, transient data. A shift in architectural philosophy away from database dependency is evident across the industry, from Wall Street to Silicon Valley. Although performance and flexibility are major architectural benefits, the main catalyst for this paradigm shift remains financial: as processor speed increases (in accordance with Moore's Law) and the cost of memory decreases, hybrid data management systems become faster and more economical than conventional databases.

Today, nearly all mission-critical systems utilize some combination of hybrid storage, distributed data caching, and messaging to provide real-time analytics and governance of in-flight data. In fact, certain regulatory and compliance bodies are starting to mandate this type of capability within the industry in order to guarantee accurate reporting and fiscal transparency.

From Databases to Data Spaces

To overcome the limitations of conventional databases and provide a more flexible solution, enterprise architects are looking to a new abstraction in data management called the data space. Data spaces are not intended as a replacement for conventional database systems but rather as a coexistence approach to data management, and can be viewed as the next step in the evolution of enterprise data architecture and integration.

Conceptually, the primary function of a data space is to simplify the integration of heterogeneous data, regardless of its structure, by providing a unified query and control mechanism. For example, semi-structured data such as XML documents and text files would be accessible via the same interface as structured data organized into tables or key/value (tuple) pairs. Unlike conventional database systems that promote centralized management of structured data, data spaces provide co-located data management for application-specific information. The main benefit of such data management systems is data flexibility, which allows developers to address specific networking, performance, and data volume requirements.
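
To make the idea of a single query and control mechanism concrete, consider the following minimal sketch in plain Java. The Queryable interface and the two collection classes are illustrative assumptions, not any product's actual API; the point is simply that a table-style collection of rows and a collection of raw XML documents can be filtered through the same contract.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;

    // Hypothetical unified query interface; illustrative only.
    interface Queryable<T> {
        List<T> find(Predicate<T> criteria);
    }

    // Structured data: rows modeled as column-name/value maps.
    class TableCollection implements Queryable<Map<String, Object>> {
        private final List<Map<String, Object>> rows = new ArrayList<>();
        void insert(Map<String, Object> row) { rows.add(row); }
        public List<Map<String, Object>> find(Predicate<Map<String, Object>> criteria) {
            List<Map<String, Object>> result = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                if (criteria.test(row)) { result.add(row); }
            }
            return result;
        }
    }

    // Semi-structured data: raw XML documents stored as strings.
    class DocumentCollection implements Queryable<String> {
        private final List<String> documents = new ArrayList<>();
        void store(String xml) { documents.add(xml); }
        public List<String> find(Predicate<String> criteria) {
            List<String> result = new ArrayList<>();
            for (String doc : documents) {
                if (criteria.test(doc)) { result.add(doc); }
            }
            return result;
        }
    }

    public class UnifiedQueryExample {
        public static void main(String[] args) {
            TableCollection trades = new TableCollection();
            trades.insert(Map.of("symbol", "ACME", "qty", 100));

            DocumentCollection confirmations = new DocumentCollection();
            confirmations.store("<confirmation symbol='ACME' status='settled'/>");

            // Both collections are filtered through the same Queryable contract.
            System.out.println(trades.find(row -> "ACME".equals(row.get("symbol"))));
            System.out.println(confirmations.find(doc -> doc.contains("ACME")));
        }
    }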

The secondary function of a data space is to simplify data integration by providing data mapping and semantic integration facilities for hosted data collections and external data resources such as relational databases or files. In contrast to traditional data integration systems that require schemas and relationships between data elements to be defined up front, integration within the data space can occur gradually over time. This allows users to improve the data management system in an incremental, "pay-as-you-go" fashion as it evolves.

Although some data (such as files) may not be fully under data space control, users may define or infer relationships between all data collections using a common query language and semantics. Integration of disparate data models is gaining broad acceptance in the industry, as evidenced by the growing adoption of so-called NoSQL (Not Only SQL) databases that specialize in storing semi-structured data. As data management evolves to include diverse, interrelated data sources, the data space provides a solution for organizing and managing a broad range of information in a standardized and efficient way.

Data Spaces in the Application Fabric

The Service Application Engine™ offers robust facilities for hosting data collections, called Application Data Spaces™, based on the data space concepts described above. A Data Space is a scalable, distributed, general-purpose data management system capable of storing structured or semi-structured data.


Information within a data space is organized into data collections that group data elements of similar structure into physical entities such as a table, queue, map, or file. Data Spaces are a hybrid data storage solution that can manage structured data, objects, document-centric entities such as XML, or messages, depending on the collection’s data model.

Developers may use memory in a flexible fashion according to application needs, choosing from several usage models: collections may be configured to reside entirely in memory for fast, low-latency data access, or they may be logged or written through to disk.
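
As a rough illustration of how a collection's model and memory behavior might be declared together, the sketch below pairs a hypothetical CollectionModel and StoragePolicy (assumed names, not the engine's actual configuration API) so that a volatile work queue stays purely in memory while a reference table is written through to disk.

    // Hypothetical collection definition; names are illustrative, not the actual product API.
    enum CollectionModel { TABLE, QUEUE, MAP, FILE }
    enum StoragePolicy { IN_MEMORY, IN_MEMORY_LOGGED, WRITE_THROUGH_DISK }

    class CollectionDefinition {
        final String name;
        final CollectionModel model;
        final StoragePolicy storage;

        CollectionDefinition(String name, CollectionModel model, StoragePolicy storage) {
            this.name = name;
            this.model = model;
            this.storage = storage;
        }

        @Override public String toString() {
            return name + " [" + model + ", " + storage + "]";
        }
    }

    public class CollectionConfigExample {
        public static void main(String[] args) {
            // Low-latency, volatile work queue held entirely in memory.
            CollectionDefinition orders =
                new CollectionDefinition("pending-orders", CollectionModel.QUEUE, StoragePolicy.IN_MEMORY);

            // Reference table whose changes are also written to disk for durability.
            CollectionDefinition positions =
                new CollectionDefinition("positions", CollectionModel.TABLE, StoragePolicy.WRITE_THROUGH_DISK);

            System.out.println(orders);
            System.out.println(positions);
        }
    }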

Users may implement a client-server, peer-to-peer, or WAN-style topology, choosing an appropriate data architecture as dictated by the application's scale, throughput, latency, and reliability requirements.
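
Topology choices of this kind are typically expressed as deployment configuration. The fragment below is a hypothetical sketch using assumed property keys (not documented settings of the engine) to contrast a peer-to-peer layout with a client-server one.

    import java.util.Properties;

    // Hypothetical deployment settings; the property names are assumptions made
    // for illustration, not the engine's documented configuration keys.
    public class TopologyConfigExample {
        public static void main(String[] args) {
            Properties peerToPeer = new Properties();
            peerToPeer.setProperty("dataspace.topology", "peer-to-peer");
            peerToPeer.setProperty("dataspace.peers", "10.0.1.21:7800,10.0.1.22:7800");
            peerToPeer.setProperty("dataspace.replication", "synchronous");

            Properties clientServer = new Properties();
            clientServer.setProperty("dataspace.topology", "client-server");
            clientServer.setProperty("dataspace.server", "dataspace.example.com:7800");
            clientServer.setProperty("dataspace.client.cache", "near-cache");

            System.out.println("Peer-to-peer settings:  " + peerToPeer);
            System.out.println("Client-server settings: " + clientServer);
        }
    }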


Data Spaces provide real-time change notifications by allowing users to define Event Triggers on data change operations, so that any data change event can become a stream source for CEP (complex event processing) engines and downstream systems.
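
The trigger mechanism can be pictured as a simple listener pattern. In the sketch below, a hypothetical ChangeListener (an illustrative interface, not the actual Event Trigger API) is registered on a map-style collection and invoked on every write, which is one way a change event could be fanned out to a CEP engine or another downstream consumer.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical change-notification sketch; the listener interface and trigger
    // registration are illustrative, not the engine's actual Event Trigger API.
    interface ChangeListener {
        void onChange(String key, Object oldValue, Object newValue);
    }

    class ObservableMapCollection {
        private final Map<String, Object> data = new ConcurrentHashMap<>();
        private final List<ChangeListener> listeners = new ArrayList<>();

        void addTrigger(ChangeListener listener) { listeners.add(listener); }

        void put(String key, Object value) {
            Object old = data.put(key, value);
            // Fan the change event out to all registered triggers.
            for (ChangeListener l : listeners) { l.onChange(key, old, value); }
        }
    }

    public class EventTriggerExample {
        public static void main(String[] args) {
            ObservableMapCollection riskLimits = new ObservableMapCollection();

            // A downstream consumer (e.g. a CEP engine feed) subscribes to changes.
            riskLimits.addTrigger((key, oldValue, newValue) ->
                System.out.println("change: " + key + " " + oldValue + " -> " + newValue));

            riskLimits.put("desk-42", 1_000_000);
            riskLimits.put("desk-42", 750_000);
        }
    }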

A Data Space Management System offers the following benefits:

  • Provides uniform access to transient data resources in an incremental, “pay-as-you-go” fashion.
  • Offers guarantees of consistency and durability similar to those of traditional database systems.
  • Enables users to personalize, partition, and share data based on preferences and application requirements.
  • Allows different qualities of data management at different costs, trading disk for memory as needed.
  • Provides a choice of data collection models and structures (i.e., queue, table, map, or file) based on application needs.
  • Offers a choice of memory usage by data collections based on application needs.
  • Facilitates the streaming of data and its changes between data spaces and other system participants (clients).
  • Bridges the developer gap by presenting data simultaneously as objects and query-able data structures where possible.
  • Supports a variety of data access methods, including query broadcasting, asynchronous reply, and the HTTP protocol (see the asynchronous-reply sketch after this list).
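
To round out the list above, the final sketch illustrates the asynchronous-reply style of access: a query is evaluated off the calling thread and its result is delivered through a callback. The query helper and data here are invented for the example and do not reflect the engine's actual query-broadcast or HTTP interfaces.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    // Hypothetical asynchronous-reply sketch: the query runs off the calling thread
    // and the reply arrives via a callback. Illustrative only.
    public class AsyncQueryExample {

        // Simulates broadcasting a query to the data space and collecting matches.
        static List<String> runQuery(String criteria) {
            List<String> allOrders = List.of("ACME buy 100", "ACME sell 40", "GLOBEX buy 10");
            return allOrders.stream()
                            .filter(o -> o.contains(criteria))
                            .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            CompletableFuture
                .supplyAsync(() -> runQuery("ACME"))                          // issue the query asynchronously
                .thenAccept(result -> System.out.println("reply: " + result)) // handle the reply when it arrives
                .join();                                                      // wait so the demo exits cleanly
        }
    }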