MIDDLEWARESPECTRA
Your independent resource on business integration and network computing through middleware and message brokering

Middleware,
replication and
the data warehouse

Barry Devlin
Consultant
IBM Ireland


This is an abstract of an analysis that was first published in MIDDLEWARESPECTRA.
You can order a complete version of this analysis on line by clicking the order button above.


Management introduction

Today data warehousing is recognised as key to the strategic use of data which already exists in an organization. In consequence, almost every company seems to be investigating the construction of a data warehouse. Unfortunately, with this popularity comes an ever broader set of definitions of:

  • what a data warehouse should be
  • which technologies are needed to build it.

In this analysis Dr. Barry Devlin (who published the seminal article on data warehousing back in 1988) restates the rationale for, and architecture of, a data warehouse. He proceeds to describe one key area of functionality required if organizations are to implement their data warehouses successfully. This function, replication middleware in its fullest sense, is needed in order to populate data warehouses in a controlled, automated and managed way. Understanding the implications both for the warehouse and for middleware selection is a prerequisite before purchase.

Dr. Devlin was responsible for the definition of IBM Europe's own internal Data Warehouse architecture in the mid-1980s. Since then, he has been closely involved in other warehouse implementations both within IBM and as a consultant to numerous clients in both Europe and the USA. Dr. Devlin is, therefore, more than qualified to describe the importance, requirements and role of replication as enabling middleware in the data warehouse.

Why build a data warehouse?

The data warehouse is not a new concept. However, only in the mid-1990s is it becoming recognised as one of the ways to bring strategic success to businesses in a rapidly changing world.

In concept the data warehouse dates back to the mid-1980s, when a number of larger companies began to use the term internally to describe a new approach to the provision of management information to end users. For example, it was in a 1988 issue of the IBM Systems Journal (Vol 27, No 1, 1988: An architecture for a business and information system) that I described IBM Europe's own approach to its data warehouse.

Data warehousing springs from the combination of two sets of needs which, taken together, allow a new insight into various underlying information problems. The first requirement is for a company-wide view of data. This is driven by business factors, including:

  • the search for new ways of competing
  • changes in management structures
  • the increasing automation of marketing
  • the need to improve the productivity of end-users.

The second requirement is that information systems (and IS people) need to find better ways to provide quality information to their business. Problems commonly encountered in IS today include:

  • ever-expanding arrays of extract programs
  • multiple paths by which the same or related data is delivered to each end user
  • applications in each business area which define data only in terms of its own (rather than broader corporate) requirements
  • the relentless but unmanaged increase in data volumes.

The difficulty is that any one of these needs, if addressed in isolation, produces simplistic solutions. All too often the result is based on allowing direct user access to any data (generally unacceptable for a host of reasons) or on generating equally unacceptable, complex technology-implementation projects.

However, experience shows that by combining the two sets of needs it is possible to address both. The data warehouse can provide a solution to the business need which also addresses the needs of IS, and vice versa. It is from this juxtaposition of user and IS requirements that the basic shape of a data warehouse is derived.


Figure 25: The three layer data architecture
Figure 26: Functional components of data replication
Figure 27: Functional components of data replication
Figure 28: Capture and apply methods


Management conclusion

The concept of data warehousing is not new. Many large organisations (including IBM for itself) have successfully built their own infrastructure to implement data warehouses in the past five years.

The key to widespread acceptance and use of data warehousing is the availability of middleware infrastructure products that can automate the process of building and maintaining the warehouse. Of these, one of the more important (if unsung) infrastructure elements required is a comprehensive, automated set of replication tools for moving data into (and within) the data warehouse.

The functionality described in this analysis is becoming more widely available in software tools, although there are still many gaps. For example, the level of integration and inter-operability between tools still leaves a lot to be desired.

However, with the ever growing acceptance of the three-layer data architecture model for the data warehouse:

  • vendors now have a common target at which to aim
  • IS departments, and their customers, have a common architecture base to use for evaluation of replication middleware approaches and products.

This truly represents progress. Moreover it is progress for both IS and the end user, from which both can expect to benefit.



 

Spectrum Reports Ltd.
19 St. Michael's Road, Winchester
SO23 9JE, United Kingdom
Tel (+44) 1962 878333
Fax (+44) 1962 878333

email: spectrum@middlewarespectra.com

© Spectrum Reports Ltd. 1987 - 2006