Friday, April 13, 2007

Virtual Content Repositories as Content Integration Approach



In essence, content integration is about providing users within an enterprise with a single point of access to all content they need in their daily work.

There are two main approaches to content integration: consolidation and federation (point-to-point integration is practically a swearword today). The consolidation approach, which merges all content sources into one single enterprise content repository, must seem like a utopian dream to most large enterprises. Instead, a common way to provide unified access to heterogeneous content in dispersed content sources is to implement an enterprise portal solution, with content integration taking place at the presentation level. In my eyes, however, this is not "real" content integration, since it does not make it possible to describe, access and search the content in a unified way. A federation approach with a virtual content repository could provide these capabilities “behind the scenes” of the portal solution, with the action taking place in the middleware. The promise of virtual content repositories is to do just that: to provide unified access to disparate content sources within an enterprise without any point-to-point integrations or repository consolidation. The ability to map metadata between different content sources is especially important for content integration. Just like content, metadata is usually stored and managed in isolated islands. As Bruce Silver writes, “the user needs to be presented with a unified list of attributes independent of the attribute structure of the underlying systems.”
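
To make the metadata mapping idea a bit more concrete, here is a minimal sketch in Python of how a virtual repository might translate each source's native attributes into one unified attribute list. The source names ("dms", "wcm"), the attribute names and the function `to_unified` are all made up for illustration; a real product would keep such mappings in configuration and handle type conversion as well.

```python
# Hypothetical mapping tables: native attribute name -> unified attribute name.
# The source names ("dms", "wcm") and all attribute names are made up.
ATTRIBUTE_MAPS = {
    "dms": {"doc_title": "title", "created_by": "author", "mod_date": "modified"},
    "wcm": {"headline": "title", "editor": "author", "last_changed": "modified"},
}


def to_unified(source, native_metadata):
    """Translate one source's native metadata into the unified attribute list."""
    mapping = ATTRIBUTE_MAPS[source]
    # Attributes without a mapping are left out, so every source exposes
    # the same attribute names to the user.
    return {mapping[name]: value
            for name, value in native_metadata.items()
            if name in mapping}


dms_doc = {"doc_title": "Q1 report", "created_by": "anna", "internal_id": 7}
wcm_page = {"headline": "Press release", "editor": "bjorn"}

print(to_unified("dms", dms_doc))   # {'title': 'Q1 report', 'author': 'anna'}
print(to_unified("wcm", wcm_page))  # {'title': 'Press release', 'author': 'bjorn'}
```

Whatever the underlying system calls its fields, the user only ever sees "title", "author" and "modified".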

The major benefits of a virtual content repository approach are that it is relatively cheap and fast compared to consolidation, and that it still integrates content so that it can be described, accessed and searched in a unified way. In addition, "…virtual repositories can simplify the task of compliance by virtue of containing a single set of business processes applicable to all content in all repositories…//...virtual repositories mean organizations can stop debating whether to go with a single or multiple data stores, and instead concentrate on the critical factors that make for a good repository of any size" (R Dukart).

Major ECM / ECI vendors such as Oracle, IBM, EMC and BEA seem to believe in virtual content repositories for the federation approach, with content being federated to a single virtual repository from any existing content source via a standardized API. Obviously, the key for virtual content repositories to succeed is the use of standardized APIs to access the repository and the underlying content sources. JSR 170, the Content Repository API for Java Technology specification developed under the lead of Day Software, was the first adopted content repository API standard. The goal of the standard was to “produce a content repository API that provides an implementation independent way to access content bi-directionally on a granular level.” (Day Software). Hence, repositories supporting this standard can be accessed in the same way, and the repositories are not tied to any one application. Work on the successor to JSR 170, JSR 283, was started in October 2005, again with Day Software leading the specification (Day also formed a strategic technology partnership with Oracle in November 2006).
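
The idea of one common API in front of many sources can be sketched roughly as follows. This is not the JSR 170 API itself (that is a Java API); it is only a small illustration in Python, and the names `ContentSource`, `InMemorySource` and `VirtualRepository` are my own assumptions. Each underlying source is wrapped in an adapter that implements the same interface, and the virtual repository simply fans a request out to all of them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ContentSource(ABC):
    """Hypothetical common API that every source adapter implements."""

    @abstractmethod
    def search(self, term: str) -> List[str]:
        """Return identifiers of items whose text contains the term."""


class InMemorySource(ContentSource):
    """Stand-in for a real adapter (document management system, mail store...)."""

    def __init__(self, name: str, documents: Dict[str, str]):
        self.name = name
        self.documents = documents  # path -> text

    def search(self, term: str) -> List[str]:
        needle = term.lower()
        # Prefix each hit with the source name so the origin stays visible.
        return [f"{self.name}:{path}"
                for path, text in sorted(self.documents.items())
                if needle in text.lower()]


class VirtualRepository:
    """Federates searches across sources; no content is copied or consolidated."""

    def __init__(self, sources: List[ContentSource]):
        self.sources = list(sources)

    def search(self, term: str) -> List[str]:
        results: List[str] = []
        for source in self.sources:
            results.extend(source.search(term))
        return results


repo = VirtualRepository([
    InMemorySource("dms", {"/reports/q1.doc": "Quarterly sales report"}),
    InMemorySource("mail", {"/inbox/42": "Re: sales report draft"}),
])
print(repo.search("report"))  # ['dms:/reports/q1.doc', 'mail:/inbox/42']
```

The calling application only knows about `VirtualRepository.search`; adding another source means writing one more adapter, not another point-to-point integration.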

Although they are still in an early adoption phase, I believe that virtual content repositories have a future as a content integration approach. Maybe not for "deeper" integration of data in relational databases, but certainly for integrating content such as Office documents, web pages, graphics and e-mails.