The objective of this article is to give the reader a better understanding of how Semantic Web technologies can be used in the Financial Services Industry to implement Data Federation. Data Federation is the combining of data from multiple transactional (OLTP) systems for reporting and analytics. The Semantic Web is an alternative approach to the problem of how to access data across multiple business silos and aggregate it into consistent and accessible information. The accepted way to do this is to create a series of Extract, Transform and Load (ETL) capabilities for each of the business silos (this is now often accomplished with commercially available applications) and then map the data to a canonical data warehouse data structure. The data warehouse is a synthesis of the data from the various transactional / source systems into a well-defined, reporting-centric data structure. This reporting and analytic data warehouse then becomes the source of information for regulatory and compliance reporting and the source of data for more sophisticated analytic applications (e.g., SAS, COGNOS, and Business Objects). As discussed below, semantic technologies can prove a cost-effective alternative for meeting the informational needs of an organization.
There are, however, several design and operational challenges that have to be overcome to create the data warehouse. Because each source system may hold the data in a different format or structure, or at a different level of precision, mapping the data requires a rigorous business analysis to understand both the business meaning (the semantics) and the syntax (the format and structure in which each element is represented, e.g., numeric or character, number of digits, etc.) of the data elements within each of the silos. The next step is to aggregate the data across the silos to create the data warehouse, by organizing the data elements (creating the taxonomy) and creating the accepted and approved definitions of the data (the ontology) for the business domain.
The semantic web provides the tools and methods needed to create the taxonomy and ontology as a process separate from the ETL process. The semantic web paradigm facilitates the creation of a consistent mapping from each of the silos to the canonical model and provides the capability to manage those mappings as a separate process, thus permitting the separation of the business architecture from the technical architecture. This facilitates the management of data meaning and transforms across each of those silos based on the metadata rather than on the actual occurrence values. This saves time, reduces costs and makes the process easier to maintain.
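To make the idea of managing transforms through metadata more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: the silo names, field names and converters are all hypothetical. The point is that the translation logic reads only the mapping metadata, so a change in a silo means editing the mapping rather than the code.

```python
from decimal import Decimal


def cents_to_decimal(value):
    """Convert an integer number of cents to a Decimal currency amount."""
    return Decimal(value) / Decimal(100)


# Hypothetical mapping metadata: for each silo, each canonical element maps to
# a (source field, converter) pair. None of these names come from a real system.
MAPPINGS = {
    "dda": {
        "customer_id": ("CUST_NO", str),
        "balance": ("BAL_CENTS", cents_to_decimal),
    },
    "time_deposit": {
        "customer_id": ("customerId", str),
        "balance": ("currentBalance", lambda v: Decimal(str(v))),
    },
}


def to_canonical(silo, record):
    """Translate one source record into the canonical model using only metadata."""
    mapping = MAPPINGS[silo]
    return {
        element: convert(record[source_field])
        for element, (source_field, convert) in mapping.items()
    }


# Two silos that represent the same customer and balance concepts differently.
dda_row = {"CUST_NO": 1001, "BAL_CENTS": 250075}
td_row = {"customerId": "1001", "currentBalance": "1200.50"}

canonical_rows = [
    to_canonical("dda", dda_row),
    to_canonical("time_deposit", td_row),
]
```

Both rows come out with the same element names and the same representation (a string identifier and a Decimal balance), even though the sources disagree on both.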
Complex Data Warehouse
Example
In this example, we use the creation of a set of business requirements for a 360-degree view of a banking customer. There are two (2) definitions that have to be agreed upon before undertaking this effort:
- The definition of what a customer is. While this may seem obvious, it in fact has a very complex meaning in banking. First, to be a customer you have to pass a rigorous and legally defined set of know-your-customer criteria. Second, does "customer" mean the individual or the Household (the individual and his/her immediate family)? There are also the issues of Joint Accounts, etc. It is a very complex set of relationships that can only be addressed via the use of a well-defined structure and definition of the data (the Taxonomy and Ontology).
- The definition of what a 360-degree view of the Customer is and what its associated components and metrics are. Once again, a well-defined structure and definition of the data (the Taxonomy and Ontology) make this a much more manageable process. They serve as the authoritative dictionary of meaning and structure.
In many organizations, the data required to create the 360-degree view of the customer has to be gleaned by extracting data from multiple transactional systems. For example, in a bank, it would come from DDA (Checking), Time Deposit (Savings), and perhaps Brokerage relationships. It may also include one or more Loan systems (Secured, Car, Mortgage, etc.). Achieving a 360-degree view of the Customer and understanding the customer's behavior with the bank would require extracting transactions from each of the separate systems and then aggregating them into a data warehouse. In many instances, this information would be enhanced with psycho-demographic information from external sources (e.g., the neighborhood the customer lives in, the type of car they drive, their income, etc.). This enrichment, the psycho-demographic profile, when combined with the transactional information, results in a better understanding of the customer's interactions and behaviors. From this better understanding it is possible to determine how well the Bank has penetrated the Customer's Wallet (Penetration of Wallet: the number of Bank / Financial products the Customer is using) and therefore how good the Bank's interaction with the Customer is. Since we have the transaction history, it is also possible to understand how profitable the relationship is.
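As a rough illustration of the two metrics just mentioned, the following Python sketch counts distinct products per customer (Penetration of Wallet, as defined above) and sums a simple profit figure. The record layout, the sample values, and the definition of profit as revenue minus cost are assumptions made for the example only; a real bank would use its own agreed definitions.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical, already-aggregated records: (customer_id, product, revenue, cost).
records = [
    ("C1", "checking", Decimal("120.00"), Decimal("45.00")),
    ("C1", "mortgage", Decimal("2400.00"), Decimal("900.00")),
    ("C1", "checking", Decimal("30.00"), Decimal("10.00")),
    ("C2", "savings", Decimal("15.00"), Decimal("20.00")),
]


def customer_view(rows):
    """Summarize product count (wallet penetration) and profit per customer."""
    products = defaultdict(set)
    profit = defaultdict(Decimal)  # Decimal() starts each total at zero
    for customer_id, product, revenue, cost in rows:
        products[customer_id].add(product)      # distinct products only
        profit[customer_id] += revenue - cost   # assumed profit definition
    return {
        customer_id: {
            "product_count": len(products[customer_id]),
            "profit": profit[customer_id],
        }
        for customer_id in products
    }


view = customer_view(records)
```

Here customer C1 holds two distinct products even though three records were supplied, which is why the product names are collected in a set rather than counted row by row.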
The challenge in creating the 360-degree view of the customer is that data has to be extracted and transformed from separate systems that may have inconsistent syntax and semantics for the same data elements. There is the further challenge of determining which account(s) belong to which household and, perhaps more difficult, what the total Household behavior is (remember our example above). Householding is the ability to understand the behaviors of all related members of a particular Household who reside at the same location or should be considered as a single economic entity. This is necessary because you must provide husbands and wives with similar offerings and value. The traditional way of addressing this was to analyze each of the silos in detail, element by element, correlating the data elements across all of the silos to determine which will be required by the data warehouse, and then, element by element and silo by silo, determine their agreed-to and definitive definition and representation (semantics and syntax). Once this process is completed, you would have to determine, again on an element-by-element, silo-by-silo basis, what transforms have to be performed to map the data to the data warehouse. This is a very time-consuming, potentially error-prone and expensive task.
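Householding can be sketched in a few lines of Python. The rule used here is an assumption chosen for illustration: two customers belong to the same household if they share a (crudely normalized) address or are holders of the same joint account. The grouping uses a small union-find structure; the customer identifiers and addresses are made up, and real address matching is considerably harder than this.

```python
def normalize_address(address):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in address.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


class UnionFind:
    """Minimal disjoint-set structure for grouping related customers."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_households(customers, joint_accounts):
    """customers: {customer_id: address}; joint_accounts: lists of holder ids."""
    groups = UnionFind()
    first_at_address = {}
    for customer_id, address in customers.items():
        groups.find(customer_id)  # register every customer, even if unlinked
        key = normalize_address(address)
        if key in first_at_address:
            groups.union(first_at_address[key], customer_id)
        else:
            first_at_address[key] = customer_id
    for holders in joint_accounts:
        for other in holders[1:]:
            groups.union(holders[0], other)
    households = {}
    for customer_id in customers:
        households.setdefault(groups.find(customer_id), set()).add(customer_id)
    return sorted(sorted(members) for members in households.values())


# Hypothetical data: C1 and C2 share an address; C3 and C4 share a joint account.
customers = {
    "C1": "12 Elm St.",
    "C2": "12 elm st",
    "C3": "9 Oak Ave",
    "C4": "40 Pine Rd",
}
joint_accounts = [["C3", "C4"]]

households = build_households(customers, joint_accounts)
```

The second rule is what captures the "single economic entity" case: C3 and C4 live at different addresses but are still placed in one household.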
There are three (3) additional areas of consideration when using this approach:
- Maintaining the mappings and keeping up with changes as the siloed systems are modified and as the data warehouse expands to include new data elements.
- Storing all of the mapping and definitional information that you have created in a way that is flexible and easy to use. There are several tools on the market to do this; we have in the past used OWL.
- Timing. This is perhaps the most interesting of the issues. A data warehouse by its design represents a snapshot of the transactional world. The ETL process by its very definition is run based on an event, in many cases time. Therefore there will always be a difference in values and content between the data warehouse and the transactional systems. You can make the time interval small by taking the snapshot every day or every hour, but it cannot be real time.
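The first two considerations, maintaining the mappings and storing them flexibly, can be illustrated with a small Python sketch that keeps the mapping information as subject-predicate-object statements. This is only loosely in the spirit of an ontology language; it is not OWL, and the element names and the predicates `mapsTo` and `hasTransform` are hypothetical.

```python
# Each statement is (subject, predicate, object). All names are hypothetical.
MAPPING_TRIPLES = [
    ("dda.CUST_NO", "mapsTo", "Customer.id"),
    ("dda.BAL_CENTS", "mapsTo", "Account.balance"),
    ("dda.BAL_CENTS", "hasTransform", "cents_to_currency"),
    ("loans.BORROWER_ID", "mapsTo", "Customer.id"),
    ("loans.PRINCIPAL", "mapsTo", "Account.balance"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def sources_for(canonical_element):
    """Which source fields feed a given canonical element?"""
    hits = match(MAPPING_TRIPLES, predicate="mapsTo", obj=canonical_element)
    return sorted(s for s, _, _ in hits)


def impact_of_change(source_field):
    """Which canonical elements are affected if a source field changes?"""
    hits = match(MAPPING_TRIPLES, subject=source_field, predicate="mapsTo")
    return sorted(o for _, _, o in hits)


customer_id_sources = sources_for("Customer.id")
affected = impact_of_change("dda.BAL_CENTS")
```

Because the mappings are plain statements, adding a new silo or a new data element means adding rows, and the same two queries answer both "where does this come from?" and "what breaks if this changes?".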
Semantic Web Alternative
The semantic web alternative addresses the problem differently. It is based on the management of metadata (taxonomies and ontologies) as a separate domain. One creates an inventory of applications and metadata, followed by the establishment of a SPARQL / Spyder endpoint (see Diagram 2). This is an alternative to the creation of a data warehouse. The end user creates a query that propagates across the silos; the query is resolved at each endpoint, and the requested data is returned. The query is satisfied in very near real time. Since the process is based on the establishment of a metadata repository, it also has the advantage of creating the taxonomy and the ontology; this dramatically facilitates cross-silo understanding and assists in resolving enterprise definitional challenges. A common understanding is a major accelerator in the establishment of a consistent business strategy and the transition of data to information and knowledge.
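The query flow described above can be sketched in Python. This is a deliberately simplified stand-in: a real deployment would put a SPARQL endpoint in front of each silo, whereas here each "endpoint" is an in-memory object, and the class, silo, and field names are all hypothetical. What the sketch does show is the essential pattern: one query phrased in canonical terms, resolved separately at each endpoint against live data, with the results combined for the user.

```python
from dataclasses import dataclass


@dataclass
class SiloEndpoint:
    """Stand-in for a query endpoint in front of one silo (not real SPARQL)."""

    name: str
    rows: list        # live source records, each a dict in the silo's own terms
    field_map: dict   # canonical element -> local field name

    def resolve(self, wanted, where):
        """Answer a canonical query against the silo's current data."""
        results = []
        for row in self.rows:
            if all(row.get(self.field_map[k]) == v for k, v in where.items()):
                answer = {k: row[self.field_map[k]] for k in wanted}
                answer["source"] = self.name
                results.append(answer)
        return results


def federated_query(endpoints, wanted, where):
    """Send the same canonical query to every endpoint and merge the answers."""
    results = []
    for endpoint in endpoints:
        results.extend(endpoint.resolve(wanted, where))
    return results


# Two hypothetical silos that name the same concepts differently.
dda = SiloEndpoint(
    name="dda",
    rows=[{"CUST_NO": "C1", "BAL": 2500}],
    field_map={"customer_id": "CUST_NO", "balance": "BAL"},
)
loans = SiloEndpoint(
    name="loans",
    rows=[
        {"borrower": "C1", "outstanding": 18000},
        {"borrower": "C2", "outstanding": 500},
    ],
    field_map={"customer_id": "borrower", "balance": "outstanding"},
)

result = federated_query([dda, loans], ["balance"], {"customer_id": "C1"})
```

Since nothing is copied into a warehouse, a change in a source system is visible to the next query: updating `dda.rows[0]["BAL"]` and calling `federated_query` again returns the new balance.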
Semantic Web-Based Federation
There appears to be a material advantage to the adoption of semantic web technologies in the right circumstances. The challenge for the early adopters is to select the correct domain and business opportunity to serve as the initial effort. It has been our experience that a prototype project in the right business domain, with an appropriate set of objectives, is the right place to start, and that the potential benefits can be material and more than justify the paradigm shift in Process, Technology, and People.
Summary
The use of semantic web technology can prove to be a valuable architectural approach to solving some of the more interesting and challenging issues encountered in aggregating data across multiple heterogeneous data silos. It does, however, require a material change in your operational and design paradigms; changes that are, based on best practices, necessary and desirable. It is now standard practice to create a cross-silo Taxonomy, Ontology, and Metadata Management process. Our recommendation is to create the Metadata Management process, Taxonomies, and Ontologies irrespective of which architectural alternative you select.
Want to learn more? The author, Phil Teplitzky, and colleague Harry Hanlet of HP Squared LLC are available to discuss the advantages and challenges of each approach. You can contact them here.