A single, consistent view of business information with assurance that it is accurate and trustworthy is imperative. But data is dynamic and disparate for most organizations, and traditional data warehousing techniques alone often do not address the broad spectrum of information access and analysis requirements. To meet this challenge, a strategy for enterprise information management (EIM) is required. EIM provides a business and technology framework to support the delivery of information into a single, consistent view. A comprehensive EIM strategy helps organizations improve operational efficiency and bottom-line performance and serves to broaden the effectiveness and reach of BI solutions.
Challenge: Silos of Data and Metadata
BI infrastructures are often fragmented with data and metadata residing in different business domains, departments, and geographies. This presents a challenge when it comes time to deliver a single, consistent view of the business. A few common issues contribute to this problem.
1. Disparate Data
Organizations face an explosion of data today, with information coming from traditional sources (such as spreadsheets, databases, legacy systems, and enterprise applications) and new sources (such as Web applications and XML-based systems). As organizations move through mergers and acquisitions, new data sources are introduced into the IT landscape. The one constant about enterprise data is that it’s always changing. As a result, IT must continually be prepared to deal with disparate data that comes from heterogeneous data sources.
Many enterprise data warehouse implementations fail to meet their business objectives because the landscape of enterprise data changes rapidly as organizations merge and make acquisitions. How do organizations gain a single view of the business when new data enters the BI landscape before they’re done building their enterprise data warehouses?
Additionally, user requirements for on-demand information have increased. In many cases, users need near-real-time information to address operational BI requirements.
2. Poor Data Quality
What is the impact on your business when you make decisions from inaccurate information? In the case of a midsize software company in the UK, it cost over $1 million when a budgeting decision based on an inaccurate BI report caused inventory requirements to be overstated. Data-quality issues are a reality in almost all transaction systems. They are caused by different factors, such as incorrect data entry, multiple records of the same customer coming from different systems, empty fields (such as missing contact information), and redundant or inconsistent data between two data silos.
For most organizations, poor data quality is the primary reason why BI projects fail. In the case of the software company, the BI project suffered from a lack of end-user trust until the company successfully implemented a data integration strategy that fixed its data-quality issues. Many organizations that have experienced this pain recognize the importance of data quality for BI success.
3. Inconsistent Semantics
How do you determine the total sales for your company if every division, department, and country uses a different definition of sales? In the case of a global media company, 15 different terms described sales in its operational systems, departments, and geographies. Without a common language and definitions for data across the organization, it is impossible to gain a single, consistent view of the business. A process must be established to help define and reconcile semantics across the organization—with minimal impact on current systems.
4. Metadata Visibility
How can you trust the numbers in your BI report if you don’t know where they came from or how they were computed? Compliance requirements mandate that organizations be held accountable for their financial information. As a result, the need to trace data to its origin is now a critical BI function.
To answer this question, you will need to gain visibility into all the metadata in your BI environment. The challenge is that every data source, BI tool, and ETL tool contains its own metadata, and they do not talk to one another. Answering the question can also be a time-consuming task and, in some situations, nearly impossible because data has been transformed using hand-coded scripts. For one data warehousing expert, 50 percent of BI service requests are about data lineage: “Where did that number come from?” Without a way to easily view the end-to-end metadata in a BI environment, organizations cannot deliver trusted information for BI users.
Solution: Gaining a Single and Consistent View of the Enterprise
Getting to a single view of the enterprise has been a long-standing goal of most organizations, but it has been achieved by only a few. Managing information across the enterprise requires a well-thought-out strategy that employs a set of technologies and processes to address the different requirements of the business. Ultimately, your enterprise information management strategy needs to address the core issues that arise from having silos of data and metadata. The key areas to focus on are data integration, data quality, semantic reconciliation, and metadata management.
1. Data Integration
Data integration is the foundation of successful BI. Without a comprehensive strategy for unifying your disparate data, you will not gain a single view of the truth. Various methods exist to integrate disparate data, and each offers a unique advantage that meets the different information requirements of the business.
Extract, transform, and load (ETL) technology is used to build data warehouses and data marts for BI. ETL extracts data from disparate source systems, transforms the data to meet business requirements, and loads the data into a target database. The process usually occurs in a nightly (batch) window. While organizations use ETL tools to build an enterprise data warehouse to deliver a single version of the truth, this is not often achieved because of evolving business requirements and a rapidly changing data environment. ETL allows organizations to:
- Create a trustworthy data foundation for analytical purposes
- Combine data from disparate data sources
- Establish consistency throughout the organization
- Provide historical breadth and enable trend analysis
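The extract, transform, and load steps described above can be sketched in a few lines. This is a minimal illustration, not a production design: the two source layouts (`CRM_ROWS`, `ERP_ROWS`), the `sales` table, and all field names are hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import sqlite3

# Hypothetical extracts from two source systems with different layouts.
CRM_ROWS = [{"cust": "C1", "amount_usd": "120.50"}, {"cust": "C2", "amount_usd": "80.00"}]
ERP_ROWS = [{"customer_id": "C1", "amount_cents": 3000}]


def extract():
    """Extract: read rows from each (hypothetical) source system."""
    return CRM_ROWS, ERP_ROWS


def transform(crm_rows, erp_rows):
    """Transform: map both layouts onto one schema with amounts in dollars."""
    unified = [(r["cust"], float(r["amount_usd"]), "crm") for r in crm_rows]
    unified += [(r["customer_id"], r["amount_cents"] / 100, "erp") for r in erp_rows]
    return unified


def load(conn, rows):
    """Load: write the conformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(*extract()))

# Once loaded, data from both sources can be queried as one table.
totals = dict(
    conn.execute("SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id")
)
```

In a real deployment the same three steps would typically run inside the nightly batch window, with an ETL tool replacing the hand-written functions.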
Enterprise information integration (EII) technology has emerged to provide agility for organizations to meet real-time information requirements. EII complements ETL and, in some cases, serves as an alternative to it. EII provides real-time integration of disparate data without physically moving it to a new location. Only requested data from the transactional systems is moved and transformed, on demand at query time, and the end result appears to come from a single data source, similar to a data warehouse. Because there is no storage of data, EII does not address the need for a historical view of the business. EII allows organizations to:
- Provide real-time views of data spread across multiple operational systems
- Combine data from an operational system with a data warehouse
- Support operational BI requirements
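The query-time behavior of EII can be illustrated with a small sketch. It assumes two hypothetical operational systems (`ORDERS_SYSTEM` and `SUPPORT_SYSTEM`), each represented here by a dictionary; a real EII product would issue live queries against the source systems instead.

```python
# Hypothetical live sources, each standing in for a separate operational system.
ORDERS_SYSTEM = {"C1": [250.0, 100.0], "C2": [75.0]}
SUPPORT_SYSTEM = {"C1": 2, "C3": 1}


def fetch_orders(customer_id):
    """Ask the (hypothetical) order system for one customer's order amounts."""
    return ORDERS_SYSTEM.get(customer_id, [])


def fetch_open_tickets(customer_id):
    """Ask the (hypothetical) support system for one customer's open tickets."""
    return SUPPORT_SYSTEM.get(customer_id, 0)


def customer_view(customer_id):
    """Combine both sources at query time; nothing is copied to a new store."""
    return {
        "customer_id": customer_id,
        "order_total": sum(fetch_orders(customer_id)),
        "open_tickets": fetch_open_tickets(customer_id),
    }


view = customer_view("C1")
```

Because the view is assembled on each request, a new order in the source system appears in the next call to `customer_view`. The same property explains why this approach keeps no history.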
Enterprise application integration (EAI) allows enterprise application systems to exchange data. It is event-driven and allows for the transfer of messages from one application to another. EAI is useful for connecting enterprise applications in real time for business process automation. EAI is also used to capture changes in operational and other systems to feed real-time data to data warehouses. EAI allows organizations to:
- Make a change in one application and reflect it elsewhere
- Ensure that the change is captured and delivered reliably
- Feed data warehouses with real-time data
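The event-driven message transfer behind EAI can be sketched as a simple publish/subscribe bus. The `MessageBus` class, the topic name, and the two subscribers are illustrative assumptions; reliable delivery, which real EAI products provide through queuing and retries, is deliberately left out.

```python
from collections import defaultdict


class MessageBus:
    """A toy in-process bus: publishers and subscribers only share topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler registered for the topic.
        for handler in self._subscribers[topic]:
            handler(message)


billing_addresses = {}  # hypothetical billing application state
warehouse_feed = []     # hypothetical queue of changes bound for the warehouse


def update_billing(message):
    billing_addresses[message["customer_id"]] = message["address"]


bus = MessageBus()
bus.subscribe("customer.address_changed", update_billing)
bus.subscribe("customer.address_changed", warehouse_feed.append)

# A change made in one application is reflected in the others.
bus.publish("customer.address_changed", {"customer_id": "C1", "address": "1 Main St"})
```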
2. Data Quality
Data quality is an important component of any BI and data warehousing implementation. Without data quality controls to ensure the accuracy and trustworthiness of information, BI deployments will fail to gain end-user confidence. A few vital steps will help deliver trustworthy information.
Data profiling. Understanding your source data by analyzing its characteristics, type, quality, and relationships typically occurs before any ETL or EII development begins. This process provides insight into how data should be transformed to improve data quality. Data profiling can be used to identify problems and anomalies in the source data, such as telephone or Social Security numbers that do not match their expected format or pattern. It can also be used to examine inter-record dependencies, such as sales orders for products that are not in the product master file. Data profiling and data cleansing complement each other. Once data profiling identifies an issue, the data cleansing process can be used to correct the problem.
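The two profiling checks mentioned above, format anomalies and inter-record dependencies, might look like the following sketch. The expected phone format and the record layout are assumptions made for illustration.

```python
import re

# Assumption: phone numbers are expected in the form 555-123-4567.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def nonconforming_phones(values):
    """Return the values (including missing ones) that do not match the pattern."""
    return [v for v in values if not PHONE_PATTERN.fullmatch(v or "")]


def orphan_orders(orders, product_master):
    """Return sales orders whose product is not in the product master file."""
    return [o for o in orders if o["product_id"] not in product_master]


bad_phones = nonconforming_phones(["555-123-4567", "5551234567", None])
orphans = orphan_orders(
    [{"order_id": 1, "product_id": "P1"}, {"order_id": 2, "product_id": "P9"}],
    product_master={"P1", "P2"},
)
```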
Data cleansing. Once you’ve profiled your source data, you are ready to cleanse it. The data cleansing process involves identifying, correcting, and consolidating data. For example, with customer data, data cleansing identifies contact names and addresses, then standardizes the data and enhances it to fill in missing fields or incorrect addresses. Matching and merging capabilities provide sophisticated ways to identify members of the same household, combine records by matching different forms of the same name (such as Jon and Jonathan), and match and consolidate records into a single view.
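A simplified sketch of standardizing, matching, and merging customer records follows. The `NICKNAMES` lookup and the match key (first name plus last name) are hypothetical; commercial data-quality tools use far richer reference data and fuzzy matching.

```python
# Hypothetical lookup mapping short forms of a name to a standard form.
NICKNAMES = {"jon": "jonathan", "bob": "robert"}


def standardize(record):
    """Trim and lowercase fields, and map known short names to a standard form."""
    first = record["first"].strip().lower()
    email = (record.get("email") or "").strip().lower()
    return {
        "first": NICKNAMES.get(first, first),
        "last": record["last"].strip().lower(),
        "email": email or None,
    }


def merge_records(records):
    """Consolidate records that share a standardized name into a single view."""
    merged = {}
    for rec in map(standardize, records):
        survivor = merged.setdefault((rec["first"], rec["last"]), rec)
        # Fill fields that are missing in the surviving record.
        for field, value in rec.items():
            if survivor[field] is None:
                survivor[field] = value
    return list(merged.values())


customers = merge_records([
    {"first": "Jon ", "last": "Smith", "email": None},
    {"first": "Jonathan", "last": "smith", "email": "JS@Example.com"},
])
```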
Data validation. This process prevents unwanted data from entering your data warehouse. For example, you may only want sales records from 2005, postal codes that match a specific pattern, or product IDs that are not null. Data quality is often a matter of perspective and requires tight collaboration between IT and business constituents. By defining the business rules that help identify unwanted data, you can ensure a high level of information accuracy.
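The three example rules above can be expressed as named predicates that each record must pass before it is loaded. The rule names, the five-digit postal code pattern, and the record layout are assumptions for illustration only.

```python
import re
from datetime import date

POSTAL_CODE = re.compile(r"\d{5}")  # assumption: five-digit postal codes

# Each business rule is a named predicate; a record must pass all of them.
RULES = {
    "sale_year_is_2005": lambda r: r["sale_date"].year == 2005,
    "postal_code_matches_pattern": lambda r: bool(
        POSTAL_CODE.fullmatch(r["postal_code"] or "")
    ),
    "product_id_is_present": lambda r: r["product_id"] is not None,
}


def failed_rules(record):
    """Return the names of the rules this record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]


def filter_valid(records):
    """Split records into those to load and those to reject, with reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = failed_rules(record)
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = filter_valid([
    {"sale_date": date(2005, 3, 1), "postal_code": "94105", "product_id": "P1"},
    {"sale_date": date(2004, 12, 31), "postal_code": "9410", "product_id": None},
])
```

Keeping the reasons alongside each rejected record gives business users something concrete to review when they and IT refine the rules together.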
Data auditing. Another challenge for developers is to audit the integrity of the ETL job against operational rules. By auditing data, you can verify that the expected data is read, processed, and loaded successfully. For example, if you are extracting 100,000 records from flat files, you can verify that all 100,000 records loaded successfully into the data warehouse. Another useful application is to verify the successful execution of a join. Data auditing can determine if any rows are missing and if any joins have been configured improperly.
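Both audit checks, row counts and join integrity, reduce to simple comparisons. The sketch below assumes the ETL job can report how many rows it read and loaded; the function names and record layout are hypothetical.

```python
def audit_row_counts(rows_read, rows_loaded):
    """Compare the number of rows read from the source with the number loaded."""
    return {
        "read": rows_read,
        "loaded": rows_loaded,
        "missing": rows_read - rows_loaded,
        "ok": rows_read == rows_loaded,
    }


def unmatched_join_keys(left_rows, right_rows, key):
    """Return key values on the left side that find no partner on the right."""
    right_keys = {row[key] for row in right_rows}
    return sorted({row[key] for row in left_rows} - right_keys)


count_audit = audit_row_counts(rows_read=100_000, rows_loaded=99_998)
lost_keys = unmatched_join_keys(
    [{"customer_id": "C1"}, {"customer_id": "C2"}],
    [{"customer_id": "C1"}],
    key="customer_id",
)
```

A nonzero `missing` count or a nonempty list of unmatched keys signals that rows were dropped or that a join condition deserves a second look.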
3. Semantic Reconciliation
Inconsistent semantics exist in every business domain and its underlying applications. Without a common definition of data across the enterprise, each department will have a conflicting view of the business. This can strangle an organization’s efficiency and agility. To achieve semantic reconciliation, organizations must develop a common definition and manage these semantics through a master reference solution.
Common Definition. The process of establishing these common data definitions is called semantic reconciliation, and it requires executive-level sponsorship. Typically, the role of a data steward is developed to drive inter-departmental standards for describing the business in terms of customers, products, and employees. Also, BI tools offer a semantic layer that further assists in translating technical terminology into business terms.
For example, when determining departmental productivity using “cost per employee” as a metric, do two part-time employees, each working a four-hour day, count as one employee or two? The answer is likely to differ by department and, unless an organization-wide definition is established, departmental comparisons are not meaningful.
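The arithmetic behind this example is easy to make concrete. The sketch below assumes an eight-hour standard day and a hypothetical department cost of 60,000; the two functions represent the two competing definitions of "employee."

```python
STANDARD_DAY_HOURS = 8  # assumption: a full-time day is eight hours


def headcount(employees):
    """Definition A: every person counts as one employee."""
    return len(employees)


def full_time_equivalents(employees):
    """Definition B: employees are counted in proportion to hours worked."""
    return sum(e["hours_per_day"] for e in employees) / STANDARD_DAY_HOURS


def cost_per_employee(total_cost, employees, count_employees):
    """Divide cost by the employee count under the chosen definition."""
    return total_cost / count_employees(employees)


part_timers = [{"hours_per_day": 4}, {"hours_per_day": 4}]
by_headcount = cost_per_employee(60_000, part_timers, headcount)
by_fte = cost_per_employee(60_000, part_timers, full_time_equivalents)
```

The same department reports 30,000 per employee under one definition and 60,000 under the other, which is why comparisons across departments only make sense once a single definition is agreed.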
Once an organization recognizes differing definitions and standardizes on enterprise definitions, a data warehouse can aid in the implementation. It may be impractical to modify every operational system to reflect the enterprise standard. However, it is possible to transform the data extracted from each operational system to conform to the enterprise’s standard definitions and value lists as the data is loaded into the warehouse. An analyst who uses a data warehouse is comparing “apples to apples”; an analyst who directly accesses two or more operational systems for analysis purposes is often comparing “apples to oranges.”
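Conforming extracted data to enterprise definitions and value lists at load time can be sketched with two lookup tables. The source names, field names, and codes in `FIELD_MAP` and `VALUE_MAP` are hypothetical examples, not a recommended standard.

```python
# Hypothetical mapping from each source system's field names to the enterprise names.
FIELD_MAP = {
    "emea": {"net_sales": "sales_amount", "country": "region"},
    "us": {"revenue": "sales_amount", "state": "region"},
}

# Hypothetical enterprise value list for the "region" field.
VALUE_MAP = {"region": {"U.K.": "GB", "United Kingdom": "GB", "Calif.": "US-CA"}}


def conform(source, row):
    """Rename fields and translate values so every source matches the standard."""
    renamed = {FIELD_MAP[source].get(k, k): v for k, v in row.items()}
    return {k: VALUE_MAP.get(k, {}).get(v, v) for k, v in renamed.items()}


emea_row = conform("emea", {"net_sales": 1200.0, "country": "U.K."})
us_row = conform("us", {"revenue": 900.0, "state": "Calif."})
```

The operational systems keep their own terms; only the copy loaded into the warehouse is translated.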
Master Data Management. All organizations have data that is used across several departments. Examples of these “reference data” files include customer, product, employee, vendor, and financial data. In many organizations, individual departments maintain their own reference files, and problems frequently arise when different departments use different identifiers or keys for the same customer, making it difficult (or even impossible) to accurately aggregate or combine data across systems.
For example, if a customer’s revenues from both the sales and the service departments can’t be accurately combined, the total value of that customer’s account is understated. While the term “master data management” is receiving tremendous attention, it is essentially an extension of the reference file concept—a concept that was behind the use of centralized Rolodex files even before the common business use of computers. Data integration technology, combined with data-quality software, is the underlying technology for delivering an organization-wide master data management solution.
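One common way to make such aggregation possible is a cross-reference that maps each department's local key to a single master identifier. The sketch below uses a hypothetical `XREF` table and made-up keys and revenue figures to show how sales and service revenue for the same customer can then be combined.

```python
from collections import defaultdict

# Hypothetical cross-reference: (system, local key) -> master customer ID.
XREF = {
    ("sales", "S-100"): "CUST-1",
    ("service", "SV-9"): "CUST-1",
    ("sales", "S-200"): "CUST-2",
}


def revenue_by_master_customer(records):
    """Aggregate revenue across systems using the master customer ID."""
    totals = defaultdict(float)
    for rec in records:
        master_id = XREF[(rec["system"], rec["local_id"])]
        totals[master_id] += rec["revenue"]
    return dict(totals)


revenue = revenue_by_master_customer([
    {"system": "sales", "local_id": "S-100", "revenue": 5000.0},
    {"system": "service", "local_id": "SV-9", "revenue": 1200.0},
    {"system": "sales", "local_id": "S-200", "revenue": 700.0},
])
```

Without the cross-reference, the customer known as S-100 in sales and SV-9 in service would appear as two smaller accounts.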
4. Metadata Management
BI deployments involve numerous tools that all have their own metadata with a large amount of overlap. Databases generate metadata for data dictionaries. ETL tools generate metadata for physical mappings, transformations, and data quality. BI systems generate metadata for the semantic layer, reports, objects, goals, and metrics. Modeling tools generate metadata on logical mappings.
Because of this rampant metadata proliferation, it is difficult for the business to trace the origin of a number from a BI report back to its transactional source. This requirement is particularly important for complying with regulations that require an audit trail for financial reporting. Conversely, seeing impact analysis from a source system all the way to the BI report user provides dramatic visibility for IT to assure data integrity and manage change in a BI environment.
By using a metadata management solution to consolidate and integrate all BI-related metadata into a single location, IT can view, analyze, and explore metadata from all the disparate systems. This enables IT staff to understand the context of information in their BI environment and understand relationships between metadata objects, data structure, end-to-end impact analysis, report-to-source data lineage, and operational statistics. With a metadata management strategy, organizations can deliver trusted data for compliance requirements, internal controls, and improved decision making, as well as rapidly reduce the BI project change-management costs.
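Once metadata from the different tools sits in one place, report-to-source lineage and source-to-report impact analysis become graph traversals. The sketch below assumes a consolidated `LINEAGE` mapping with hypothetical object names; a real metadata repository would populate it from the databases, ETL tools, and BI tools.

```python
# Hypothetical consolidated metadata: each object maps to its direct sources.
LINEAGE = {
    "report.total_sales": ["dw.fact_sales.amount"],
    "dw.fact_sales.amount": ["etl.convert_currency"],
    "etl.convert_currency": ["crm.orders.amount", "erp.invoices.total"],
}


def trace_lineage(node, graph=LINEAGE):
    """Report-to-source lineage: everything upstream of the given object."""
    upstream, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in upstream:
            upstream.add(current)
            stack.extend(graph.get(current, []))
    return upstream


def impact_of(node, graph=LINEAGE):
    """Impact analysis: everything downstream that depends on the given object."""
    affected, stack = set(), [node]
    while stack:
        current = stack.pop()
        for target, sources in graph.items():
            if current in sources and target not in affected:
                affected.add(target)
                stack.append(target)
    return affected


sources_of_report = trace_lineage("report.total_sales")
affected_by_crm = impact_of("crm.orders.amount")
```

The first call answers "Where did that number come from?"; the second shows which warehouse columns and reports would be touched by a change to a source field.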
Enterprise Information Management
Organizations require timely, consistent access to trustworthy information from within their organizations and beyond.
To reach this goal, organizations must implement an EIM strategy that combines familiar and new methods for addressing data integration, data quality, semantic reconciliation, and metadata management. These components should be available as services that ultimately support the business-facing BI deployment. An EIM strategy that offers integration with the BI platform delivers a deeper level of insight and visibility to the organization.
Managing enterprise information is an ongoing process that requires constant tuning as the business grows and evolves. EIM provides a way for organizations to set up a flexible framework to meet the rapidly changing information needs of the business.
A flexible EIM framework that is integrated with and supports the BI environment provides:
- A trustworthy data foundation for BI
- Agility to access real-time information for operational BI
- A single, consistent view of the enterprise