Data Warehouse Components And Architecture

Lecture Note # 02

The data in a data warehouse comes from the operational systems of the organization as well as from external sources. These are collectively referred to as source systems.

The data extracted from the source systems is stored in an area called the data staging area, where it is cleaned, transformed, combined, and deduplicated to prepare it for use in the data warehouse. The data staging area is generally a collection of machines where simple activities like sorting and sequential processing take place. The data staging area does not provide any query or presentation services.

As soon as a system provides query or presentation services, it is categorized as a presentation server. A presentation server is the target machine onto which data is loaded from the data staging area, then organized and stored for direct querying by end users, report writers, and other applications.

The three different kinds of systems that are required for a data warehouse are:

1. Source systems
2. Data staging area
3. Presentation servers

The data travels from the source systems to the presentation servers via the data staging area. The entire process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer). Oracle's ETL tool is called Oracle Warehouse Builder (OWB), and MS SQL Server's ETL tool is called Data Transformation Services (DTS).

A typical architecture of a data warehouse is shown below:
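
Separately from the architecture diagram, the extract, transform, and load flow described in this note can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how OWB or DTS work internally: the sample rows, the function names (extract, stage, load, run_etl), and the use of an in-memory SQLite database as a stand-in for a presentation server are all hypothetical choices made for the example.

```python
import sqlite3

# Hypothetical source systems: one operational system and one external feed.
OPERATIONAL_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": "75"},
    {"customer": "Alice", "amount": "120.50"},  # duplicate once cleaned
]
EXTERNAL_ROWS = [
    {"customer": "CAROL", "amount": "310.00"},
    {"customer": "bob", "amount": None},  # incomplete record
]


def extract():
    """Extract: pull raw rows from every source system."""
    return OPERATIONAL_ROWS + EXTERNAL_ROWS


def stage(raw_rows):
    """Staging area: clean, transform, combine, deduplicate, and sort.

    No query or presentation service is offered at this step.
    """
    seen = set()
    staged = []
    for row in raw_rows:
        if row["amount"] is None:  # clean: drop incomplete records
            continue
        # transform: normalize the name and convert the amount to a number
        record = (row["customer"].strip().title(), float(row["amount"]))
        if record in seen:  # deduplicate
            continue
        seen.add(record)
        staged.append(record)
    return sorted(staged)  # simple sequential processing: sorting


def load(staged_rows):
    """Load: store staged rows where they can be queried directly.

    An in-memory SQLite database stands in for the presentation server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", staged_rows)
    conn.commit()
    return conn


def run_etl():
    """Source systems -> data staging area -> presentation server."""
    return load(stage(extract()))


if __name__ == "__main__":
    presentation = run_etl()
    query = "SELECT customer, amount FROM sales ORDER BY customer"
    for customer, amount in presentation.execute(query):
        print(customer, amount)
```

Keeping stage() free of any query interface mirrors the distinction drawn above: only the object returned by load() answers queries, so it alone plays the role of a presentation server.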
