DATA WAREHOUSE COMPONENTS & ARCHITECTURE
Lecture Note # 02
The data in a data warehouse comes from the operational systems of the organization as well as from other external sources. These are collectively referred to as source systems. The data extracted from source systems is stored in an area called the data staging area, where it is cleaned, transformed, combined, and deduplicated to prepare it for use in the data warehouse. The data staging area is generally a collection of machines where simple activities like sorting and sequential processing take place. The data staging area does not provide any query or presentation services. As soon as a system provides query or presentation services, it is categorized as a presentation server. A presentation server is the target machine on which the data loaded from the data staging area is organized and stored for direct querying by end users, report writers, and other applications. The three different kinds of systems that are required for a data warehouse are:
1. Source Systems
2. Data Staging Area
3. Presentation servers
The data travels from source systems to presentation servers via the data staging area. The entire process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer). Oracle’s ETL tool is called Oracle Warehouse Builder (OWB) and MS SQL Server’s ETL tool is called Data Transformation Services (DTS).
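The flow from source systems through the staging area to a presentation server can be sketched in a few lines of Python. This is only an illustrative sketch: the two source lists (crm, billing), the CustomerRow record, and the in-memory presentation_table are hypothetical stand-ins, not part of any real ETL tool such as OWB or DTS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    name: str
    city: str


def extract(sources):
    """Pull raw records (plain dicts) from every source system."""
    for source in sources:
        yield from source


def clean_and_transform(raw):
    """Staging step: trim whitespace and normalise types and letter case."""
    return CustomerRow(
        customer_id=int(raw["id"]),
        name=raw["name"].strip().title(),
        city=raw["city"].strip().upper(),
    )


def stage(raw_records):
    """Staging step: combine all sources, deduplicate, and sort by key."""
    seen = {}
    for raw in raw_records:
        row = clean_and_transform(raw)
        seen.setdefault(row.customer_id, row)  # keep the first occurrence
    return sorted(seen.values(), key=lambda r: r.customer_id)


def load(rows, presentation_table):
    """Load staged rows into the (hypothetical) presentation table."""
    presentation_table.extend(rows)


# Two hypothetical source systems with overlapping, untidy data.
crm = [{"id": "1", "name": " alice smith ", "city": "pune"}]
billing = [
    {"id": 1, "name": "Alice Smith", "city": "Pune "},
    {"id": "2", "name": "bob JONES", "city": " delhi"},
]

presentation_table = []
load(stage(extract([crm, billing])), presentation_table)
```

Note that the staging functions only clean, combine, deduplicate, and sort; nothing queries the data until it reaches presentation_table, which mirrors the division of responsibilities described above.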
A typical architecture of a data warehouse is shown below:
Real-time data warehousing creates some special issues that need to be solved by data warehouse management. These issues stem from the extensive technical work involved, not only in planning the system but also in managing problems as they arise. Two aspects of the BI system that need to be organized in order to avoid technical problems are the architecture design and query workload balancing.
Up until this point, Third Star Financial Services has grown through a succession of mergers and acquisitions in which systems were inherited but never integrated into the network. Its data management has been virtually non-existent and entirely ineffective. Evidence of this can be found in the absence of an enterprise-wide data management solution and the presence of several disparate systems operating independently with no measurable benefit to the company. Due to a lack of actionable data, management makes decisions based on instinct rather than analysis. A direct consequence of this is a steadily declining market share and the loss of high-level employees to competing companies. Fortunately, this shortcoming has been identified, and Third Star executives have established the new goal of modernizing and streamlining operations. Using concepts outlined by the Data Management Association (DAMA), this proposed enterprise architecture will allow Third Star to transform its data from a liability into an asset.
24) Before it can be loaded into the data warehouse, operational data must be extracted and
What information is accessible? The data warehouse offers ways to define what is on offer through metadata, published information, and parameterized analytic applications. Is the data of high quality? Data warehouse patrons expect reliability and quality. The presentation area's data must be correctly organized and safe to consume. In terms of design, the presentation area should be planned for the comfort of its consumers. It must be designed around the preferences expressed by the data warehouse diners, not the staging supervisors. Service is also critical in the data warehouse. Data must be delivered, as ordered, promptly and in a form that appeals to the business user or the reporting/delivery application designer. Lastly, cost is a factor for the data
A data warehouse holds data about different subject areas. Each subject area is assigned to a specific data mart. A data mart deals with one subject area and is considered a subset of the data warehouse. At Indiana University (IU), the traditional data warehouse was unable to handle large data storage. Further, it exposed errors and imposed rules on the data. Its early-binding method is a disadvantage: it takes longer to get an enterprise data warehouse (EDW) up and running, because the entire EDW, including every business rule, has to be designed from the outset. The late-binding architecture is more flexible, because data can be bound to business rules at any stage from data modeling through processing. Health Catalyst's late-binding approach is flexible and keeps raw data available in the data warehouse. It delivers results within 90 days and stores IU data without errors.
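One way to picture the difference between early and late binding is the small Python sketch below. It is an illustration under stated assumptions, not Health Catalyst's implementation: the encounter records, the length_of_stay_days field, and the "long stay" rule are all hypothetical.

```python
# Hypothetical raw records as they arrive from a source system.
raw_encounters = [
    {"patient": "A", "length_of_stay_days": 2},
    {"patient": "B", "length_of_stay_days": 9},
]


def long_stay_rule(threshold_days):
    """Business rule: return a predicate that flags stays over the threshold."""
    return lambda row: row["length_of_stay_days"] > threshold_days


# Early binding: the rule is applied at load time and only the derived
# flag is stored, so changing the threshold later means reloading the data.
early_bound = [
    {"patient": r["patient"], "long_stay": long_stay_rule(7)(r)}
    for r in raw_encounters
]


# Late binding: the raw rows are stored unchanged and the rule is applied
# only when a question is asked, so the rule can change without a reload.
def query_long_stays(raw_rows, rule):
    return [r["patient"] for r in raw_rows if rule(r)]


late_7 = query_long_stays(raw_encounters, long_stay_rule(7))
late_1 = query_long_stays(raw_encounters, long_stay_rule(1))
```

In the early-bound table the 7-day threshold is baked into the stored rows, whereas the late-bound query answers both the 7-day and the 1-day question from the same raw data.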
- this is to support their information based system while having shared communication between different branches
Extraction, Transformation, and Loading (ETL) processes are responsible for the operations taking place in the back stage of a data warehouse architecture. Broadly, the data is first extracted from the source data stores, which could be On-Line Transaction Processing (OLTP) or legacy systems, files of any format, web pages, or other documents such as spreadsheets or text documents. In this step, only the data that has changed since the previous execution of the ETL process (newly inserted or updated) is extracted from the sources. Next, the extracted data is sent to the Data Staging Area, where it is transformed and cleaned. Finally, the data is loaded into the central data warehouse and all its counterparts, e.g., data marts and views. (Kabiri & Chiadmi 2013, p.1)
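The "extract only what changed" step described above is often implemented with a watermark, that is, a remembered timestamp of the last row already processed. The following sketch uses Python's built-in sqlite3 module with two in-memory databases; the orders and fact_orders tables, the updated_at column, and the cents conversion are assumptions made for illustration and do not come from Kabiri & Chiadmi.

```python
import sqlite3

# Hypothetical operational source with a last-modified column.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 25.5, "2024-01-03")],
)

# Hypothetical warehouse table that receives the loaded rows.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders "
    "(id INTEGER PRIMARY KEY, amount_cents INTEGER, updated_at TEXT)"
)


def run_etl(source, warehouse, watermark):
    """Run one ETL pass and return the new watermark."""
    # Extract: only rows inserted or updated since the previous run.
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Transform (staging): convert currency amounts to integer cents.
    staged = [(oid, round(amount * 100), ts) for oid, amount, ts in changed]
    # Load: insert new rows and overwrite rows that were updated.
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", staged
    )
    warehouse.commit()
    return max((ts for _, _, ts in staged), default=watermark)


# First run: an empty watermark means every row counts as new.
watermark = run_etl(source, warehouse, "")

# The source changes: one order is updated and one is added.
source.execute(
    "UPDATE orders SET amount = 30.0, updated_at = '2024-01-05' WHERE id = 2"
)
source.execute("INSERT INTO orders VALUES (3, 7.25, '2024-01-06')")

# Second run: only the two changed rows are extracted and loaded.
watermark = run_etl(source, warehouse, watermark)
rows = warehouse.execute(
    "SELECT id, amount_cents FROM fact_orders ORDER BY id"
).fetchall()
```

A timestamp watermark is only one option; it relies on the source maintaining a trustworthy last-modified column, and it does not detect deleted rows.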
A data warehouse is defined as a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of the systems that extract data from operating systems plus the installation of the warehouse database system that provides managers flexible access to the data. The term data warehousing generally refers to the combination of many different databases across an entire enterprise. (Webopedia)
A data warehouse is a large database organized for reporting. It preserves history, integrates data from multiple sources, and is typically not updated in real time. The key components of data warehousing are the operational systems, the data staging area, the data presentation area, and data access tools (HIMSS, 2009). The goal of the data warehouse platform is to improve decision-making for clinical, financial, and operational purposes.
Chapter 11 Enterprise Resource Planning Systems
1. Closed database architecture is
a. a control technique intended to prevent unauthorized access from trading partners.
b. a limitation inherent in traditional information systems that prevents data sharing.
c. a data warehouse control that prevents unclean data from entering the warehouse.
d. a technique used to restrict access to data marts.
e. a database structure that many of the leading ERPs use to support OLTP applications.
2. Each of the following is a necessary element for the successful warehousing of data EXCEPT
a. cleansing extracted data.
b. transforming data.
c. modeling data.
d. loading data.
e. all of the above are necessary.
3. Which of the following is typically NOT part of
A data warehouse is a unique kind of database in which current and historical data about a certain group of people, such as customers, is stored. Information from operational systems, such as transaction processing systems, is extracted, summarised, and then stored in a data warehouse. This type of information includes records of customer interaction patterns, customer purchasing history or trends, and current customer records. The information in a data warehouse is used for management analysis and decision making.
· Extracting data from source systems, transforming it, and then loading it into a data warehouse
The data warehouse comes ready for use, but an organization has to get prepared to use it. The main factor is data warehouse usage. A data warehouse can be used for decision making for management staff.
A data warehouse is made up of multiple databases that work together. In other words, a data warehouse integrates data from other databases. This provides a better understanding of the data. Its primary goal is not just to store data, but to give the business, in this case a higher education institute, a means to make decisions that can influence its success. This is accomplished by the data warehouse providing architecture and tools which organize and understand the
At the present juncture, data is the most significant asset for almost every type of organization. Be it software, hardware, healthcare, banking, government, or science, one of the most crucial factors in the success of such organizations is how they manage their data. With the passage of time, the amount of data is increasing radically, which makes it quite difficult for organizations to manage their data efficiently.