As your business evolves, the data warehouse may no longer meet the requirements of your organization. Organizations have information needs that are not completely served by a data warehouse, and those needs are driven as much by the maturity of data use in the business as by new technology.
For example, the relational database at the center of the data warehouse limits data processing to what can be done via SQL. If data cannot be processed via SQL, analysis of a new data source that is not in row-and-column format is limited. Other data sources that do not fit nicely in the data warehouse include text, images, audio, and video, all of which are considered semi-structured or unstructured data. This is where Hadoop enters the architecture.
Hadoop is a family of products (Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Mahout, Cassandra, YARN, Ambari, Avro, Chukwa, and ZooKeeper), each with different and multiple capabilities; see www.apache.org for details on these products. These products are available as native open source from the Apache Software Foundation (ASF) and from commercial software vendors.
Once the data are stored in Hadoop, big data applications can be used to analyze them. Figure 4.3 shows a simple standalone Hadoop architecture.
Semi-structured data sources: semi-structured data cannot be stored in a relational database in column/row format. These data sources include email, social data, and XML.
This data is collected and organized in order to process orders and maintain good customer service. A logical view of the data allows a knowledge worker to arrange and access information based on the needs of the business, separating it from the physical view of how the information is arranged and stored. This lets an employee create detailed reports covering information such as customers and their order numbers and dates. This is imperative for a company like Comcast, which has over 27 million customers and needs a system that keeps important data available for analysis. Using a data warehouse allows it to gather data from several databases; the company can then use that information to determine, for example, how many units of voice products are sold, creating the business intelligence necessary to make future decisions and remain competitive.
Hadoop \cite{white2012hadoop} is an open-source framework for distributed storage and data-intensive processing, first developed at Yahoo!. It has two core projects: the Hadoop Distributed File System (HDFS) and the MapReduce programming model \cite{dean2008mapreduce}. HDFS is a distributed file system that splits data into blocks and stores them, with a number of replicas, on nodes throughout a cluster. It provides an extremely reliable, fault-tolerant, consistent, efficient, and cost-effective way to store a large amount of data. The MapReduce model consists of two key functions: Mapper and Reducer. The Mapper processes input splits in parallel through different map tasks and sends sorted, shuffled outputs to the Reducers, which in turn group and process them using a reduce task for each group.
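To make the Mapper/Reducer pair concrete, the following is a minimal word-count sketch against Hadoop's Java MapReduce API; the class names WordCountMapper and WordCountReducer are illustrative, not taken from the works cited here. Each map task tokenizes one input split and emits (word, 1) pairs, which the framework sorts and shuffles so that each reduce call sees every count for one word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: each map task processes one input split, emitting (word, 1) pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // sorted and shuffled by the framework
        }
    }
}

// Reducer: receives all counts for one word (one group) and sums them.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

Note that the grouping step between the two classes is performed by the framework itself: user code never sorts or shuffles; it only defines what happens per record (map) and per key group (reduce).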
HDFS is Hadoop’s distributed file system, providing high-throughput access to data, high availability, and fault tolerance. Data are saved as large blocks, making HDFS suitable for applications with large data sets.
An important characteristic of Hadoop is the partitioning of data and computation across many (thousands of) hosts, and the execution of application computations in parallel close to their data. A Hadoop cluster scales computation capacity, storage capacity, and I/O bandwidth simply by adding commodity servers. Hadoop clusters at Yahoo! span 40,000 servers and store 40 petabytes of application data, with the largest cluster being 4,000 servers.
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a parallel, distributed computing environment. It makes use of commodity hardware and is highly scalable and fault tolerant. Hadoop runs on a cluster and eliminates the need for a supercomputer. It is the most widely used big data processing engine, with a simple master-slave setup: in most companies, big data is processed by submitting jobs to the master, which distributes each job across its cluster and processes the map and reduce tasks sequentially. Nowadays, however, growing data needs and competition between service providers lead to increased job submission to the master. This concurrent job submission on Hadoop forces us to schedule work on the Hadoop cluster so that the response time remains acceptable for each job.
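As a hedged illustration of the job-submission path described above, here is a minimal driver that configures a job and hands it to the cluster master; it assumes the illustrative WordCountMapper and WordCountReducer classes from the earlier sketch, and the input/output paths are placeholders passed on the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: configures a job and submits it to the cluster master, which
// schedules the map and reduce tasks across the worker nodes.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        // waitForCompletion blocks until the master reports the job finished
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Under concurrent submission, many such drivers contend for the same master, which is exactly why cluster-level scheduling policies matter for keeping per-job response times acceptable.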
One crucial thing that organizations need to consider in today’s unstructured data world is how to successfully integrate data warehouses. For this, companies need to reconsider their enterprise data architecture and clarify the governance strategy that can be attained through such efforts. There lies a need for data managers
The research topic was derived from an understanding of query processing in MySQL and Hadoop, database performance issues, performance tuning, and the importance of database performance. It was therefore decided to develop a comparative analysis to observe the performance of MySQL (non-cluster) and Hadoop on structured and unstructured datasets (Rosalia, 2015). Furthermore, the analysis included a comparison between the two platforms at two different data sizes.
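As one illustrative measurement harness for the MySQL side of such a comparison (not the cited study's actual setup), a query can be timed over JDBC; the database URL, credentials, and query below are placeholder assumptions, and the same query would be run on the Hadoop side and the elapsed times compared.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Times a single query against MySQL so its latency can be compared
// with the same workload executed on a Hadoop cluster.
public class MySqlQueryTimer {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/benchmark"; // illustrative DB
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            long start = System.nanoTime();
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT COUNT(*) FROM orders")) {        // illustrative query
                rs.next();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("rows=%d elapsed=%d ms%n", rs.getLong(1), elapsedMs);
            }
        }
    }
}
```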
Cost reduction: Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data, and they can help identify and implement more efficient ways of doing business.
Another issue that may prompt officials to develop a data warehouse is discrepancies in reports among different departments. Wal-Mart, Toyota, and other large corporations use data warehouses because they have a massive amount of information related to their businesses, including customers, sales, financing, and marketing, just to name a few. If these businesses did not use data warehouses,
HDFS Design: The HDFS file system is designed for storing very large files, meaning files that are hundreds of megabytes, gigabytes, or terabytes in size, with streaming data access patterns, running on clusters of commodity hardware. HDFS follows a write-once, read-many-times pattern: a dataset is typically generated or copied from a source, and various analyses are then performed on that dataset. Hadoop does not need expensive hardware; it is designed to run on commodity hardware.
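A minimal sketch of this write-once, read-many pattern using HDFS's Java FileSystem API is shown below; the NameNode address and file path are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Writes a file to HDFS once, then reads it back as a stream: the
// write-once, read-many access pattern HDFS is designed around.
public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // illustrative address
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/example/dataset.txt"); // illustrative path
        try (FSDataOutputStream out = fs.create(path)) {   // write once
            out.writeBytes("record-1\nrecord-2\n");
        }

        // read many times (here, once), streaming through the file
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```

Because files are written once and then only read, HDFS can store them as large, replicated blocks and optimize for streaming throughput rather than low-latency random updates.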
The data warehouse comes ready for use, but an organization has to prepare itself to use it. The main factor is data warehouse usage: a data warehouse can be used for decision making by management staff.
Abstract - The Hadoop Distributed File System, a Java-based file system, provides reliable and scalable storage for data. It is the key component for understanding how a Hadoop cluster can be scaled over hundreds or thousands of nodes. The large amounts of data in a Hadoop cluster are broken down into smaller blocks and distributed across small, inexpensive servers using HDFS. MapReduce functions are then executed on these smaller blocks of data, providing the scalability needed for big data processing. In this paper I discuss Hadoop in detail: the architecture of HDFS, how it functions, and its advantages.
Data has always been analyzed within companies and used to benefit the future of businesses. However, how data is stored, combined, analyzed, and used to predict the patterns and tendencies of consumers has evolved as technology has seen numerous advancements over the past century. In the 1900s databases began as “computer hard disks,” and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to take off and “digital storage became more cost effective than paper.” With the constant increase of digitally supplied data, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year” (Marr), a number that is rapidly increasing given the many mobile devices people in our society have today. The evolution of the internet, and then the expansion of the number of mobile devices society has access to, led data to evolve, and companies now need large central database management systems in order to run an efficient and successful business.
A data warehouse is a central repository integrating data from various operational systems for validation of data, prediction, etc. A data warehouse is a relational database used for analysis and query rather than for transactions. It is used to collect historical data from various sources, integrate it, analyze a particular subject, and report on it. A data warehouse is time-variant, i.e., one can retrieve older data, and once data enters the data warehouse it cannot change [1]. According to Ralph Kimball, a data warehouse is a “copy of transaction data constructed for analysis and query” [5]. Data is taken from various sources such as marketing, sales, ERP, etc.
Data is the raw material of any information system. With the revolution of information technology, we are making our decision-making process quicker and smarter. Data warehouse technology is the process of collecting, sorting, structuring, analyzing, storing, and presenting data. So we can say that the data warehouse is the overall data management technology in the organization.