
Database Analysis: The Data Warehouse


As your business evolves, the data warehouse may no longer meet all of your organization's requirements. Organizations have information needs that a data warehouse does not completely serve. These needs are driven as much by the maturity of data use in the business as by new technology.
For example, the relational database at the center of the data warehouse limits data processing to what can be done via SQL. If a new data source is not in row and column format and cannot be processed via SQL, the analysis that can be performed on it is limited. Other data sources that do not fit neatly in the data warehouse include text, images, audio, and video, all of which are considered semi-structured or unstructured data. This is where Hadoop enters the architecture.
Hadoop is a family of products (Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Mahout, Cassandra, YARN, Ambari, Avro, Chukwa, and ZooKeeper), each with multiple, different capabilities. Please visit www.apache.org for details on these products. They are available as native open source from the Apache Software Foundation (ASF) and from software vendors.
Once the data are stored in Hadoop, big data applications can be used to analyze them. Figure 4.3 shows a simple standalone Hadoop architecture.
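To make the idea of analyzing non-tabular data more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern that MapReduce is built on, written in plain Python rather than against the actual Hadoop API. It counts words in raw text lines, a task that does not map naturally onto rows and columns. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions made for illustration only; a real Hadoop job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of raw text."""
    for word in record.lower().split():
        # Strip common punctuation so "data," and "data" count as the same key.
        yield word.strip(".,;:!?\"'()"), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, skipping empty keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        if key:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(records):
    """Run the three steps in sequence on an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input: free text, not rows and columns.
    sample = ["Big data, big storage", "data"]
    print(word_count(sample))
```

In this sketch everything runs in one process; the point is only that each step works on free text and key/value pairs, with no table schema required.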
Semi-structured data sources: Semi-structured data cannot be stored in a relational database (in column/row format). These data sources include email, social data, XML
