Optimization of an Existing Data Warehouse Using Hadoop

Source: Informatica (January 2014), Data Warehouse Optimization Using Hadoop, page 8.
An existing data warehouse can be optimized with Hadoop (HDFS) by following the seven fundamental steps below, which are divided into two phases.
1. Offload data & ETL processing to Hadoop: This step moves CPU-intensive ETL processes, which previously ran on the data warehouse and caused performance degradation and slow reads, onto Hadoop. It also frees warehouse space by offloading low-value or infrequently used information (a sketch of an offloaded aggregation follows this list).
2. Batch load raw data to Hadoop: Raw transactional, structured, unstructured, and semi-structured data from a wide variety of source systems, such as existing RDBMSs, emails, web logs, mobile apps, and call-center records, is loaded directly onto Hadoop, further reducing the impact on the warehouse (see the batch-load sketch below).
3. Replicate changes and schemas for data: Entire schemas can be replicated from RDBMSs to Hadoop, further offloading processing from OLTP systems. Users can optimize performance and reduce latency by choosing change data capture (CDC) to move only newly updated information (see the incremental-import sketch below). Since Hadoop does not impose schema requirements on data, unstructured information previously unusable by the warehouse can be leveraged in Hadoop.
4. Collect & stream real-time machine data: Data generated by ICICI Bank's mobile and web applications and its web site, including web log files and click streams, can be collected and streamed into Hadoop in real time (see the streaming sketch below).
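
For step 1, the following is a minimal sketch of how a CPU-intensive warehouse aggregation might be offloaded to the Hadoop cluster, written here with PySpark under the assumption that Spark is available on the cluster; the HDFS paths, column names, and job name are hypothetical, not from the source.

```python
# Minimal PySpark sketch of an ETL aggregation offloaded from the
# warehouse to Hadoop. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-offload").getOrCreate()

# Raw transactions that were previously staged and aggregated inside
# the warehouse are now read directly from HDFS instead.
txns = spark.read.parquet("hdfs:///raw/transactions")

# The heavy aggregation runs on the Hadoop cluster, not on the
# warehouse CPUs.
daily_totals = (
    txns.groupBy("account_id", F.to_date("txn_ts").alias("txn_date"))
        .agg(F.sum("amount").alias("daily_amount"),
             F.count("*").alias("txn_count"))
)

# Only the small, high-value summary is written back for loading
# into the warehouse.
daily_totals.write.mode("overwrite").parquet("hdfs:///curated/daily_totals")
```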
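For step 2, a minimal sketch of batch-loading raw files into HDFS with the standard `hdfs dfs` CLI, driven from Python; the local paths and HDFS directories are assumptions for illustration. Because Hadoop applies schema on read, the files can be copied as-is, whatever their format.

```python
# Sketch of a batch load into HDFS using the standard `hdfs dfs` CLI.
# Local source paths and HDFS target directories are hypothetical.
import subprocess

SOURCES = {
    "/var/log/web/access.log": "/raw/weblogs/",
    "/exports/callcenter.csv": "/raw/callcenter/",
    "/exports/mobile_events/": "/raw/mobile/",
}

for local_path, hdfs_dir in SOURCES.items():
    # Ensure the target directory exists, then copy the raw data as-is;
    # Hadoop stores it without requiring any schema up front.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
                   check=True)
```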
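For step 3, one common way to implement change data capture against an RDBMS is Apache Sqoop's incremental import, sketched below; the JDBC connection string, table, and check column are hypothetical, and the high-water mark would in practice be persisted between runs.

```python
# Sketch of change data capture with Apache Sqoop's incremental import,
# moving only rows updated since the last run. Connection details,
# table, and column names are hypothetical.
import subprocess

last_value = "2014-01-01 00:00:00"  # persisted from the previous run

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@dbhost:1521/CORE",
    "--username", "etl_user",
    "--table", "ACCOUNTS",
    "--incremental", "lastmodified",  # capture only changed rows
    "--check-column", "UPDATED_AT",   # column Sqoop compares against
    "--last-value", last_value,       # high-water mark from prior run
    "--target-dir", "/raw/oltp/accounts",
], check=True)
```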
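For step 4, the source does not name a specific collection tool; the sketch below uses kafka-python as one illustrative way to publish web and mobile clickstream events as they occur, with a downstream consumer (not shown, e.g. Flume or Spark Streaming) landing them in HDFS. The broker address, topic name, and event fields are assumptions.

```python
# Illustrative sketch of streaming clickstream events toward Hadoop
# with kafka-python. Broker, topic, and event fields are hypothetical;
# a separate consumer would write the events into HDFS.
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda e: json.dumps(e).encode("utf-8"),
)

def publish_click(user_id, url):
    # Each web or mobile event is published as it happens rather than
    # waiting for a nightly batch window.
    event = {"user": user_id, "url": url, "ts": time.time()}
    producer.send("clickstream", value=event)

publish_click("u123", "/accounts/summary")
producer.flush()
```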
