Data Cleaning Case Study

In business, a data warehouse plays an important role: it brings together data from business activities and is considered the foundation that supports decision making. Any kind of error in the data can cause drawbacks and difficulties for the business and lead to negative results. Errors usually have a cause behind them: some occur while data is collected from different sources, while others occur during transfer. So, one of the big challenges facing a data warehouse is ensuring that data quality remains high. The process used to detect and correct errors so that the data is of high quality is called data cleaning. Data cleaning is considered a relatively new research area, and it is highly costly, especially for massive data; modern computers allow us to perform data …

For example, Wang has a modern tool that supports data integrity analysis within the TDQM (Total Data Quality Management) framework. A large variety of tools is available on the market to support data transformation and data cleaning tasks for data warehousing. Some tools concentrate on a specific domain, such as cleaning name and address data, or on a specific cleaning phase, such as data analysis or duplicate elimination. Because of their restricted domain, specialized tools typically perform very well, but they must be complemented by other tools to address the broad spectrum of transformation and cleaning problems. Other tools, such as ETL (extract, transform, load) tools, provide comprehensive transformation and workflow capabilities that cover a large part of the data transformation and cleaning process. A general problem of ETL tools is their limited interoperability: proprietary application programming interfaces (APIs) and proprietary metadata formats make it difficult to combine the functionality of several tools. Tools for data analysis and data reengineering process instance data to identify data errors and inconsistencies and to derive the corresponding cleaning transformations.

Data cleaning approaches

Data cleaning usually has several phases:

Data analysis: in this phase, the types of errors are determined. The data, or samples of it, are inspected manually and analyzed to gain metadata about data properties and to detect data quality problems.

Definition of transformation workflow and mapping rules: based on the
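
The data analysis phase described in the text, gaining metadata about data properties in order to detect quality problems, can be illustrated with a small profiling pass. The sketch below is only an assumption-laden illustration, not the method of any tool named here: the sample records (`RECORDS`), the helper `profile_column`, and the format rules (five-digit ZIP codes, one- to three-digit ages) are all hypothetical. For each column it counts missing values and distinct values, reports the most common value, and lists values that fail an analyst-supplied regular expression.

```python
import re
from collections import Counter

# Hypothetical sample records, standing in for a sample drawn from a warehouse table.
RECORDS = [
    {"name": "Alice Smith", "zip": "12345", "age": "34"},
    {"name": "bob jones", "zip": "1234", "age": ""},
    {"name": "Alice Smith", "zip": "12345", "age": "34"},
    {"name": "Carol White", "zip": "ABCDE", "age": "-5"},
]


def profile_column(records, column, pattern=None):
    """Gather simple metadata about one column of a list of dict records.

    `pattern` is an optional regular expression that every non-empty value is
    expected to match; it is supplied by the analyst (an assumption), not inferred.
    """
    values = [record.get(column) for record in records]
    present = [v for v in values if v not in (None, "")]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }
    if pattern is not None:
        regex = re.compile(pattern)
        # Values that do not fully match the expected format are candidate errors.
        profile["violations"] = [v for v in present if not regex.fullmatch(v)]
    return profile


if __name__ == "__main__":
    # Hypothetical format rules: five-digit ZIP codes, ages of one to three digits.
    for column, pattern in [("zip", r"\d{5}"), ("age", r"\d{1,3}")]:
        print(column, profile_column(RECORDS, column, pattern))
```

On this sample, the ZIP column reports two format violations ("1234" and "ABCDE"), and the age column reports one missing value and one violation ("-5"). Metadata of this kind is what the later phases would use to decide which cleaning transformations are needed.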

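Duplicate elimination, one of the specialized cleaning phases the text mentions, can be sketched in a simple form: normalize the fields that identify an entity, then keep only the first record for each normalized key. This is a deliberately simple, hypothetical stand-in for what dedicated tools do. The sample data (`CUSTOMERS`), the helpers `normalize` and `deduplicate`, and the normalization rule (lower-case, drop punctuation, collapse whitespace) are assumptions chosen for illustration.

```python
import re

# Hypothetical customer records arriving from two different sources.
CUSTOMERS = [
    {"name": "Alice Smith", "city": "Boston"},
    {"name": "alice  smith.", "city": "boston"},
    {"name": "Bob Jones", "city": "Denver"},
    {"name": "Alice Smith", "city": "Chicago"},
]


def normalize(value):
    """Lower-case, drop punctuation, and collapse runs of whitespace (an assumed rule)."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def deduplicate(records, key_fields):
    """Keep the first record for each normalized key; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for record in records:
        # Missing or None fields are treated as empty strings.
        key = tuple(normalize(record.get(field) or "") for field in key_fields)
        if key in seen:
            dropped.append(record)
        else:
            seen.add(key)
            kept.append(record)
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = deduplicate(CUSTOMERS, ["name", "city"])
    print("kept:", kept)
    print("dropped:", dropped)
```

Because matching here is exact on normalized keys, "alice  smith." in "boston" is recognized as a duplicate of "Alice Smith" in "Boston", while the Chicago record stays separate. A misspelling such as "Alise Smith" would not be caught; handling that generally requires similarity-based (fuzzy) matching, which this sketch omits.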