A Taxonomy of Dirty Data

Title
A Taxonomy of Dirty Data
Authors
Kim W.; Choi B.-J.; Hong E.-K.; Kim S.-K.; Lee D.
Ewha Authors
최병주
Issue Date
2003
Journal Title
Data Mining and Knowledge Discovery
ISSN
1384-5810
Citation
Data Mining and Knowledge Discovery, vol. 7, no. 1, pp. 81-99
Indexed
SCI; SCIE; SCOPUS
Document Type
Article
Abstract
Today large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in data sources are often "dirty". Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database/data warehouse of dirty data can be damaging and at best be unreliable. In this paper, a comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored.
DOI
10.1023/A:1021564703268
Appears in Collections:
College of Artificial Intelligence > Department of Computer Science and Engineering > Journal papers