Data Mining



Short Question Answers

Q What is data mining?
A Data mining, also known as knowledge discovery in databases (KDD), is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.


Q What is the difference between OLAP and data mining?
A OLAP (On-Line Analytical Processing) gives a very good view of what is happening in the data, but it cannot predict what will happen in the future or explain why it is happening. Data mining, in contrast, is a group of techniques that find relationships that have not previously been discovered.


Q What are the types of tasks that are carried out during data mining?
A Data mining involves two types of tasks:
  • Prediction tasks - Use some variables to predict unknown or future values of other variables.
  • Description tasks - Find human-interpretable patterns that describe the data.
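
A prediction task can be illustrated with simple linear regression: a known variable x is used to predict the unknown value of another variable y. This is a minimal sketch using ordinary least squares with the standard library only; the experience/salary figures are made-up data, and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the squared error of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Use the fitted line to predict y for a new value of x."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical data: years of experience (x) and salary in thousands (y).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

model = fit_line(experience, salary)
print(predict(model, 6))  # 55.0, a prediction for a value of x not seen in the data
```

A description task would instead summarize the existing records (for example, grouping similar ones) without singling out a target variable to predict.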


Q What are some of the tasks of data mining?
A The following activities are carried out during data mining:
  • Classification [Predictive]
  • Clustering [Descriptive]
  • Association Rule Discovery [Descriptive]
  • Sequential Pattern Discovery [Descriptive]
  • Regression [Predictive]
  • Deviation Detection [Predictive]
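
Association rule discovery is commonly scored with two measures: support (the fraction of transactions containing all items of a rule) and confidence (how often the consequent appears in transactions that contain the antecedent). The sketch below assumes made-up market-basket transactions and uses a brute-force search over item pairs for brevity; practical systems use algorithms such as Apriori or FP-growth. `count`, `support`, `confidence` and `rules` are illustrative names, not a library API.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and C together) / support(A), computed from raw counts."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

def rules(transactions, min_support, min_confidence):
    """Brute-force search for one-item rules A -> C (only suitable for tiny data)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, c in combinations(items, 2):
        # Try the rule in both directions: {a} -> {c} and {c} -> {a}.
        for ante, cons in (({a}, {c}), ({c}, {a})):
            s = support(ante | cons, transactions)
            if s >= min_support:
                conf = confidence(ante, cons, transactions)
                if conf >= min_confidence:
                    found.append((ante, cons, s, conf))
    return found

for ante, cons, s, conf in rules(transactions, min_support=0.6, min_confidence=0.8):
    print(ante, "->", cons, "support", s, "confidence", conf)
```

Raising `min_confidence` keeps only the strongest rules; here only {beer} -> {diaper} survives, because every transaction with beer also contains diaper.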

Q What do you mean by preprocessing of data in data mining?
A Before data is mined, it has to be preprocessed. Preprocessing consists of the following three stages:

  • Data cleaning - Real-world data is dirty, so it needs to be cleaned

  • Data reduction - Remove data that is not useful for mining

  • Data transformation - Syntactic transformation of the data into forms suitable for mining

Q What is data cleaning?
A Causes of dirty data:

  • Missing values
  • Noisy data (Human/Machine Errors)
  • Inconsistent data
Data cleaning tasks:
  • Handling missing values
  • Identify outliers and smooth out noisy data
  • Correct inconsistent data
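
The three cleaning tasks can be sketched on a tiny made-up data set. This sketch assumes the interquartile-range (IQR) rule for identifying outliers, treats flagged outliers as missing, fills missing values with the median, and corrects inconsistent codes with a hand-written mapping. These are common choices rather than the only ones, and `flag_outliers`, `fill_missing`, `fix_inconsistent` and `CANONICAL_GENDER` are hypothetical names.

```python
from statistics import median, quantiles

# Hypothetical raw records: None marks a missing value, 420 is a data-entry
# error, and the gender column spells the same value in several ways.
ages = [23, 25, None, 31, 420, 28, None, 26]
genders = ["M", "male", "F", "Female", "m", "f", "M", "FEMALE"]

def flag_outliers(values, k=1.5):
    """Identify outliers with the IQR rule: outside [Q1 - k*IQR, Q3 + k*IQR]."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Outliers are replaced by None so they are handled like missing values.
    return [v if v is None or low <= v <= high else None for v in values]

def fill_missing(values):
    """Handle missing values: replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

# Assumed mapping from the variant spellings to one canonical code.
CANONICAL_GENDER = {"m": "M", "male": "M", "f": "F", "female": "F"}

def fix_inconsistent(codes):
    """Correct inconsistent data by mapping every variant to its canonical form."""
    return [CANONICAL_GENDER[c.strip().lower()] for c in codes]

clean_ages = fill_missing(flag_outliers(ages))
clean_genders = fix_inconsistent(genders)
print(clean_ages)     # [23, 25, 26, 31, 26, 28, 26, 26]
print(clean_genders)  # ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F']
```

Outliers are removed before filling so that the error value 420 cannot distort the statistic used to fill the gaps.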

Q Explain data reduction.
A It consists of the following three tasks:
  • Dimensionality reduction - Attribute subset selection
  • Numerosity reduction - Tuple subset selection
  • Discretization - Reduce the cardinality of the active domain
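
The three reduction tasks can be sketched on a synthetic table. The sketch assumes that attribute subset selection means keeping a hand-picked list of columns, that tuple subset selection is done by simple random sampling without replacement, and that discretization uses equal-width binning. Real systems may instead choose attributes by a relevance measure or use other sampling and binning schemes. The data and the function names are hypothetical.

```python
import random

# Hypothetical data set: 1,000 tuples with four attributes.
records = [
    {"id": i, "age": 18 + (i * 7) % 60, "income": 20_000 + (i * 137) % 80_000, "zip": 10_000 + i}
    for i in range(1000)
]

def select_attributes(rows, keep):
    """Dimensionality reduction: attribute subset selection (keep only chosen columns)."""
    return [{a: r[a] for a in keep} for r in rows]

def sample_tuples(rows, k, seed=0):
    """Numerosity reduction: tuple subset selection by random sampling without replacement."""
    return random.Random(seed).sample(rows, k)

def equal_width_bins(values, n_bins):
    """Discretization: map each value to one of n_bins equal-width intervals (0 .. n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value in the last bin instead of a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

reduced = select_attributes(records, ["age", "income"])  # 4 attributes -> 2
sample = sample_tuples(reduced, 100)                     # 1,000 tuples -> 100
age_bins = equal_width_bins([r["age"] for r in sample], 3)
print(len(set(r["age"] for r in sample)), "distinct ages ->", len(set(age_bins)), "bins")
```

Discretization is what reduces the cardinality of the active domain: dozens of distinct age values collapse to at most three bin labels.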

Q What is data transformation?
A It consists of the following tasks:
  1. Generalization - Concept hierarchy climbing (e.g. replacing a city with its country)
  2. Attribute/feature construction - New attributes are constructed from existing ones and added to the tuple
  3. Normalization - Attribute values are scaled to fall within a small, specified range (e.g. 0.0 to 1.0)
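
The three transformations can be sketched on a few made-up property records. This sketch assumes a tiny hand-written concept hierarchy (city to country) for generalization, constructs a floor-area attribute from length and width, and uses min-max normalization, which is one of several normalization methods (z-score and decimal scaling are others). All names and values are hypothetical.

```python
# Hypothetical concept hierarchy used for generalization: city -> country.
CITY_TO_COUNTRY = {"Paris": "France", "Lyon": "France", "Berlin": "Germany", "Munich": "Germany"}

rows = [
    {"city": "Paris", "length_m": 4.0, "width_m": 2.5, "price": 300_000},
    {"city": "Lyon", "length_m": 6.0, "width_m": 5.0, "price": 150_000},
    {"city": "Berlin", "length_m": 10.0, "width_m": 4.0, "price": 450_000},
]

def generalize(rows):
    """Generalization: climb the concept hierarchy, replacing city with country."""
    return [
        {**{k: v for k, v in r.items() if k != "city"}, "country": CITY_TO_COUNTRY[r["city"]]}
        for r in rows
    ]

def construct_area(rows):
    """Attribute construction: add a new attribute derived from existing ones."""
    return [{**r, "area_m2": r["length_m"] * r["width_m"]} for r in rows]

def min_max_normalize(rows, attr, new_min=0.0, new_max=1.0):
    """Normalization: rescale attr linearly so its values fall in [new_min, new_max]."""
    values = [r[attr] for r in rows]
    lo, hi = min(values), max(values)
    return [
        {**r, attr: (r[attr] - lo) / (hi - lo) * (new_max - new_min) + new_min}
        for r in rows
    ]

out = min_max_normalize(construct_area(generalize(rows)), "price")
for r in out:
    print(r["country"], r["area_m2"], r["price"])
# France 10.0 0.5
# France 30.0 0.0
# Germany 40.0 1.0
```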