Database - Data Mining

51. Suppose a dataset has five instances, i1, i2, i3, i4 and i5, and three features, X, Y and Z, as shown in the table below:

Instances   X     Y     Z
i1          1.6   2.3   5.1
i2          2.4   2.5   4.6
i3          3.9   3.6   3.7
i4          4.1   3.7   2.5
i5          5.6   8.3   1.8

To measure the linear dependence between two variables, we use Pearson's correlation coefficient. Based on your understanding of the correlation coefficient, choose the correct option(s):
  1. A strong positive correlation between X and Y
  2. A strong negative correlation between X and Y
  3. A weak positive correlation between X and Z
  4. A weak negative correlation between X and Z
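
One way to check the options is to compute the coefficient directly from the table. The following is a minimal sketch in plain Python; the helper name `pearson` and the variable names are illustrative assumptions, not part of the question.

```python
import math

def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Sums of squared deviations (proportional to the variances)
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Feature columns copied from the table in question 51
X = [1.6, 2.4, 3.9, 4.1, 5.6]
Y = [2.3, 2.5, 3.6, 3.7, 8.3]
Z = [5.1, 4.6, 3.7, 2.5, 1.8]

r_xy = pearson(X, Y)
r_xz = pearson(X, Z)
print(f"r(X, Y) = {r_xy:.3f}")
print(f"r(X, Z) = {r_xz:.3f}")
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 a weak one. Where exactly "weak" ends and "strong" begins is a convention rather than a fixed rule.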

52. Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
a. You will add more features
b. You will start introducing higher degree features
c. You will remove some features
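
To illustrate what underfitting looks like, the sketch below fits a straight line and then a quadratic to a small made-up dataset and compares the training error. The data, the helper names `solve` and `fit_mse`, and the use of normal equations are all assumptions chosen for illustration; the question itself does not prescribe any of them.

```python
def solve(a, b):
    """Solve the linear system a w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mse(xs, ys, degree):
    """Least-squares polynomial fit of the given degree; returns the training MSE."""
    # Each row holds the features 1, x, x^2, ..., x^degree
    rows = [[x ** d for d in range(degree + 1)] for x in xs]
    k = degree + 1
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve(ata, aty)
    preds = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Hypothetical data with a purely quadratic relationship (not from the quiz)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

mse_linear = fit_mse(xs, ys, degree=1)     # straight line: underfits
mse_quadratic = fit_mse(xs, ys, degree=2)  # adds the x^2 feature
print(f"degree 1 MSE: {mse_linear:.3f}")
print(f"degree 2 MSE: {mse_quadratic:.3f}")
```

On this toy data the straight line cannot follow the curve, while adding the higher-degree feature brings the training error down to essentially zero. Real data is noisier, so the improvement is usually less dramatic.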

Consider the dataset S given below:

Elevation   Road Type   Speed Limit   Speed
steep       Uneven      Yes           Slow
steep       Smooth      Yes           Slow
flat        Uneven      No            Fast
steep       Smooth      No            Fast

Elevation, Road Type and Speed Limit are the features and Speed is the target label that we want to predict.

53. Find the entropy of the dataset S given above:
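
The entropy of a set of labels is H(S) = -Σ p_i log2(p_i), where p_i is the fraction of instances in class i. A minimal sketch of that calculation on the Speed column follows; the function name `entropy` is an illustrative choice.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# Target column (Speed) of dataset S
speed = ["Slow", "Slow", "Fast", "Fast"]

h_s = entropy(speed)
print(f"H(S) = {h_s:.4f} bits")
```

With two Slow and two Fast instances the classes are perfectly balanced, which is the case where a two-class entropy reaches its maximum of 1 bit.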

54. Find the information gain when the dataset S is split on the feature "Elevation":
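
Information gain is the parent entropy minus the size-weighted entropy of the child subsets produced by the split. The sketch below applies that definition to the Elevation feature; the variable names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# (Elevation, Speed) pairs taken from dataset S
rows = [("steep", "Slow"), ("steep", "Slow"), ("flat", "Fast"), ("steep", "Fast")]

# Entropy before the split
parent = entropy([speed for _, speed in rows])

# Group the target labels by Elevation value
groups = {}
for elevation, speed in rows:
    groups.setdefault(elevation, []).append(speed)

# Weighted average entropy of the subsets after the split
weighted_child = sum(len(g) / len(rows) * entropy(g) for g in groups.values())

gain_elevation = parent - weighted_child
print(f"Gain(S, Elevation) = {gain_elevation:.4f}")
```

The "steep" subset still mixes Slow and Fast, so only part of the parent entropy is removed by this split.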

55. Find the feature on which the parent (root) node should split the dataset S, based on information gain:
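
A common way to choose the root split is to compute the information gain of every feature and pick the largest. The following sketch does that for dataset S; the names `FEATURES`, `information_gain` and `best_feature` are hypothetical, and ties (none occur here) would simply resolve to the first feature listed.

```python
import math
from collections import Counter, defaultdict

FEATURES = ["Elevation", "Road Type", "Speed Limit"]

# Rows of dataset S: three feature values followed by the Speed label
S = [
    ("steep", "Uneven", "Yes", "Slow"),
    ("steep", "Smooth", "Yes", "Slow"),
    ("flat", "Uneven", "No", "Fast"),
    ("steep", "Smooth", "No", "Fast"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, feature_index):
    """Parent entropy minus the weighted entropy after splitting on one feature."""
    labels = [r[-1] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature_index]].append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gains = {name: information_gain(S, i) for i, name in enumerate(FEATURES)}
best_feature = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"Gain(S, {name}) = {gain:.4f}")
print("Best split:", best_feature)
```

In this dataset every "Yes" row is Slow and every "No" row is Fast, so splitting on Speed Limit leaves both child nodes pure, whereas Road Type leaves both children as mixed as the parent.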
