MakeMyTrip Interview for Data Engineer

Q.1. What is Regression ? What is Classification ?
Ans. Regression : Target variable is continuous.
Classification : Target variable is discrete.

Q.2. What are the error metrics used for each of them ?
Ans. Regression : SSE (sum of squared errors)
Classification : Confusion matrix (and metrics derived from it, such as precision and recall)

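Q.3. Why doesn't accuracy help when there is a class imbalance ?
Ans. Because accuracy is dominated by the majority class: a model that always predicts the majority class scores high accuracy while never identifying a minority-class example.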
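
A minimal sketch of the point above, using made-up labels (95 negatives, 5 positives) and a hypothetical classifier that always predicts the majority class. The helper name confusion_counts is an illustration, not a library function.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * 100

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all predictions that are correct
recall = tp / (tp + fn)              # share of actual positives that were found
print(accuracy, recall)
```

Accuracy looks strong here even though recall on the minority class is zero, which is why the confusion matrix is the more informative view.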

Q.4. How to deal with class imbalance ?
Ans. 1. Use performance metrics such as Area Under the ROC Curve
2. Penalize algorithms (make misclassifying the minority class more costly)
3. Use tree-based algorithms like RF, Gradient Boosted Trees

Q.5. What is knn ?
Q.6. What is k-means ?
Explanation here :

Q.7. y=ax+b is a linear model. Can you tell me if y=ax^2 + bx + c is also a linear model ?
Ans. y=ax^2 + bx + c is also linear, because x^2 can be represented as a new feature X and the model stays linear in the coefficients a, b and c.
So the relationship between y and x is not linear, but the fitted model is linear.
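
A small sketch of this idea, assuming NumPy is available and using made-up, noise-free data generated from y = 2x^2 - 3x + 1. The quadratic is fitted with ordinary least squares by treating x^2 as just another feature column.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2x^2 - 3x + 1.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2 * x**2 - 3 * x + 1

# Design matrix with columns [x^2, x, 1]; x^2 plays the role of the new feature X.
X = np.column_stack([x**2, x, np.ones_like(x)])

# Ordinary least squares: the problem is linear in the coefficients a, b, c.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c = coeffs
print(a, b, c)
```

The solver never needs to know that one column is the square of another, which is what "linear in the parameters" means in practice.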

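Q.8. What is SSE and RMSE ? Why use RMSE and not SSE ?
Ans. SSE is the total of the squared errors, so it grows with the number of observations. RMSE is the square root of the mean squared error, so it is comparable across datasets of different sizes and is in the same units as the target.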
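
A short sketch comparing the two metrics on made-up data, where every prediction is off by exactly 2 units. The function names sse and rmse are illustrative helpers written here, not library calls.

```python
import math

def sse(y_true, y_pred):
    """Sum of squared errors: a total, so it grows with the number of rows."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error: averaged over rows, in the units of the target."""
    return math.sqrt(sse(y_true, y_pred) / len(y_true))

# Hypothetical data: the same per-row error of 2 units at two dataset sizes.
small_true, small_pred = [10.0] * 5, [12.0] * 5
large_true, large_pred = [10.0] * 500, [12.0] * 500

print(sse(small_true, small_pred), sse(large_true, large_pred))
print(rmse(small_true, small_pred), rmse(large_true, large_pred))
```

SSE differs by a factor of 100 between the two datasets although the model is equally wrong on every row, while RMSE is the same for both.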

Q.9. Does a low RMSE denote overfitting ?

Q.10. How to resolve overfitting ?
Ans. 1. Cross-validation
2. Regularization
3. Ensembling

Q.11. Why is knn not a model ?
Ans.11. It is a lazy learner: it stores the training data and defers all computation to prediction time, so no explicit model is fitted.
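
To illustrate the "lazy" behaviour, here is a minimal sketch of a k-nearest-neighbours classifier in plain Python. The class name LazyKNN and the sample points are hypothetical; the sketch assumes Euclidean distance and majority voting, and needs Python 3.8+ for math.dist.

```python
import math
from collections import Counter

class LazyKNN:
    """Illustrative kNN classifier: fit() only stores data, predict() does the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" just memorises the data; no parameters are estimated.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Distances to every stored point are computed at prediction time.
        by_distance = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        nearest_labels = [label for _, label in by_distance[: self.k]]
        # Majority vote among the k nearest neighbours.
        return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]
model = LazyKNN(k=3).fit(points, labels)
print(model.predict((1.2, 1.5)), model.predict((8.2, 8.8)))
```

Nothing is learned in fit(), so the cost of each prediction grows with the size of the stored training set.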

Q.12. Write code in Python for the following problems :
bookings table :
id, date, platform
1, 12/3, android
2, 12/3, ios
3, 13/3, android
4, 13/3, ios
5, 13/3, android
6, 14/3, ios
7, 14/3, android
For each date, how many bookings are from android and how many from ios ?
Answer :
import pandas as pd
df1 = pd.read_csv("MMT.csv")
# count() tallies the non-null ids in each (date, platform) group
df1.groupby(['date', 'platform']).count()
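
As an alternative sketch, the same question can be answered with pd.crosstab, which returns one row per date and one column per platform. The table is built inline here instead of being read from MMT.csv, so the snippet runs on its own; the variable names bookings and counts are illustrative.

```python
import pandas as pd

# The bookings table from the question, built inline instead of read from a file.
bookings = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "date": ["12/3", "12/3", "13/3", "13/3", "13/3", "14/3", "14/3"],
    "platform": ["android", "ios", "android", "ios", "android", "ios", "android"],
})

# Rows are dates, columns are platforms, cells are booking counts.
counts = pd.crosstab(bookings["date"], bookings["platform"])
print(counts)
```

The wide layout makes the android and ios counts for a given date readable side by side.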

data = ['cat', 'bat', 'rat', 'cat', 'rat']
Give the count of each unique element of the list
Answer :
import pandas as pd
data = ['cat', 'bat', 'rat', 'cat', 'rat']
df = pd.DataFrame(data, columns=['Category'])
# value_counts() returns each unique value with its number of occurrences
vc = df['Category'].value_counts()
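
For a plain list, a standard-library sketch with collections.Counter gives the same counts without building a DataFrame. The variable name counts is illustrative.

```python
from collections import Counter

data = ['cat', 'bat', 'rat', 'cat', 'rat']

# Counter maps each unique element to the number of times it appears.
counts = Counter(data)
print(counts)
```

This may be preferable when pandas is not otherwise needed.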

