Machine learning is one of the most widely used technologies today. This blog
covers some of the most frequently asked machine learning interview questions
to help you review the concepts and skills you need to land your dream job.
It is designed to help you prepare thoroughly before a machine learning
interview.
Here is a list of the top 20 machine learning interview questions.
Machine Learning Interview Questions for Freshers
1. Why was machine learning introduced?
The simplest answer is: to make our lives easier. In the early days of
"intelligent" applications, many systems used hard-coded "if" and "else"
decision rules to process data or respond to user input. Imagine a spam filter
whose task is to move the relevant incoming email messages to the spam folder:
every new kind of spam would need a new hand-written rule.
With machine learning algorithms, we instead give the system enough data for
it to learn and identify the patterns itself.
Unlike the rule-based approach, we don't need to write new rules for every
situation. We can reuse the same workflow with a different data set.
2. What is PCA? When do you utilize it?
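Principal component analysis (PCA) is most commonly used for dimensionality
reduction.
PCA finds new, uncorrelated axes called principal components. Each one is a
linear combination of the original variables (the columns in a table), and
they are ordered by how much of the data's variance each one captures.
Components that capture very little variance can be dropped.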
Reducing the data to two or three components makes the data set easier to
visualize. PCA is used in fields such as finance, neuroscience, and
pharmacology.
It is useful as a preprocessing step, especially when there are linear
correlations between features.
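To make the idea concrete, here is a minimal sketch of finding the first principal component using only the Python standard library. The data set and the function name `first_principal_component` are made up for illustration, and power iteration is just one simple way to get the dominant direction; real projects would normally use a library routine instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, variance_along_direction) for the first component."""
    n, d = len(rows), len(rows[0])
    # Center each column so variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance, cov

# Hypothetical data: the second column is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, variance, cov = first_principal_component(data)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = variance / total_variance

print("direction:", direction)
print("share of variance explained:", explained_ratio)
```

Because the two columns are almost perfectly linearly related, a single component captures nearly all of the variance, which is the situation in which dropping the remaining component loses very little information.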
3. What are support vectors in SVM?
A Support Vector Machine (SVM) is an algorithm that tries to place a line (or
plane, or hyperplane) between different classes so as to maximize the distance
from that boundary to the nearest points of each class.
In this way, it tries to find a robust separation between the classes. Support
vectors are the data points closest to the dividing hyperplane; they lie on
the edge of the margin and determine where the boundary is placed.
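The sketch below illustrates which points count as support vectors. It does not train an SVM: the hyperplane `w`, `b` and the points are hypothetical values chosen by hand so that the line is already the widest-margin separator. It only measures each point's distance to that line and picks out the closest ones.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return (dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical, already-fitted separating line: x0 + x1 - 5 = 0.
w, b = [1.0, 1.0], -5.0
points = [(1.0, 1.0), (2.0, 2.0), (0.0, 1.0),   # class on the negative side
          (3.0, 3.0), (4.0, 4.0), (5.0, 3.0)]   # class on the positive side

distances = [abs(signed_distance(p, w, b)) for p in points]
margin = min(distances)
# The support vectors are the points that sit exactly on the margin.
support_vectors = [p for p, dist in zip(points, distances)
                   if math.isclose(dist, margin)]

print("margin:", margin)
print("support vectors:", support_vectors)
```

Moving any of the other points (without crossing the margin) would leave the boundary unchanged, which is why only the support vectors matter for the final model.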
4. What are the different kernels in SVM?
Four kernels are commonly used in SVM:
Linear kernel: used when the data is linearly separable.
Polynomial kernel: models interactions between features up to a chosen degree,
which gives a curved decision boundary.
Radial basis function (RBF) kernel: creates a non-linear decision boundary
that can separate classes a linear kernel cannot.
Sigmoid kernel: based on the tanh function, similar to an activation function
in neural networks.
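As a rough illustration, each kernel is just a function that scores the similarity of two input vectors. The sketch below writes the four kernels in their usual textbook forms; the parameter names (`degree`, `gamma`, `coef0`) and their default values are assumptions chosen for this example, not settings taken from this article.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # Depends only on the squared distance between the two points.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```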
5. What is cross-validation?
Cross-validation is a technique for estimating how well a model generalizes to
unseen data. In k-fold cross-validation, the data is split into k subsets
(folds), and the model is trained on k-1 of these folds.
The remaining fold is held out for testing. This is repeated k times so that
each fold is used as the test set exactly once. Finally, the scores from all k
folds are averaged to produce a final score.
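The splitting and averaging steps can be sketched as follows. This is a minimal illustration that works on sample indices only; `k_fold_indices`, `cross_validate`, and the `score_fn` callback are hypothetical names, and a real workflow would usually shuffle the data first and train an actual model inside `score_fn`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(n_samples, k, score_fn):
    """Average the score returned by score_fn(train, test) over all folds."""
    scores = [score_fn(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```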
6. What is bias in machine learning?
Bias in the data tells us that the data is skewed or inconsistent, so it does
not represent the real world fairly. This can happen for several reasons,
which are not mutually exclusive.
For example, to speed up the hiring process, a tech giant like Amazon built an
engine that could take in 100 resumes and pick out the best five candidates to
hire.
When the company realized that the software was not producing gender-neutral
results, it modified the software to remove this bias.
7. Explain the distinction between classification and regression?
Classification is used to produce discrete results: it assigns data to one of
a set of specific categories.
For example, sorting emails into spam and non-spam.
Regression, on the other hand, handles continuous data.
For example, predicting the price of goods at a certain point in time.
In other words, classification predicts which class an input belongs to.
Such as: Will it be hot or cold tomorrow?
Regression predicts a numeric value.
For example: What will the temperature be tomorrow?
8. What is clustering?
Clustering is the technique of dividing a set of objects into groups. Objects
in the same cluster should be similar to each other and different from objects
in other clusters.
There are several types of clustering:
• Hierarchical clustering
• K-means clustering
• Density-based clustering
• Fuzzy clustering, etc.
9. How can you choose K for K-means Clustering?
There are two kinds of methods: direct methods and statistical testing
methods.
Direct methods: the elbow method and the silhouette method.
Statistical testing methods: the gap statistic.
When determining the optimal value of k, the silhouette method is most often
used.
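Here is a small sketch of how a silhouette score compares two candidate values of k. To keep it short, it skips the clustering step itself: the one-dimensional points and both label assignments are made up by hand, standing in for the output of K-means with k = 2 and k = 3.

```python
def silhouette_score(points, labels):
    """Mean silhouette of 1-D points; needs at least two clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a single-point cluster contributes a silhouette of 0
        # a: mean distance to the other points in the same cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(sum(abs(p - q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data with two obvious groups, around 1 and around 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels_k2 = [0, 0, 0, 1, 1, 1]   # assumed result of clustering with k = 2
labels_k3 = [0, 0, 2, 1, 1, 1]   # assumed result of clustering with k = 3

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
print("k=2:", score_k2, "k=3:", score_k3)
```

The value of k with the higher mean silhouette is preferred, so in this toy example k = 2 wins over k = 3.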
10. How do you decide which machine learning algorithm to use?
It depends entirely on the data set you have. If the target variable is
categorical, we can use a classifier such as SVM. If the target variable is
continuous, we can use a regression model such as linear regression.
So there is no fixed rule that tells us which ML algorithm to use. It all
comes down to exploratory data analysis (EDA).
EDA is like a "conversation" with the data set. In EDA, we do the following:
• Classify our variables as continuous, categorical, and so on.
• Summarize our variables using descriptive statistics.
• Visualize our variables with graphs.
• Based on the above observations, select the most appropriate algorithm for
the particular data set.
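The first two EDA steps can be sketched with the standard library alone. The column names and values below are hypothetical, and the rule "all numbers means continuous" is a deliberate simplification: in practice some numeric columns (such as ID codes) are really categorical.

```python
import statistics
from collections import Counter

def summarize(values):
    """Classify a column as continuous or categorical and summarize it."""
    if all(isinstance(v, (int, float)) for v in values):
        return {
            "kind": "continuous",
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),
        }
    return {"kind": "categorical", "counts": dict(Counter(values))}

# Hypothetical columns, for illustration only.
columns = {
    "age": [23, 35, 31, 46, 52, 29],
    "plan": ["free", "pro", "free", "free", "pro", "team"],
}
for name, values in columns.items():
    print(name, summarize(values))
```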
Advanced Machine Learning Interview Questions
11. How do you deal with overfitting and underfitting?
Overfitting means that the model fits the training data too well and fails to
generalize to new data. In this case, we need to resample the data and
estimate the model's accuracy using techniques such as k-fold
cross-validation.
Underfitting means that the model cannot capture the patterns in the data. In
this case, we need to change the algorithm or feed more data points to the
model.
12. What are recommender systems?
A recommender system is used to predict users' interests and recommend
products that are likely to interest them.
The data required for recommender systems comes from explicit user ratings
(for example, after watching a movie or listening to a song), from implicit
signals such as search engine queries and purchase history, or from other
knowledge about users and items.
13. How do you check the normality of a data set?
Visually, we can use graphs such as histograms and Q-Q plots. Some of the
statistical normality tests are as follows:
• Shapiro-Wilk test
• Anderson-Darling test
• Martinez-Iglewicz test
• Kolmogorov-Smirnov test
• D'Agostino skewness test
14. Can logistic regression be used for more than 2 classes?
Not in its basic form: logistic regression is a binary classifier by default.
However, it can be extended to solve multi-class classification problems, for
example as multinomial logistic regression.
15. Explain correlation and covariance?
Correlation is used to measure the quantitative relationship between two
variables: it estimates how strongly they are associated. Examples include
income and expense, demand and supply, etc.
Covariance measures how two variables vary together. The problem with
covariance is that its value depends on the units of the variables, so
covariances are hard to compare without normalization. Correlation is the
normalized form of covariance.
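The difference is easy to see in a few lines of code. In this sketch, the income and expense figures are invented for illustration; the point is that changing the unit of one variable changes the covariance but leaves the correlation untouched.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical figures, in thousands.
income = [30.0, 40.0, 50.0, 60.0]
expense = [20.0, 26.0, 31.0, 39.0]
income_in_units = [x * 1000 for x in income]  # same data, different unit

print("covariance:", covariance(income, expense),
      "vs", covariance(income_in_units, expense))
print("correlation:", correlation(income, expense),
      "vs", correlation(income_in_units, expense))
```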
16. What is P-value?
P-values are used to make decisions in hypothesis tests. The p-value is the
smallest significance level at which you can reject the null hypothesis. The
smaller the p-value, the stronger the evidence against the null hypothesis,
and the more likely you are to reject it.
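As a worked example, here is a minimal sketch of a two-sided z-test, assuming the population standard deviation is known. The function name and the sample numbers are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """p-value of a two-sided z-test with known standard deviation sigma."""
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    # Probability of a result at least this extreme if the null is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical study: null hypothesis says the mean is 100.
p = two_sided_p_value(sample_mean=103, null_mean=100, sigma=10, n=50)
alpha = 0.05  # chosen significance level
print("p-value:", p)
print("reject null hypothesis:", p < alpha)
```

Since the p-value here falls below the chosen significance level of 0.05, the null hypothesis would be rejected; with a stricter level such as 0.01 it would not be.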
17. What are parametric and non-parametric models?
Parametric models have a fixed, limited number of parameters; you only need to
know the model parameters to predict new data.
Non-parametric models place no limit on the number of parameters, which allows
greater flexibility. To predict new data, you need to know both the model
parameters and the state of the observed data.
18. How to handle outliers?
An outlier is an observation that lies far from the other observations in a
data set. Tools used to detect outliers include box plots, Z-scores, scatter
plots, etc.
We usually follow one of three simple strategies to deal with outliers:
We can drop them.
We can keep them and mark them as outliers in a separate feature.
We can transform the feature to reduce the effect of the outliers.
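The detection step and the three strategies can be sketched as follows. The data is made up, the Z-score threshold of 2.5 and the 1.5 x IQR rule (the one behind box-plot whiskers) are common conventions rather than fixed rules, and the log transform is just one possible choice of transformation.

```python
import math
import statistics

def z_score_outliers(values, threshold=2.5):
    """Values whose distance from the mean exceeds threshold std deviations."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Values outside the box-plot whiskers (k times the interquartile range)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]
outliers = iqr_outliers(data)

dropped = [v for v in data if v not in outliers]   # strategy 1: drop
is_outlier = [v in outliers for v in data]         # strategy 2: flag as feature
transformed = [math.log(v) for v in data]          # strategy 3: transform

print("z-score outliers:", z_score_outliers(data))
print("IQR outliers:", outliers)
```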
19. What is reinforcement learning?
Reinforcement learning differs from other types of learning, such as
supervised and unsupervised learning. In reinforcement learning, we are not
given data or labels up front. Instead, an agent interacts with an
environment, and learning is based on the rewards the environment provides to
the agent.
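A tiny example of learning from rewards alone is the multi-armed bandit. The sketch below uses an epsilon-greedy agent, which is only one simple strategy among many; the reward probabilities, the function name `run_bandit`, and all parameter values are assumptions made for this illustration.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: returns (estimated values, pull counts) per arm."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))   # explore a random action
        else:
            arm = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment answers with a reward; no labels are ever given.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental average of the rewards seen for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

# Hypothetical environment: the second action pays off far more often.
values, counts = run_bandit([0.2, 0.8])
print("estimated values:", values)
print("times each action was chosen:", counts)
```

The agent is never told which action is better. It discovers that from the rewards and gradually chooses the better action more often.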
20. Difference between Sigmoid and Softmax functions?
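A sigmoid function is used for binary classification. It outputs a single
value between 0 and 1, which is the probability of the positive class; the
probability of the other class is one minus that value. The softmax function
is used for multi-class classification. It outputs one probability per class,
and the sum of these probabilities is 1.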
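Both functions are short enough to write out directly. In this sketch the input scores are arbitrary example values, and subtracting the maximum inside `softmax` is a common trick to avoid overflow rather than part of the definition.

```python
import math

def sigmoid(z):
    """Map one score to the probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print("sigmoid(2.0):", sigmoid(2.0))
print("softmax([1.0, 2.0, 3.0]):", softmax([1.0, 2.0, 3.0]))
# With two classes, softmax reduces to the sigmoid of the score difference.
print("softmax([0.0, 2.0])[1]:", softmax([0.0, 2.0])[1])
```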
Conclusion
Machine learning is progressing fast, and new concepts keep emerging.
In this blog, we have seen 20 of the most frequently asked machine learning
interview questions and their answers, aimed at freshers. We hope this blog
has helped you on your journey to becoming a machine learning engineer or
working in a related role.