INTERVIEW QUESTIONS Archives - SecretDataScientist.com

What is Out-Of-Sample Evaluation?

Out-Of-Sample Evaluation means to withhold some of the sample data from the model identification and estimation process, then use the model to make predictions for the hold-out data in order to see how accurate they are and to determine whether the statistics of their errors are similar to those that the model made within the sample of data that was … Read more
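
As a rough illustration of the hold-out idea, the sketch below fits a straight line on part of a synthetic dataset and compares the in-sample error with the error on the withheld observations. The data, the 80/20 split, and helper names such as `fit_line` and `mean_abs_error` are assumptions made for this example, not part of any particular library.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_abs_error(a, b, xs, ys):
    """Mean absolute prediction error of the fitted line on (xs, ys)."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical synthetic data: y = 2 + 3x plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Withhold 20 observations; the model never sees them during estimation.
indices = list(range(len(xs)))
rng.shuffle(indices)
holdout, train = indices[:20], indices[20:]

train_x = [xs[i] for i in train]
train_y = [ys[i] for i in train]
holdout_x = [xs[i] for i in holdout]
holdout_y = [ys[i] for i in holdout]

a, b = fit_line(train_x, train_y)
in_sample_mae = mean_abs_error(a, b, train_x, train_y)
out_of_sample_mae = mean_abs_error(a, b, holdout_x, holdout_y)

print(f"in-sample MAE:     {in_sample_mae:.3f}")
print(f"out-of-sample MAE: {out_of_sample_mae:.3f}")
```

If the two error figures are of similar size, the model is probably not overfitting the training sample; a much larger out-of-sample error would be a warning sign.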

What is Nearest Neighbor Algorithm?

Nearest Neighbor Algorithm was one of the first algorithms used to determine a solution to the traveling salesman problem. In it, the salesman starts in a random city and repeatedly visits the nearest city until all have been visited. It quickly yields a short tour, but usually not the optimal one. The nearest neighbor algorithm is easy to implement and … Read more
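
The greedy rule described above can be sketched in a few lines of Python. This is only an illustration: the city coordinates are made up, the function names are hypothetical, and the start city is fixed rather than random so that the result is reproducible.

```python
import math


def nearest_neighbor_tour(cities, start=0):
    """Visit the closest unvisited city until none remain."""
    unvisited = set(range(len(cities)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        current = cities[tour[-1]]
        nearest = min(unvisited, key=lambda i: math.dist(current, cities[i]))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour


def tour_length(cities, tour):
    """Length of the closed tour, including the return to the start."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


# Hypothetical cities on a line, listed out of order.
cities = [(0, 0), (5, 0), (1, 0), (6, 0), (2, 0)]
tour = nearest_neighbor_tour(cities, start=0)
print(tour, tour_length(cities, tour))
```

Each step scans all remaining cities, so the sketch runs in O(n²) time. It is fast, but the greedy choice can leave a long edge at the end of the tour.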

What is Multinomial Logistic Regression?

Multinomial Logistic Regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependents. Since the output of the analysis is somewhat different from the logistic regression’s output, multinomial regression is sometimes used instead. Like all regression analyses, the multinomial … Read more
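
One common way to formulate the multi-class case is to compute a linear score per class and turn the scores into probabilities with the softmax function. The sketch below shows only the prediction step; the weights and biases are invented for illustration (in practice they would be estimated from data), and the function names are assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """One linear score per class, followed by softmax."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical parameters for 3 classes and 2 features.
weights = [[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]]
biases = [0.0, 0.1, -0.1]

probs = predict_proba(weights, biases, [2.0, 1.0])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```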

What is Markov Model?

Markov Model in probability theory is a stochastic model used to model randomly changing systems where it is assumed that future states depend only on the current state, not on the events that occurred before it (the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in … Read more
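
To make the Markov property concrete, here is a small sketch of a two-state chain. The states and transition probabilities are hypothetical; the point is that `next_state` looks only at the current state, never at the earlier history.

```python
import random

# Hypothetical transition probabilities: transitions[current][next].
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Sample the next state using only the current state."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, rng):
    """Generate a path of `steps` transitions from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


def step_distribution(dist):
    """Advance a probability distribution over states by one step."""
    new_dist = {state: 0.0 for state in transitions}
    for state, prob in dist.items():
        for target, p in transitions[state].items():
            new_dist[target] += prob * p
    return new_dist


rng = random.Random(1)
path = simulate("sunny", 10, rng)
tomorrow = step_distribution({"sunny": 1.0, "rainy": 0.0})
print(path)
print(tomorrow)
```

Because each step needs only the current distribution, forecasting many steps ahead is just repeated application of `step_distribution`, which is what keeps the computation tractable.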

What is Manhattan Distance?

Manhattan Distance is the distance between two points measured along axes at right angles. The name alludes to the grid layout of the streets of Manhattan, which determines the shortest path a car could take between two points in the city. The limitation of the Manhattan Distance heuristic is that it considers each tile independently, while in fact, tiles interfere … Read more
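
In coordinates, the Manhattan distance is the sum of the absolute differences along each axis. A minimal sketch (the function name is an assumption for this example):

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


# Three blocks east plus four blocks north.
print(manhattan_distance((1, 2), (4, 6)))  # 7
```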

What is LOOCV or Leave-One-Out Cross Validation?

LOOCV, or Leave-One-Out Cross Validation, uses one observation from the original sample as the validation data and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. This is the same as K-fold cross-validation with K being equal to the number of observations in … Read more
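
The procedure can be sketched with a deliberately simple "model" that predicts the mean of its training data. The choice of model, the squared-error metric, and the name `loocv_mse` are assumptions made to keep the example short; any model with a fit and predict step could be dropped in.

```python
def loocv_mse(values):
    """Leave-one-out mean squared error for a mean-only predictor."""
    squared_errors = []
    for i in range(len(values)):
        # Hold out observation i; train on everything else.
        train = values[:i] + values[i + 1:]
        prediction = sum(train) / len(train)
        squared_errors.append((values[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


print(loocv_mse([1.0, 2.0, 3.0, 4.0]))
```

The model is refitted once per observation, so the cost grows with the sample size. That is the usual trade-off against K-fold cross-validation with a small K.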

What is Long-Tailed Distribution?

Long-Tailed Distribution in statistics and business is the portion of the distribution having a large number of occurrences far from the “head” or central part of the distribution. The term is often used loosely, with no definition or an arbitrary one, but precise definitions are possible. Broadly speaking, for such population distributions, the majority of occurrences (more than half, and where … Read more

What is Long Short-Term Memory (LSTM) in machine learning?

Long Short-Term Memory networks, usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term dependencies. LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of a neural network. In standard RNNs, … Read more

What is Log-Normal Distribution?

Log-Normal Distribution in probability theory is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then X = exp(Y) has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. The distribution is … Read more
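
The relationship between the two distributions is easy to check by simulation: draw Y from a normal distribution, set X = exp(Y), and confirm that X is always positive and that ln(X) recovers the normal parameters. The values of `mu` and `sigma` and the sample size below are arbitrary choices for this sketch.

```python
import math
import random

rng = random.Random(42)
mu, sigma = 0.5, 0.75  # hypothetical parameters of the underlying normal

# Y is normal, so X = exp(Y) is log-normal.
y_samples = [rng.gauss(mu, sigma) for _ in range(10_000)]
x_samples = [math.exp(y) for y in y_samples]

# Taking logs of X should give back a sample centred on mu.
log_x = [math.log(x) for x in x_samples]
mean_log_x = sum(log_x) / len(log_x)

print(min(x_samples) > 0)  # log-normal values are strictly positive
print(round(mean_log_x, 2))
```

The standard library also offers `random.lognormvariate(mu, sigma)`, which draws such values directly.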

What is Logistic Regression?

Logistic Regression in statistics is a regression model where the dependent variable is categorical. A common example is the case of a binary dependent variable, that is, one that can take only two values, “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial … Read more
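
For the binary case, the model passes a linear combination of the inputs through the logistic (sigmoid) function to obtain the probability of outcome “1”. The sketch below shows only that prediction step. The intercept, coefficient, and the "hours studied" interpretation are hypothetical; real coefficients would be estimated from data, typically by maximum likelihood.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(intercept, coefs, x):
    """Probability that the outcome is 1 given features x."""
    z = intercept + sum(c * x_j for c, x_j in zip(coefs, x))
    return sigmoid(z)


def predict_label(intercept, coefs, x, threshold=0.5):
    """Return 1 (e.g. pass) or 0 (e.g. fail) by thresholding."""
    return 1 if predict_proba(intercept, coefs, x) >= threshold else 0


# Hypothetical model: pass/fail as a function of hours studied.
intercept, coefs = -4.0, [1.5]
print(predict_proba(intercept, coefs, [2.0]), predict_label(intercept, coefs, [2.0]))
print(predict_proba(intercept, coefs, [4.0]), predict_label(intercept, coefs, [4.0]))
```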