SecretDataScientist.com

What is Nearest Neighbor Algorithm?

Nearest Neighbor Algorithm was one of the first algorithms used to determine a solution to the traveling salesman problem. In it, the salesman starts in a random city and repeatedly visits the nearest city until all have been visited. It quickly yields a short tour, but usually not the optimal one. The nearest neighbor algorithm is easy to implement and … Read more
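The greedy rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `nearest_neighbor_tour`, the coordinate-tuple representation of cities, and the fixed `start` index (used instead of a random start so the result is reproducible) are all assumptions made for the example.

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """Build a tour by always moving to the closest unvisited city."""
    unvisited = set(range(len(cities)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        current = cities[tour[-1]]
        # Pick the unvisited city with the smallest Euclidean distance.
        nearest = min(unvisited, key=lambda i: math.dist(current, cities[i]))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour

# Hypothetical example: four cities on a line.
cities = [(0, 0), (5, 0), (1, 0), (3, 0)]
tour = nearest_neighbor_tour(cities)
print(tour)
```

The returned list is a visiting order of city indices. Nothing in the loop looks ahead, which is why the tour is fast to build but usually not optimal.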

What is Multiple Regression?

Multiple Regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The independent variables can be continuous or categorical (dummy … Read more

What is Multinomial Logistic Regression?

Multinomial Logistic Regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependents. Since the output of the analysis is somewhat different from the logistic regression’s output, multinomial regression is sometimes used instead. Like all linear regressions, the multinomial … Read more

What is Model Fitting?

Model Fitting is running an algorithm to learn the relationship between predictors and outcome so that you can predict the future values of the outcome. It proceeds in three steps: First, you need a function that takes in a set of parameters and returns a predicted data set. Second, you need an ‘error function’ that provides a number representing the … Read more
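A rough Python sketch of these steps follows. The excerpt is cut off before the third step, so the search over candidate parameters shown here is an assumption about what comes next, not the author's stated method. The straight-line model, the squared-error measure, and the names `predict`, `squared_error`, and `fit` are likewise hypothetical choices for illustration.

```python
def predict(params, xs):
    """Step 1: map a set of parameters to a predicted data set."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]

def squared_error(predicted, observed):
    """Step 2: reduce the mismatch between prediction and data to one number."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def fit(xs, ys, candidates):
    """Assumed step 3: keep the candidate parameters with the smallest error."""
    return min(candidates, key=lambda params: squared_error(predict(params, xs), ys))

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # made-up data generated from y = 2x + 1
grid = [(s, b) for s in range(-3, 4) for b in range(-3, 4)]
best = fit(xs, ys, grid)
print(best)
```

A grid search keeps the example dependency-free; in practice a numerical optimizer would normally replace the exhaustive `min` over candidates.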

What is Markov Model?

Markov Model in probability theory is a stochastic model used to model randomly changing systems where it is assumed that future states depend only on the current state, not on the events that occurred before it (this assumption is called the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in … Read more
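To make the Markov property concrete, here is a small Python sketch of a two-state chain. The weather states, the transition probabilities, and the helper name `step` are invented for illustration. The point is that the next distribution is computed from the current one alone, with no reference to earlier history.

```python
# Hypothetical transition probabilities: transitions[current][next].
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(distribution, transitions):
    """Advance a probability distribution over states by one time step."""
    nxt = {state: 0.0 for state in transitions}
    for state, prob in distribution.items():
        for target, p in transitions[state].items():
            nxt[target] += prob * p
    return nxt

today = {"sunny": 1.0, "rainy": 0.0}
tomorrow = step(today, transitions)
day_after = step(tomorrow, transitions)
print(tomorrow, day_after)
```

Because `step` only ever receives the current distribution, forecasting any number of days ahead is just repeated application of the same function.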

What is Manhattan Distance?

Manhattan Distance is the distance between two points measured along axes at right angles. The name alludes to the grid layout of the streets of Manhattan, which determines the shortest path a car could take between two points in the city. The limitation of the Manhattan Distance heuristic is that it considers each tile independently, while in fact, tiles interfere … Read more
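As a quick illustration, the distance is the sum of the absolute coordinate differences along each axis. The function name `manhattan_distance` and the sample points are assumptions for this sketch.

```python
def manhattan_distance(p, q):
    """Sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Three blocks east plus four blocks north.
print(manhattan_distance((1, 2), (4, 6)))  # 7
```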

What is MAE (Mean Absolute Error)?

MAE – Mean Absolute Error in statistics is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is an average of the absolute errors $|e_i| = |f_i - y_i|$, where $f_i$ is the prediction and $y_i$ the true value. Note that alternative formulations may include relative frequencies as weight factors. The mean absolute error uses the same scale … Read more
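A minimal Python sketch of the unweighted form, averaging the absolute difference between each prediction f and its true value y. The function name `mean_absolute_error` and the sample numbers are assumptions for illustration.

```python
def mean_absolute_error(predictions, actuals):
    """Average of |prediction - true value| over paired observations."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - y) for f, y in zip(predictions, actuals)) / len(predictions)

# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so the mean is 0.5.
print(mean_absolute_error([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0]))  # 0.5
```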

What is Machine Translation (MT)?

Machine Translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and … Read more

What is Loss Function?

Loss Function in mathematical optimization, statistics, decision theory and machine learning is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a … Read more

What is LOOCV or Leave-One-Out Cross Validation?

LOOCV, or Leave-One-Out Cross Validation, uses one observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. This is the same as K-fold cross-validation with K equal to the number of observations in … Read more
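The procedure can be sketched in plain Python as follows. The "model" here is deliberately trivial (it predicts the mean of the training values) and is only a stand-in so the loop has something to evaluate. The names `leave_one_out_splits` and `loocv_mean_absolute_error` are hypothetical, and the sketch assumes at least two observations.

```python
def leave_one_out_splits(n):
    """Yield (training indices, held-out index) once per observation."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, held_out

def loocv_mean_absolute_error(values):
    """Score a predict-the-training-mean model with leave-one-out validation."""
    errors = []
    for train, held_out in leave_one_out_splits(len(values)):
        prediction = sum(values[i] for i in train) / len(train)
        errors.append(abs(prediction - values[held_out]))
    return sum(errors) / len(errors)

print(list(leave_one_out_splits(3)))
print(loocv_mean_absolute_error([1.0, 2.0, 3.0]))  # 1.0
```

Each observation is held out exactly once, so the number of model fits equals the number of observations.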