DATA SCIENCE Q&A - Page 5 of 12 - SecretDataScientist.com

What is MAE (Mean Absolute Error)?

MAE – Mean Absolute Error in statistics is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is an average of the absolute errors |e_i| = |f_i - y_i|, where f_i is the prediction and y_i the true value. Note that alternative formulations may include relative frequencies as weight factors. The mean absolute error uses the same scale … Read more
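As a small illustration of the definition, the sketch below averages the absolute differences between predictions and true values. The function name `mean_absolute_error` and the sample numbers are assumptions made for this example and do not come from the original article.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |prediction - true value| over all paired observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example values, chosen only for illustration.
actual = [3.0, 5.0, 2.0, 7.0]
forecast = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 2.0 and 1.0, so the average is 0.875.
mae = mean_absolute_error(actual, forecast)
print(mae)
```

Because the errors are not squared, the result is expressed in the same units as the data being predicted.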

What is Machine Translation (MT)?

Machine Translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and … Read more

What is Loss Function?

Loss Function in mathematical optimization, statistics, decision theory and machine learning is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a … Read more
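To make the idea of minimizing a loss concrete, here is a minimal sketch that uses squared error as the "cost" and picks, from a small list of candidates, the single prediction with the lowest total loss. The choice of squared error, the helper names, and the numbers are all assumptions for illustration; the original text does not name a specific loss.

```python
def squared_loss(prediction, target):
    """Cost of predicting `prediction` when the true value is `target`."""
    return (prediction - target) ** 2


def total_loss(candidate, targets):
    """Sum of the per-observation losses for one constant prediction."""
    return sum(squared_loss(candidate, t) for t in targets)


# Hypothetical observations and candidate predictions.
targets = [1.0, 2.0, 6.0]
candidates = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# The "optimization problem" here is a plain search over the candidates.
best = min(candidates, key=lambda c: total_loss(c, targets))
print(best, total_loss(best, targets))
```

In this toy case the minimizer coincides with the mean of the observations, which is the expected behavior for squared error.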

What is LOOCV or Leave-One-Out Cross Validation?

LOOCV, or Leave-One-Out Cross Validation, uses one observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. This is the same as a K-fold cross-validation with K being equal to the number of observations in … Read more
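The procedure can be sketched in a few lines: each observation is held out once while the rest form the training set. The "model" below is deliberately trivial (it predicts the mean of the training values) and is an assumption made only so the example is self-contained; the function names are hypothetical.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, validation_index) pairs, one per observation."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def loocv_mae_of_mean_predictor(values):
    """Average absolute error when each value is predicted by the mean of the others."""
    if len(values) < 2:
        raise ValueError("LOOCV needs at least two observations")
    errors = []
    for train_idx, val_idx in leave_one_out_splits(len(values)):
        train_mean = sum(values[i] for i in train_idx) / len(train_idx)
        errors.append(abs(values[val_idx] - train_mean))
    return sum(errors) / len(errors)


# Hypothetical sample of four observations, so there are four splits.
data = [1.0, 2.0, 3.0, 4.0]
splits = list(leave_one_out_splits(len(data)))
score = loocv_mae_of_mean_predictor(data)
print(len(splits), score)
```

The number of splits equals the number of observations, so the cost of this procedure grows with the sample size, since the model is refit once per observation.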

What is Long-Tailed Distribution?

Long-Tailed Distribution in statistics and business is the portion of the distribution having a large number of occurrences far from the “head” or central part of the distribution. The term is often used loosely, with no definition or only an arbitrary one, but precise definitions are possible. Broadly speaking, for such population distributions, the majority of occurrences (more than half, and where … Read more

What is Long Short-Term Memory (LSTM) in machine learning?

Long Short-Term Memory networks, usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term dependencies. LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of a neural network. In standard RNNs, … Read more

What is Log-Normal Distribution?

Log-Normal Distribution in probability theory is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then X = exp(Y) has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. The distribution is … Read more
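The relationship between the two distributions can be checked numerically. The sketch below draws normally distributed values, exponentiates them to get log-normal values, and confirms that the results are positive and that taking the logarithm recovers the original draws. The parameters `mu` and `sigma`, the seed, and the sample size are arbitrary assumptions for illustration.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 0.0, 0.5     # hypothetical parameters of the underlying normal

# Y is normally distributed; X = exp(Y) is then log-normally distributed.
y_samples = [rng.gauss(mu, sigma) for _ in range(1000)]
x_samples = [math.exp(y) for y in y_samples]

# Taking the logarithm of X gives back the normal draws (up to rounding).
recovered = [math.log(x) for x in x_samples]

print(min(x_samples) > 0)
print(max(abs(r - y) for r, y in zip(recovered, y_samples)))
```

Python's standard library also offers `random.lognormvariate(mu, sigma)`, which draws log-normal values directly from the same two parameters.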

What is Logistic Regression?

Logistic Regression in statistics is a regression model where the dependent variable is categorical. A common example is the case of a binary dependent variable, that is, one that can take only two values, “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial … Read more
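For the binary case, a fitted logistic regression turns a linear score into a probability of the outcome “1” through the logistic (sigmoid) function. The sketch below shows only that prediction step; the coefficient values are hypothetical and are not estimated from any data, and the function names are assumptions for this example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Probability of outcome "1" for a single explanatory variable x."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients, e.g. hours studied vs. pass/fail.
intercept, slope = -4.0, 1.0

p_low = predict_probability(2.0, intercept, slope)    # score -2, below 0.5
p_mid = predict_probability(4.0, intercept, slope)    # score 0, exactly 0.5
p_high = predict_probability(6.0, intercept, slope)   # score +2, above 0.5
print(p_low, p_mid, p_high)
```

A predicted class is usually obtained by comparing the probability with a threshold such as 0.5, although the threshold is a modeling choice.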

What is Log Loss?

Log Loss measures the performance of a classification model where the prediction input is a probability value between 0 and 1. The goal of our machine learning models is to minimize this value. A perfect model would have a log loss of 0. Log loss increases as the predicted probability diverges from the actual label. So predicting a probability of … Read more
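The behavior described here can be seen in a short sketch of binary log loss, the average of -[y·ln(p) + (1 - y)·ln(1 - p)]. The clipping constant `eps`, the function name, and the sample probabilities are assumptions for illustration; clipping is there only to avoid taking the logarithm of zero.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; y_true holds 0/1 labels, y_prob holds P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels with confident-and-right vs. confident-and-wrong predictions.
labels = [1, 0, 1]
confident_right = log_loss(labels, [0.9, 0.1, 0.9])
confident_wrong = log_loss(labels, [0.1, 0.9, 0.1])
print(confident_right, confident_wrong)
```

The second value is much larger than the first, which reflects how heavily log loss penalizes confident predictions that turn out to be wrong.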

What is Linear Regression?

Linear Regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. (This term is distinct from multivariate linear regression, where … Read more
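For the simple case with one explanatory variable, a line y = intercept + slope · x can be fitted with ordinary least squares. The sketch below assumes that fitting method, which the excerpt above does not name, and uses hypothetical data that lie exactly on a line so the result is easy to verify by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```

With several explanatory variables the same idea applies, but the fit is normally done with matrix methods from a numerical library rather than these scalar formulas.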