SecretDataScientist.com

What is Q-learning?

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule … Read more
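The action-value update described above can be sketched on a toy problem. The five-state corridor, the reward of 1 at the right end, and the names `step`, `q_learning`, `alpha`, `gamma` and `epsilon` are all assumptions made for illustration; none of them come from the original post.

```python
import random

# Hypothetical toy MDP: states 0..4 on a line; reaching state 4 ends the
# episode and pays a reward of 1. Everything here is illustrative only.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

Because the update uses the maximum over next-state values rather than the action actually taken, the table estimates the value of acting optimally even while the agent explores.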

What is Pruning?

Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier and can therefore improve predictive accuracy by reducing overfitting. One of the questions that arises in a decision tree algorithm is the optimal … Read more
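One common way to do this is reduced-error pruning, shown here as a rough sketch. The excerpt does not say which pruning method it has in mind, so the choice of method, the dictionary-based tree layout, and the names `predict`, `errors` and `prune` are assumptions for illustration.

```python
def predict(node, x):
    """Walk the tree until a leaf (a dict with a 'label' key) is reached."""
    while "label" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def errors(node, samples):
    """Number of (features, label) samples the subtree misclassifies."""
    return sum(predict(node, x) != y for x, y in samples)


def prune(node, samples):
    """Reduced-error pruning using the validation samples that reach this node.

    A subtree is replaced by a leaf predicting its stored majority label
    whenever that does not increase the validation error.
    """
    if "label" in node:
        return node
    left = [(x, y) for x, y in samples if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in samples if x[node["feature"]] > node["threshold"]]
    pruned = dict(node, left=prune(node["left"], left), right=prune(node["right"], right))
    leaf = {"label": node["majority"]}
    return leaf if errors(leaf, samples) <= errors(pruned, samples) else pruned


# Hypothetical tree whose right branch contains a split that only fits noise.
tree = {
    "feature": 0, "threshold": 5, "majority": "a",
    "left": {"label": "a"},
    "right": {
        "feature": 1, "threshold": 2, "majority": "b",
        "left": {"label": "b"},
        "right": {"label": "a"},
    },
}
validation = [((1, 0), "a"), ((2, 5), "a"), ((7, 1), "b"), ((8, 4), "b"), ((9, 3), "b")]
pruned_tree = prune(tree, validation)
```

In this example the noisy inner split is collapsed into a single leaf, while the root split is kept because removing it would increase the validation error.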

What is Probabilistic Neural Network (PNN)?

A Probabilistic Neural Network (PNN) is a kind of feedforward neural network. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using the PDF of each class, the class probability of a new input is estimated, and Bayes’ rule is employed to allocate the class with … Read more
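The two steps above (a Parzen-window density per class, then Bayes' rule) can be sketched in a few lines. The Gaussian kernel, the shared smoothing parameter `sigma`, the use of class frequencies as priors, and the function names are assumptions for illustration.

```python
import math


def gaussian_kernel(x, center, sigma):
    """Unnormalised Gaussian window centred on one training point."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pnn_predict(x, training_data, sigma=0.5):
    """Classify x given training_data: {class label: list of feature tuples}.

    The Gaussian normalising constant is omitted because sigma and the
    dimension are the same for every class, so it cancels in Bayes' rule.
    """
    total = sum(len(points) for points in training_data.values())
    scores = {}
    for label, points in training_data.items():
        # Parzen-window estimate of the class-conditional density at x.
        density = sum(gaussian_kernel(x, p, sigma) for p in points) / len(points)
        prior = len(points) / total  # class frequency used as the prior
        scores[label] = prior * density
    evidence = sum(scores.values())  # can underflow to 0 for very distant x
    posteriors = {label: s / evidence for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical two-class data set.
data = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)],
}
label, posteriors = pnn_predict((0.2, 0.3), data)
```

A smaller `sigma` makes the density estimate follow individual training points more closely; a larger one smooths it out.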

What is Principal Component Analysis (PCA)?

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (or sometimes, principal modes of variation). The number of principal components is less than or equal to the smaller of (number of original variables or … Read more
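As a small illustration, the first principal component can be found by power iteration on the sample covariance matrix. This is only one of several ways to compute it and recovers just the leading component; the function name and the toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the leading component.

    Assumes the data are not constant and that the starting vector is not
    orthogonal to the leading eigenvector; both hold for the toy data below.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    v = [1.0] * dim
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
    variance = sum(v[i] * cov_v[i] for i in range(dim))
    return v, variance


# Hypothetical, perfectly correlated data lying on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, variance = first_principal_component(rows)
```

For these points the direction found is proportional to (1, 2), and all of the variance lies along it, which is what one would expect when the two variables are perfectly correlated.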

What is Predictive Modeling?

Predictive Modeling is a process through which a future outcome or behavior is predicted based on the past and current data at hand. It is a statistical analysis technique that enables the evaluation and calculation of the probability of certain results. Predictive modeling works by collecting data, creating a statistical model and applying probabilistic techniques to predict the likely outcome. … Read more

What is Power Analysis?

Power Analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence. There are four parameters involved in a power analysis. The researcher must know three of them and solve for the fourth. 1. Alpha: Probability of finding significance where there is … Read more
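As a rough illustration of solving for one parameter given the other three, the sketch below computes a sample size from alpha, power and effect size. It assumes a two-sided comparison of two group means, a standardised effect size (Cohen's d) and the normal approximation; the function name is hypothetical, and an exact t-based calculation would give a slightly larger number.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_medium = sample_size_per_group(0.5)  # medium standardised effect
n_large = sample_size_per_group(0.8)   # larger effect, so fewer subjects needed
```

The same formula can be rearranged to solve for power or for the detectable effect size when the sample size is fixed.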

What is Paired t-Test?

The purpose of the Paired t-Test is to determine whether there is statistical evidence that the mean difference between paired observations on a particular outcome is significantly different from zero. The Paired-Samples t Test is a parametric test. This test is also known as the Dependent t-Test. … Read more
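The test statistic itself is simple to compute: take the per-pair differences and divide their mean by its standard error. The sketch below stops at the t statistic and degrees of freedom, since a p-value needs the t distribution, which is not in the Python standard library; the function name and the data are made up for illustration.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]
t_value, dof = paired_t_statistic(before, after)
```

The resulting t value would then be compared with a t distribution with `dof` degrees of freedom.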

What is Overfitting?

In statistics and machine learning, one of the most common tasks is to fit a “model” to a set of training data so that it can make reliable predictions on general, unseen data. In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively … Read more
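An extreme case makes the idea concrete: a model that simply memorises the training set reproduces its noise perfectly and then does worse on new data. The nearest-neighbour "memoriser", the noisy line and all names below are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)


def noisy_line(x):
    """Underlying relationship y = 2x plus bounded random noise."""
    return 2.0 * x + rng.uniform(-0.3, 0.3)


train = [(float(i), noisy_line(float(i))) for i in range(20)]
test = [(i + 0.5, noisy_line(i + 0.5)) for i in range(20)]


def memorizer_predict(x):
    """Return the target of the closest training point (1-nearest neighbour)."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


def mse(samples, predict):
    """Mean squared error of a prediction function on (x, y) samples."""
    return sum((y - predict(x)) ** 2 for x, y in samples) / len(samples)


train_mse = mse(train, memorizer_predict)  # the model reproduces its own noise
test_mse = mse(test, memorizer_predict)    # and pays for it on unseen points
```

The training error is zero because every training point is its own nearest neighbour, while the error on the unseen points is clearly larger.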

What is Out-Of-Sample Evaluation?

Out-Of-Sample Evaluation means withholding some of the sample data from the model identification and estimation process, then using the model to make predictions for the hold-out data. The aim is to see how accurate those predictions are and to determine whether the statistics of their errors are similar to those that the model made within the sample of data that was … Read more
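A minimal hold-out sketch looks like this. The 75/25 random split, the straight-line model, the synthetic data and the helper names are all assumptions for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 1.0 + rng.uniform(-0.5, 0.5) for x in xs]  # synthetic data

# Withhold a random quarter of the points from estimation.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:15], indices[15:]

slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
in_sample_rmse = rmse([xs[i] for i in train_idx], [ys[i] for i in train_idx], slope, intercept)
out_of_sample_rmse = rmse([xs[i] for i in test_idx], [ys[i] for i in test_idx], slope, intercept)
```

If the out-of-sample error is much larger than the in-sample error, the model is probably fitting noise in the estimation sample.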

What is Outlier?

An outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement, or it may indicate an experimental error; the latter are sometimes excluded from the data set. Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution. … Read more
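One widely used rule of thumb for flagging such points is Tukey's fences, which mark anything further than 1.5 interquartile ranges outside the quartiles. The excerpt does not prescribe a detection rule, so this choice, the multiplier `k` and the function name are assumptions.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspicious reading.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
flagged = find_outliers(readings)
```

Flagging a point this way says nothing about why it is extreme; whether to exclude it still depends on whether it reflects an error or a genuinely heavy-tailed population.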