What is Chi-squared test for goodness of fit?

Chi-squared test for goodness of fit, also written as a χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, ‘chi-squared test’ is often used as shorthand for Pearson’s chi-squared test. Chi-squared tests are often constructed from a sum of squared errors, … Read more
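As a minimal sketch of the "sum of squared errors" idea, the snippet below computes Pearson's statistic, the sum over categories of (observed - expected)² / expected. The function name and the die-roll counts are hypothetical and chosen only for illustration.

```python
def pearson_chi_squared(observed, expected):
    """Return Pearson's chi-squared statistic for paired category counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Each category contributes its squared error scaled by the expected count.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die assumed fair under the null hypothesis.
observed = [18, 22, 20, 19, 21, 20]
expected = [20] * 6

statistic = pearson_chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1  # categories minus one
print(statistic, degrees_of_freedom)
```

To reach a decision, the statistic would then be compared with a chi-squared distribution with the stated degrees of freedom; that lookup is left out here to keep the example dependency-free.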

What is Central Limit Theorem?

Central Limit Theorem (CLT) is a statistical theorem stating that, given a sufficiently large sample size from a population with finite variance, the mean of the sample means will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with all variances … Read more
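A small simulation can make this concrete. The sketch below assumes a skewed population (exponential with mean 1), draws many samples, and checks that the sample means cluster around the population mean in a roughly bell-shaped way. The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50     # observations per sample (assumed "sufficiently large")
NUM_SAMPLES = 2000   # number of independent samples drawn

# The population is exponential with rate 1.0, so its mean is 1.0.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= spread for m in sample_means
) / NUM_SAMPLES

print(mean_of_means, within_one_sd)
```

Although individual exponential draws are strongly skewed, the distribution of the sample means is close to symmetric, which is the behaviour the theorem describes.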

What is Causation?

Causation. Two or more variables are considered to be related, in a statistical context, if their values change so that as the value of one variable increases or decreases so does the value of the other variable (although it may be in the opposite direction). Theoretically, the difference between the two types of relationships is easy to identify: an action … Read more

What is Categorical Variable?

A categorical variable in statistics is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly, … Read more

What is CART or Classification And Regression Trees?

CART or Classification And Regression Trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of … Read more
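One step of the recursive partitioning can be sketched as follows: for a single numeric feature, try each midpoint between adjacent values and keep the threshold that gives the lowest weighted Gini impurity. Gini impurity is assumed here as the split criterion, and the function names and toy data are hypothetical; a full tree would apply the same step again inside each resulting partition.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) of the best split on one feature."""
    values = sorted(set(xs))
    best_threshold, best_impurity = None, float("inf")
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Toy data: the two classes are separated by a gap in the feature values.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]

threshold, impurity = best_split(xs, ys)
print(threshold, impurity)
```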

What is Box plot?

A box plot is a quick way of examining one or more sets of data graphically. In statistics, a box plot is a convenient way of depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, which brings up the terms box-and-whisker plot … Read more
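The numbers a box plot is drawn from can be computed directly. The sketch below returns the five-number summary (minimum, lower quartile, median, upper quartile, maximum) using the standard library; the helper name and the data are hypothetical. Quartile conventions differ between tools, and the "inclusive" method is assumed here.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    ordered = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)
```

The box spans Q1 to Q3 with a line at the median; how far the whiskers extend depends on the plotting convention in use.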

What is Bootstrapping?

Bootstrapping. In statistics, bootstrapping is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy (defined in terms of bias, variance, confidence intervals, prediction error or some other such measure) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Generally, it falls in … Read more
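As an illustration of random sampling with replacement, the sketch below estimates a percentile confidence interval for the mean. The function name, the default parameters, and the data are hypothetical, and the percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)  # local generator keeps the result repeatable
    # Each resample has the same size as the data and is drawn with replacement.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical measurements.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9, 3.1, 3.6]

low, high = bootstrap_ci(sample)
print(low, statistics.fmean(sample), high)
```

Passing a different `stat`, such as `statistics.median`, gives an interval for that statistic without changing the resampling logic.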

What is Boltzmann Machine?

Boltzmann machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algorithm is very slow in networks with many layers of feature detectors, but it can be made … Read more

What is Big Data?

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain … Read more

What is Bias-variance trade-off?

Bias-variance trade-off is a central problem in supervised learning. Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. In statistics and machine learning, the bias-variance trade-off is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training … Read more