DATA SCIENCE Q&A - Page 6 of 12 - SecretDataScientist.com

What are Linear Classifiers?

A linear classifier uses an object’s characteristics to predict which class (or group) it belongs to. It does so by making a classification decision based on the value of a linear combination of those characteristics. An object’s characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well … Read more
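
To make the "linear combination of the characteristics" concrete, here is a minimal sketch of a two-class linear classifier. The weights, bias, and threshold at zero are hand-picked assumptions for illustration, not values learned from data.

```python
def linear_score(weights, bias, features):
    """Return the linear combination w . x + b for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict(weights, bias, features):
    """Assign class 1 if the score is positive, otherwise class 0."""
    return 1 if linear_score(weights, bias, features) > 0 else 0


# Hypothetical, hand-picked parameters (not learned from data).
weights = [2.0, -1.0]
bias = -0.5

print(predict(weights, bias, [1.0, 0.5]))  # score = 2.0 - 0.5 - 0.5 = 1.0
print(predict(weights, bias, [0.0, 1.0]))  # score = -1.0 - 0.5 = -1.5
```

In practice the weights and bias would come from a training procedure; the decision rule itself stays this simple.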

What is Lazy Learning in machine learning?

Lazy Learning in machine learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize from the training data before receiving queries. Lazy learning is essentially instance-based learning: it simply stores the training data (or performs only minor processing) … Read more

What is the Law of Large Numbers?

The Law of Large Numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes converges on the theoretical, or expected, ratio of outcomes. For example, if a fair coin is tossed 1,000,000 … Read more
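
The convergence described above can be watched in a small simulation. This is a sketch only: the function name, the fixed seed, and the chosen toss counts are assumptions made for illustration.

```python
import random


def heads_fraction(num_tosses, seed=0):
    """Simulate fair coin tosses and return the observed fraction of heads."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# The observed fraction should drift toward the expected value 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

Small runs can land far from 0.5, while large runs typically sit very close to it.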

What is Latent Semantic Indexing (LSI)?

Latent Semantic Indexing (LSI) is a mathematical method used to determine the relationship between terms and concepts in content. The contents of a web page are crawled by a search engine and the most common words and phrases are collated and identified as the keywords for the page. LSI looks for synonyms related to the title of your page. For … Read more

What is Lasso (Least Absolute Shrinkage And Selection Operator)?

Lasso (Least Absolute Shrinkage And Selection Operator) in statistics and machine learning is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Lasso was originally formulated for least squares models and this simple case reveals a substantial amount about the behaviour of the … Read more
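
One way to see how Lasso both shrinks and selects is the special case of an orthonormal design. Assuming the objective (1/2)·||y − Xβ||² + λ·||β||₁ with XᵀX = I, each Lasso coefficient is the least-squares coefficient passed through a soft-thresholding step. The sketch below uses made-up coefficients and an assumed λ.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical least-squares coefficients for an orthonormal design.
ols_coefficients = [3.0, -0.4, 0.1, -2.5]
lam = 0.5  # assumed regularization strength

lasso_coefficients = [soft_threshold(b, lam) for b in ols_coefficients]
print(lasso_coefficients)  # [2.5, 0.0, 0.0, -2.0]
```

Large coefficients are shrunk by λ (regularization), and small ones are set exactly to zero (variable selection).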

What is the Kolmogorov-Smirnov test?

The Kolmogorov-Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative … Read more
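
As an illustration of the one-sample statistic, the sketch below computes the largest gap between a sample's empirical distribution function and a reference CDF. The function names and the choice of a uniform reference distribution on [0, 1] are assumptions for this example; it computes only the statistic, not a p-value.

```python
def ks_statistic(sample, cdf):
    """One-sample K-S statistic: the largest gap between the empirical
    distribution function of `sample` and the reference `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so check the gap on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))  # approximately 0.3
```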

What is K-Nearest Neighbour (KNN)?

K-Nearest Neighbour (KNN) in pattern recognition is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership. An object is classified by a majority vote … Read more
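
The majority-vote rule for k-NN classification can be sketched in a few lines. The toy data, the Euclidean distance, and the default k = 3 are assumptions chosen for illustration; ties in the vote are broken by whichever label was counted first.

```python
from collections import Counter
import math


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2-D.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_classify(train, (1.1, 1.0)))  # a
print(knn_classify(train, (4.9, 5.0)))  # b
```

Note that nothing is fitted in advance: the training data is simply stored and consulted at query time.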

What is K-means Clustering?

K-means Clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into … Read more

What is the K-means Algorithm in machine learning?

The K-means algorithm is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure partitions a given data set into a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in … Read more
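
A common way to carry out this idea is Lloyd's algorithm, which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below assumes Euclidean distance, a fixed number of iterations, and hand-picked starting centroids; the toy data is made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Run Lloyd's algorithm from the given starting centroids.

    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical toy data with two obvious groups, and assumed starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The result depends on the starting centroids, which is why they are passed in explicitly here rather than chosen inside the function.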

What is the Kernel Trick?

The Kernel Trick is an approach that uses kernel functions to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, instead simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation … Read more
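
A small check makes the idea tangible. For 2-D inputs, the degree-2 polynomial kernel k(x, y) = (x · y)² gives the same value as an inner product in a 3-D feature space, without ever building the 3-D vectors. The kernel choice, function names, and sample points below are assumptions picked for illustration.

```python
import math


def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit feature map for 2-D input whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y))                    # computed in the 2-D input space
print(dot(feature_map(x), feature_map(y)))  # computed in the 3-D feature space
```

Both lines print the same value up to floating-point rounding; the kernel simply reaches it with less work.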