
Fine Tuning LLM

Fine-tuning large language models (LLMs) has become an indispensable tool for enterprises that use LLMs to enhance their operational processes. While the foundational training of LLMs offers a broad understanding of language, fine-tuning molds these models into specialized tools capable of understanding niche topics and delivering more…

Embeddings

Embeddings are a fundamental concept in machine learning and natural language processing (NLP). They are used to convert non-numeric data, such as text or categorical variables, into numerical vectors that machine learning algorithms can process. These vectors, known as embeddings, capture the semantic meaning and relationships between different pieces of…
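A common way to compare embeddings is cosine similarity, which scores how closely two vectors point in the same direction. A minimal sketch follows. The three-dimensional vectors are hypothetical toy values, not the output of any real embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real models produce hundreds of dimensions).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

sim_cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(f"cat~kitten: {sim_cat_kitten:.3f}, cat~car: {sim_cat_car:.3f}")
```

The related pair (cat, kitten) scores higher than the unrelated pair (cat, car), which is the property similarity search relies on.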

LangChain Cheatsheet

LangChain simplifies building AI applications with large language models (LLMs) by providing an intuitive interface for connecting to state-of-the-art models such as GPT-4 and adapting them to custom applications. It supports chains that combine multiple models, and modular prompt engineering for more effective interactions. Sections: Key Features, Code Snippets, 1. Creating a Custom…

Ollama Cheatsheet

Here is a comprehensive Ollama cheat sheet containing the most frequently used commands, with explanations. Sections: Installation and Setup, Running Ollama, Model Library and Management, Advanced Usage, Integration with Visual Studio Code, AI Developer Scripts, Additional Resources, Other Tools and Integrations, Community and Support, Documentation and Updates, Additional Tips, Additional References, Additional…

Autonomous AI Agents

Autonomous AI agents are intelligent computer programs that operate independently, making decisions and taking actions without human intervention. These agents are powered by advanced machine learning algorithms and large language models (LLMs), enabling them to process vast amounts of data and perform complex tasks with remarkable accuracy and speed. In…

What is AGI – Artificial General Intelligence?

Artificial General Intelligence (AGI): A Comprehensive Overview for Professionals. Artificial General Intelligence (AGI) is a concept that has garnered significant attention in recent years, particularly with the emergence of advanced AI tools like ChatGPT. For researchers in the field, it is essential to understand the nuances of AGI and…

Trading with Python Intro – Data Import

Traditionally, there have been two general ways of analyzing market data. In recent years, computer science and mathematics have revolutionized trading, which is now dominated by computers that help analyze the vast amounts of available data. Algorithms make trading decisions faster than any human being could. Machine learning and…

Data Scientist Interview Questions – Explain what precision and recall are?

After a predictive model has been built, the most important question is: how good is it? Does it predict well? Evaluating the model is one of the most important tasks in a data science project, because it indicates how good the predictions are. For classification problems, we very often look at metrics called…
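For a binary classifier, precision is TP / (TP + FP), the share of predicted positives that are correct, and recall is TP / (TP + FN), the share of actual positives that are found. A small sketch computing both from label lists; the labels are invented for illustration, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")
```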


DATA SCIENCE QUESTIONS AND ANSWERS


What is Unsupervised Learning?

Unsupervised Learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster…
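Clustering can be sketched with k-means: alternately assign each point to its nearest centroid, then move each centroid to the mean of its points. The version below works on one-dimensional data and assumes the initial centroids are supplied, which keeps it deterministic; practical implementations use random or k-means++ initialization with several restarts.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm for k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

# Unlabeled data with two visible groups.
data = [1.0, 1.2, 0.8, 1.1, 8.0, 8.3, 7.9, 8.1]
centers, groups = kmeans_1d(data, centroids=[0.0, 5.0])
print(centers)
```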

What is Type II Error?

Type II Error in statistical hypothesis testing is incorrectly retaining a false null hypothesis (a “false negative”). A type II error (or error of the second kind) is the failure…

What is Type I Error?

Type I Error in statistical hypothesis testing is the incorrect rejection of a true null hypothesis (a false positive). More simply stated, a type I error is detecting an effect…
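Both error types can be estimated by simulation. The sketch below assumes a two-sided one-sample z-test with known standard deviation σ = 1, α = 0.05 and n = 30; the true mean of 0.3 used for the Type II case is an arbitrary illustrative choice.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided one-sample z-test with known sigma; True if H0: mean == mu0 is rejected."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, trials=2000, n=30, seed=42):
    """Share of simulated experiments (data drawn with mean true_mean) in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials

# H0 true (mean really 0): every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_mean=0.0)
# H0 false (mean really 0.3): every non-rejection is a Type II error (beta).
type_2_rate = 1 - rejection_rate(true_mean=0.3)
print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```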

What is True Positive Rate (Sensitivity)?

True Positive Rate (Sensitivity) is a statistical measure of the proportion of actual positives that are correctly identified as such (for example, the percentage of sick people who are correctly…

What is True Negative Rate (Specificity)?

True Negative Rate (Specificity) is a statistical measure of the proportion of actual negatives that are correctly identified as such (for example, the percentage of healthy people who are correctly…
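Both rates come straight from confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A short sketch; the screening counts are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives correctly ruled out."""
    return tn / (tn + fp)

# Hypothetical screening of 100 sick and 900 healthy people.
tp, fn = 90, 10    # sick people: flagged vs. missed
tn, fp = 855, 45   # healthy people: cleared vs. falsely flagged
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.2f}")
```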

What is Three Sigma Rule?

Three Sigma Rule in the empirical sciences expresses a conventional heuristic that “nearly all” values are taken to lie within three standard deviations of the mean, i.e. that it is…
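Under a normal distribution the rule corresponds to roughly 68%, 95% and 99.7% of values lying within one, two and three standard deviations of the mean. A quick check using `statistics.NormalDist` from the Python standard library (3.8+); for non-normal data the percentages can differ, which is why the rule is only a heuristic.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
```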

What is Support Vector Machines (SVM)?

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which…

What is Supervised Learning?

Supervised Learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example…

What is Statistical Significance?

Statistical Significance in statistical hypothesis testing is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the…
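In practice, the decision is a comparison of the p-value with the significance level α chosen in advance. A minimal sketch, assuming a two-sided one-sample z-test with a known population standard deviation; the sample figures are illustrative.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level fixed before looking at the data
p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"p = {p:.4f}; significant at alpha={alpha}: {p < alpha}")
```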

What is Statistical Power?

Statistical Power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta or the…
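Because power = 1 - β, it can be computed in closed form for simple tests. A sketch for a one-sided one-sample z-test with known σ, assuming the true mean exceeds the null mean by `effect` (a parameter name chosen here); the numbers are illustrative.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 when the true mean is mu0 + effect (effect > 0)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by effect * sqrt(n) / sigma.
    return 1 - z.cdf(z_crit - effect * n ** 0.5 / sigma)

power = z_test_power(effect=0.5, sigma=1.0, n=25)
beta = 1 - power
print(f"power={power:.3f}, beta={beta:.3f}")
```

Increasing the sample size or the effect size raises power and lowers β.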

What is Sentiment Analysis?

Sentiment Analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis…

What is Semi-Supervised Learning?

Semi-Supervised Learning is a class of supervised learning tasks that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount…

What is Self-Organizing Map (SOM)?

Self-Organizing Map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the…

What is Selection Bias?

Selection Bias is the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not…

What is R-squared?

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple…
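In formula form, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A minimal sketch with illustrative observed and fitted values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)            # total variation around the mean
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.2]  # illustrative fitted values
print(round(r_squared(y_true, y_pred), 4))
```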

What is Root Mean Square Error (RMSE)?

Root Mean Square Error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually…
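RMSE is the square root of the mean of the squared prediction errors, so it is expressed in the same units as the predicted quantity. A minimal sketch:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

print(rmse([2.0, 4.0, 6.0], [3.0, 4.0, 4.0]))
```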

What is Resampling?

Resampling is any technique of generating a new sample from an existing dataset. There is a variety of methods for estimating the precision of sample statistics (medians, variances, percentiles) by…
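The bootstrap is one such method: draw many samples with replacement from the data and measure how a statistic varies across them. A sketch estimating the standard error of the median; the data and the number of resamples (2,000) are illustrative choices.

```python
import random
import statistics

def bootstrap_se(data, statistic=statistics.median, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)

sample = [12.1, 9.8, 11.4, 10.2, 13.5, 9.1, 10.9, 12.7, 11.0, 10.4]
print(f"median={statistics.median(sample):.2f}, bootstrap SE={bootstrap_se(sample):.3f}")
```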

What is Regularization?

Regularization in the field of machine learning is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. A theoretical justification for regularization…
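L2 (ridge) regularization is a common concrete case: it adds a penalty λw² to the squared-error loss, which shrinks the fitted weight toward zero. A minimal sketch for a single feature with no intercept, where setting the derivative of Σ(y - wx)² + λw² to zero gives w = Σxy / (Σx² + λ); the data and λ values are illustrative.

```python
def ridge_weight(xs, ys, lam):
    """Minimize sum((y - w*x)**2) + lam * w**2 for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)  # lam = 0 recovers ordinary least squares

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
for lam in (0.0, 1.0, 10.0):
    print(f"lambda={lam:>4}: w={ridge_weight(xs, ys, lam):.4f}")
```

Larger λ trades some fit on the training data for a smaller, more stable weight.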

What is Regression?

Regression is a statistical method that attempts to determine the strength of the relationship between one dependent variable and a series of other changing (independent) variables. The two basic…
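The simplest case is linear regression with one independent variable, fitted by ordinary least squares: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. A minimal sketch; the hours/score data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1, 2, 3, 4, 5]         # independent variable (illustrative)
score = [52, 55, 61, 64, 68]    # dependent variable (illustrative)
b0, b1 = fit_line(hours, score)
print(f"score ~ {b0:.2f} + {b1:.2f} * hours")
```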

What is Random Sampling?

Random Sampling is a technique in which each member of the population has an equal chance of being selected as a subject. The entire process of sampling is done in a single…

What is Random Forest?

Random Forest (or Random Decision Forest) is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and…

What is Radial Basis Function (RBF) network?

Radial Basis Function (RBF) network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions…

What is QQ plot?

QQ plots (Quantile-Quantile plots) are a graphical technique for determining whether two data sets come from populations with a common distribution. A QQ plot is a plot of the…

What is Q-learning?

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action selection policy for any given (finite) Markov decision process (MDP). It works by…
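The core of the method is the update Q(s, a) ← Q(s, a) + α[r + γ · max over a' of Q(s', a') - Q(s, a)], which needs no model of the environment's transitions. A minimal tabular sketch on a made-up five-state corridor that pays reward 1 for reaching the right end; the environment, hyperparameters and episode count are illustrative assumptions.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (goal)
ACTIONS = (-1, +1)    # move left or right

def step(state, action):
    """Hypothetical corridor environment: reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward the bootstrapped target (no future value at the goal).
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action for each non-terminal state
```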

What is Pruning?

Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces…

What is Probabilistic Neural Network (PNN)?

Probabilistic Neural Network (PNN) is a kind of feedforward neural network. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and…
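A minimal sketch of the Parzen-window idea behind a PNN: each class's density at a query point is estimated as the average of Gaussian kernels centred on that class's training points, and the class with the highest estimate wins. Equal class priors are assumed, and the kernel width `sigma` and the data are illustrative.

```python
import math
from collections import defaultdict

def pnn_classify(train, query, sigma=0.5):
    """Classify `query` using per-class Parzen-window density estimates.

    train: list of (feature_vector, label) pairs.
    Equal class priors and equal misclassification costs are assumed.
    """
    kernel_sums = defaultdict(float)
    counts = defaultdict(int)
    for x, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        # Pattern layer: Gaussian kernel centred on each training example.
        kernel_sums[label] += math.exp(-sq_dist / (2 * sigma ** 2))
        counts[label] += 1
    # Summation layer: average kernel value per class (an unnormalized density estimate).
    densities = {label: kernel_sums[label] / counts[label] for label in counts}
    # Output layer: pick the class with the largest estimated density.
    return max(densities, key=densities.get)

training = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
            ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B")]
print(pnn_classify(training, (1.1, 1.0)), pnn_classify(training, (2.9, 3.1)))
```

The Gaussian normalizing constant is omitted because it is identical for every class and does not change which class has the largest density.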

What is Principal Component Analysis (PCA)?

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly…
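A from-scratch sketch of the first principal component, assuming small dense data: centre the variables, form the sample covariance matrix and use power iteration to find its leading eigenvector. Libraries such as scikit-learn use SVD-based solvers instead; this version is for illustration only.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance of the data along v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Points scattered along the line y = x, so the first component should lie close to (1, 1)/sqrt(2).
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
component, var = first_principal_component(data)
print([round(x, 3) for x in component], round(var, 3))
```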

What is Predictive Modeling?

Predictive Modeling is a process through which a future outcome or behavior is predicted based on the past and current data at hand. It is a statistical analysis technique that…

What is Power Analysis?

Power Analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree…
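For two equal-sized groups compared with a two-sided z-test (normal approximation), the required size per group is n = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where δ is the difference in means to detect. A minimal sketch; a t-test-based calculation, as used by most statistical packages, gives a slightly larger n.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, sigma=1.0, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power (1 - beta)
    n = 2 * ((z_alpha + z_beta) * sigma / effect) ** 2
    return math.ceil(n)  # round up, since fractional subjects are impossible

# Standardized effect of 0.5 (a "medium" effect in Cohen's convention), alpha = 0.05, power = 0.8.
print(sample_size_per_group(effect=0.5))
```

Halving the effect size roughly quadruples the required sample, which is why power analysis is done before data collection.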