## Machine Learning with SQL Server & R : Part 02 – The Basic Math

Hello Friends!!!

In the previous post, “What, Why and How”, we introduced Machine Learning. Now we will discuss some of the basic mathematics used in machine learning. It’s very important to understand the underlying mathematics in order to draw meaningful conclusions.

The following topics are used by ML either directly or indirectly. They are very basic, but they play a very deep role in Machine Learning.

- **Probability.**
- **Mean or Average.**
- **Deviation & Variance.**
- **Standard Deviation.**
- **Regression Analysis.**

Let’s have a brief introduction to each of these.

**Probability:**

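The chance or likelihood of an event is called its probability. It lies between 0.0 and 1.0, where 0.0 means the event is impossible and 1.0 means it is certain. It can be calculated using the formula below.

Probability of an event = (number of favorable outcomes) / (total number of possible outcomes)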

Suppose we have a sack that contains 12 balls (5 Red, 4 Black and 3 Green), and we want to calculate the probability of the following 4 scenarios when one ball is drawn:

- What is the probability of getting Black from the sack?
- What is the probability of getting Black **AND** Red, meaning the ball should have both colors?
- What is the probability of getting Black **OR** Red, meaning the ball could be either of these?
- What is the probability of not getting Black?
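The four scenarios above can be worked through with a few lines of Python. This is only a sketch: it assumes that a single ball is drawn at random and that every ball has exactly one color, and the helper name `probability` is made up for illustration.

```python
from fractions import Fraction

# Sack contents from the example above.
# Assumption: every ball has exactly one color and one ball is drawn at random.
sack = {"Red": 5, "Black": 4, "Green": 3}
total = sum(sack.values())  # 12 balls in total


def probability(favorable, total_outcomes):
    """Favorable outcomes divided by total outcomes, kept as an exact fraction."""
    return Fraction(favorable, total_outcomes)


# Scenario 1: the ball is Black.
p_black = probability(sack["Black"], total)

# Scenario 2: Black AND Red. A single-colored ball cannot be both,
# so no ball qualifies.
p_black_and_red = probability(0, total)

# Scenario 3: Black OR Red. The two colors are mutually exclusive,
# so their counts simply add up.
p_black_or_red = probability(sack["Black"] + sack["Red"], total)

# Scenario 4: NOT Black, using the complement rule 1 - P(Black).
p_not_black = 1 - p_black

print(p_black, p_black_and_red, p_black_or_red, p_not_black)
```

Under these assumptions the answers are 1/3, 0, 3/4 and 2/3 respectively.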

**Mean or Average:**

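It is the central value of the data points: the sum of all values divided by the number of values. It gives a very general idea about the data and can be calculated using the formula below.

Mean = (sum of all data points) / (number of data points)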
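As a small illustration, the mean can be computed by hand or with Python's standard `statistics` module. The data points below are hypothetical and chosen only to keep the arithmetic easy.

```python
import statistics

# Hypothetical data points, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean = sum of all values / number of values.
mean_by_hand = sum(data) / len(data)

# The standard library gives the same result.
mean_builtin = statistics.mean(data)

print(mean_by_hand, mean_builtin)
```

Both approaches give a mean of 5 for this data set.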

**Deviation & Variance (σ² or s²):**

“Deviation” is the distance of each data point from the mean, and “Variance” is the average of the squared deviations. It tells us how the data points are scattered around their mean, but a clearer picture is given by the Standard Deviation.
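The steps described above can be sketched in Python as follows. The data set is hypothetical. Note that the population variance (σ²) divides by the number of points, while the sample variance (s²) conventionally divides by one less than that.

```python
import statistics

# Hypothetical data points, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Deviation: distance of each data point from the mean.
deviations = [x - mean for x in data]

# Raw deviations always sum to zero, which is why they are squared
# before averaging.
squared_deviations = [d ** 2 for d in deviations]

# Population variance (sigma squared): average of the squared deviations.
population_variance = sum(squared_deviations) / n

# Sample variance (s squared): divide by n - 1 instead of n.
sample_variance = sum(squared_deviations) / (n - 1)

print(population_variance, sample_variance)
print(statistics.pvariance(data), statistics.variance(data))
```

The last line checks the hand calculation against `statistics.pvariance` and `statistics.variance` from the standard library.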

**Standard Deviation (σ or s) :**

“Standard Deviation” is the square root of the Variance. A smaller Standard Deviation means the data points are clustered around their mean, and a bigger Standard Deviation denotes more scattered data points.

There are various ways to denote “Standard Deviation”, but mostly we use the small sigma (“σ”) or “s”, where “**σ**” denotes the “standard deviation of the **population**” and “**s**” stands for the “standard deviation of a **sample**”.
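A short sketch of both ideas, using Python's standard `statistics` module: `pstdev` gives the population standard deviation (σ) and `stdev` gives the sample standard deviation (s). All data sets here are hypothetical.

```python
import math
import statistics

# Hypothetical data points, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation is the square root of the variance.
sigma = math.sqrt(statistics.pvariance(data))  # population
s = math.sqrt(statistics.variance(data))       # sample

# The standard library also offers direct functions for both.
print(sigma, statistics.pstdev(data))
print(s, statistics.stdev(data))

# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
spread = [1, 3, 5, 7, 9]
print(statistics.pstdev(tight), statistics.pstdev(spread))
```

The `tight` list has the smaller standard deviation because its values sit closer to the mean, even though both lists share the same mean.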

**Regression Analysis:**

“Regression Analysis” is a statistical method for establishing the relationship between variables. Suppose you are studying a rain prediction model and you have some data such as “wind speed”, “temperature”, etc. Using regression analysis, you can establish the relationship between the event (whether it will rain or not) and its independent variables.

There are various kinds of regression analysis, such as “Linear Regression”, “Polynomial Regression”, “Logistic Regression” and many more, depending on what kind of study you are doing.

Below is an example of **simple linear regression** using the **least squares method** to fit a regression (prediction) line. Here we are trying to predict the value of **y** from the value of **x**.

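This is the simplest example and it has many assumptions built into it, but basically it works like this. The least squares method fits the line by minimizing the sum of the squared residuals (the differences between the fitted values and the actual values). Likewise, we can judge how well the model fits using R², which is based on those same squared residuals: the smaller they are, the closer R² is to 1.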
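The least squares fit described above can be sketched in plain Python. The (x, y) observations are invented for illustration, and the variable names are not taken from any library. This is a minimal sketch of the textbook formulas, not how R's “lm” is implemented internally.

```python
# Hypothetical (x, y) observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope: sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx

# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

# Residual: actual value minus fitted value.
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# R squared = 1 - (residual sum of squares / total sum of squares).
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"y = {intercept:.2f} + {slope:.2f} * x, R^2 = {r_squared:.2f}")
```

For this made-up data the fitted line is y = 2.2 + 0.6x with an R² of 0.6, meaning the line explains about 60% of the variation in y.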

In the following posts we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using the “lm” and “glm” functions in RTools.

Related posts:

Machine Learning with SQL Server & R : Part 01 – What, Why and How

Thanks!!!