Machine Learning with SQL Server & R : Part 02 – The Basic Math
In the previous post, “What, Why and How”, we introduced machine learning. In this post we will discuss some of the basic mathematics used in machine learning. Understanding the underlying mathematics is important for drawing meaningful conclusions. We will cover the following topics:
- Probability.
- Mean or Average.
- Deviation & Variance.
- Standard Deviation.
- Regression Analysis.
Let’s briefly introduce each of these.
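Probability:
The chance or likelihood of an event is called its probability. It lies between 0.0 and 1.0, where 0.0 means the event is impossible and 1.0 means it is certain. When all outcomes are equally likely, it can be calculated using the formula below.
P(event) = (number of favorable outcomes) / (total number of possible outcomes)
Suppose we have a sack containing 12 balls (5 red, 4 black and 3 green), and we want to calculate the probability of the following four scenarios when drawing one ball: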
- What is the probability of drawing a black ball from the sack?
- What is the probability of drawing a ball that is black AND red, meaning a single ball has both colors?
- What is the probability of drawing a ball that is black OR red, meaning the ball could be either of these colors?
- What is the probability of not drawing a black ball?
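The four scenarios above can be worked through with a few lines of code. The sketch below is written in Python purely for illustration and assumes that every ball has exactly one color and that each ball is equally likely to be drawn. The names `sack` and `probability` are made up for this example.

```python
from fractions import Fraction

# Contents of the sack; assumption: each ball has exactly one color.
sack = {"red": 5, "black": 4, "green": 3}
total = sum(sack.values())  # 12 balls in total


def probability(favorable: int) -> Fraction:
    """Favorable outcomes divided by all equally likely outcomes."""
    return Fraction(favorable, total)


# 1. Drawing a black ball: 4 favorable outcomes out of 12.
p_black = probability(sack["black"])

# 2. Black AND red: no single-colored ball is both, so 0 favorable outcomes.
p_black_and_red = probability(0)

# 3. Black OR red: the two colors are mutually exclusive, so the counts add.
p_black_or_red = probability(sack["black"] + sack["red"])

# 4. NOT black: complement rule, P(not A) = 1 - P(A).
p_not_black = 1 - p_black

print(p_black, p_black_and_red, p_black_or_red, p_not_black)
# prints: 1/3 0 3/4 2/3
```

Using `Fraction` keeps the results exact, so 4/12 is shown as 1/3 rather than as a rounded decimal.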
Mean or Average:
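The mean of a set of values is their sum divided by the number of values. As a minimal sketch in Python (the data in `values` is hypothetical, chosen only for illustration), it can be computed by hand or with the standard library's `statistics.mean`:

```python
import statistics

# Hypothetical data points, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean by definition: sum of the values divided by how many there are.
mean_by_hand = sum(values) / len(values)

# The standard library gives the same result.
mean_stdlib = statistics.mean(values)

print(mean_by_hand)
print(mean_stdlib)
```

Both calculations give 5 for this data, since the values sum to 40 and there are 8 of them.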
Deviation & Variance (σ² or s²):
“Deviation” is the distance of each data point from the mean, and “Variance” is the average of the squared deviations. Variance tells us how widely the data points are scattered around the mean, but the standard deviation gives a clearer picture because it is expressed in the same units as the data.
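To make the definition concrete, here is a small Python sketch that computes the deviations and then the variance step by step. The data is hypothetical. It assumes the usual convention of dividing by n for a population (σ²) and by n − 1 for a sample (s²).

```python
# Hypothetical data points, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n  # 5.0

# Deviation: signed distance of each data point from the mean.
deviations = [x - mean for x in values]

# Squaring stops positive and negative deviations from cancelling out.
squared_deviations = [d ** 2 for d in deviations]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

print(deviations)
print(population_variance)  # 4.0
print(sample_variance)
```

The raw deviations always sum to zero, which is why they are squared before averaging.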
Standard Deviation (σ or s) :
“Standard Deviation” is the square root of the variance. A smaller standard deviation means the data are clustered around the mean, and a larger standard deviation indicates more scattered data points.
There are various ways to denote standard deviation, but we mostly use lowercase sigma (“σ”) or “s”, where “σ” denotes the standard deviation of a population and “s” denotes the standard deviation of a sample.
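The sketch below illustrates both points in Python using the standard library: `statistics.pstdev` computes the population standard deviation (σ) and `statistics.stdev` computes the sample standard deviation (s). The two data sets are hypothetical. They have the same mean but a different spread.

```python
import statistics

# Two hypothetical data sets with the same mean (5) but different spread.
clustered = [4, 5, 5, 5, 6]
scattered = [1, 3, 5, 7, 9]

# Population standard deviation (sigma): square root of the population variance.
sigma_clustered = statistics.pstdev(clustered)
sigma_scattered = statistics.pstdev(scattered)

# Sample standard deviation (s): uses n - 1 in the denominator instead of n.
s_scattered = statistics.stdev(scattered)

print(sigma_clustered, sigma_scattered, s_scattered)
```

The clustered data set yields the smaller standard deviation, and for the same data `s` is slightly larger than `σ` because of the n − 1 denominator.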
Regression Analysis:
“Regression Analysis” is a statistical method for establishing the relationship between variables. Suppose you are studying a rain prediction model and you have data such as wind speed and temperature. Using regression analysis, you can establish the relationship between the outcome (whether it will rain or not) and the independent variables that influence it.
There are various kinds of regression analysis, such as “Linear Regression”, “Polynomial Regression”, “Logistic Regression” and many more, depending on the kind of study you are doing.
Simple linear regression uses the least squares method to fit a regression (prediction) line. Here we try to predict the value of y from the value of x.
This is the simplest case, and it rests on many assumptions, but it shows the basic idea. The least squares method chooses the line that minimizes the sum of the squared residuals (the differences between the fitted values and the actual values), and R² then measures how well the fitted line explains the data.
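As a rough illustration of the idea, the following Python sketch fits a line y = intercept + slope · x by ordinary least squares using the closed-form formulas, then computes R² from the residuals. The data points and the function names `fit_line` and `r_squared` are hypothetical, and this is a hand-written sketch rather than the method a statistics package would use internally.

```python
# Hypothetical observations, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    # Residual: actual value minus fitted value.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)

# Predict y for a new, unseen x value.
predicted = intercept + slope * 6

print(slope, intercept, r2, predicted)
```

An R² close to 1 means the line explains most of the variation in y. A value close to 0 means the line does little better than always predicting the mean of y.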
In the following posts we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using the “lm” and “glm” functions in R.