Machine Learning with SQL Server & R : Part 02 – The Basic Math

Hello Friends!!!

We discussed the introduction to Machine Learning in the previous post, “What, Why and How”. Now we will discuss some of the basic mathematics used in machine learning. It’s very important to understand the underlying mathematics in order to draw meaningful conclusions.

The following topics are used by ML either directly or indirectly. They are very basic, but they play a very deep role in Machine Learning.

 

  • Probability.
  • Mean or Average.
  • Deviation & Variance.
  • Standard Deviation.
  • Regression Analysis.

Let’s have a brief introduction to each of them.

Probability:

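The chance or likelihood of an event is called its probability. It lies between 0.0 and 1.0, where 0.0 means the event is impossible and 1.0 means it is certain. When all outcomes are equally likely, it can be calculated as: P(event) = number of favorable outcomes / total number of possible outcomes.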

Suppose we have a sack containing 12 balls (5 red, 4 black and 3 green) and we want to calculate the probability of the following 4 scenarios:

  1. What is the probability of drawing a black ball from the sack?
  2. What is the probability of drawing a ball that is Black AND Red, i.e. a ball that has both colors?
  3. What is the probability of drawing a ball that is Black OR Red, i.e. a ball of either color?
  4. What is the probability of not drawing a black ball?
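Assuming every ball is equally likely to be drawn and each ball has exactly one color, the four answers can be worked out by counting. The sketch below uses Python's standard `fractions` module purely for illustration, because the series itself works in R. The names `counts` and `p` are made up for this example.

```python
from fractions import Fraction

# Ball counts from the example: 5 red, 4 black and 3 green (12 balls in total).
counts = {"red": 5, "black": 4, "green": 3}
total = sum(counts.values())


def p(favorable: int) -> Fraction:
    """Probability = favorable outcomes / total outcomes (all draws equally likely)."""
    return Fraction(favorable, total)


p_black = p(counts["black"])
p_red = p(counts["red"])

# 1. P(Black) = 4/12
# 2. No ball is both black and red, so this event is impossible.
p_black_and_red = p(0)
# 3. Addition rule: P(Black OR Red) = P(Black) + P(Red) - P(Black AND Red)
p_black_or_red = p_black + p_red - p_black_and_red
# 4. Complement rule: P(not Black) = 1 - P(Black)
p_not_black = 1 - p_black

for label, value in [
    ("P(Black)", p_black),
    ("P(Black AND Red)", p_black_and_red),
    ("P(Black OR Red)", p_black_or_red),
    ("P(not Black)", p_not_black),
]:
    print(f"{label:<17} = {value} ({float(value):.3f})")
```

This prints 1/3, 0, 3/4 and 2/3 for the four scenarios. Because "Black" and "Red" cannot happen together, the addition rule reduces to simply adding 4/12 and 5/12.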

Mean or Average:

The mean (or average) is the sum of all the data points divided by the number of data points. It gives a very general idea of where the data is centered.
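For n data points x_1, x_2, ..., x_n, the mean (usually written x̄ for a sample and μ for a population) is shown below. A worked example on a small hypothetical data set follows it.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}

% Worked example on the hypothetical data 2, 4, 4, 4, 5, 5, 7, 9
\bar{x} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
```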


Deviation & Variance (σ² or s²):

“Deviation” is the distance of each data point from its mean, and Variance is the average of the squared deviations. It tells us how the data points are scattered around the mean, but a clearer picture is given by the Standard Deviation.
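As formulas, the population variance σ² divides the sum of squared deviations by the number of data points N. The sample variance s² conventionally divides by n − 1 instead (Bessel's correction). The data set in the worked example is hypothetical.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

% Worked example on the hypothetical data 2, 4, 4, 4, 5, 5, 7, 9 (mean 5)
\sum_{i=1}^{8}(x_i - 5)^2 = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32,
\qquad \sigma^2 = \frac{32}{8} = 4,
\qquad s^2 = \frac{32}{7} \approx 4.57
```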

Standard Deviation (σ or s):

“Standard Deviation” is the square root of the Variance, so it is expressed in the same units as the data itself. A smaller Standard Deviation means the data points are clustered closely around the mean, while a bigger Standard Deviation denotes more scattered data points.

There are various ways to denote “Standard Deviation”, but mostly we use small sigma (“σ”) or “s”, where “σ” denotes the “standard deviation of a population” and “s” stands for the “standard deviation of a sample”.
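The small Python sketch below compares the population (σ) and sample (s) standard deviations. It uses two hypothetical data sets that share the same mean (5) but have different spreads. The helper `std_dev` is written out by hand to mirror the definition, and it is cross-checked against `statistics.pstdev` and `statistics.stdev` from the standard library.

```python
import math
import statistics


def std_dev(data: list[float], sample: bool = False) -> float:
    """Square root of the variance: divide by n - 1 for a sample, n for a population."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
scattered = [1, 3, 5, 7, 9]

for name, data in [("tight", tight), ("scattered", scattered)]:
    sigma = std_dev(data)              # population standard deviation (σ)
    s = std_dev(data, sample=True)     # sample standard deviation (s)
    # The standard library computes the same values via pstdev / stdev.
    assert math.isclose(sigma, statistics.pstdev(data))
    assert math.isclose(s, statistics.stdev(data))
    print(f"{name:<9}: sigma = {sigma:.3f}, s = {s:.3f}")
```

Both data sets have a mean of 5. The tight set gives σ ≈ 0.632, while the scattered set gives σ ≈ 2.828, which shows how the standard deviation grows as the points spread out.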

Regression Analysis:

“Regression Analysis” is a statistical method for establishing the relationship between variables. Suppose you are building a rain prediction model and you have data such as “wind speed” and “temperature”. Using regression analysis, you can establish the relationship between the event (whether it will rain or not) and its independent variables.

There are various kinds of regression analysis, such as “Linear Regression”, “Polynomial Regression” and “Logistic Regression”, and you choose one according to the kind of study you are doing.

Simple linear regression uses the least squares method to fit a regression (prediction) line of the form ŷ = a + bx, where we try to predict the value of y from the value of x.
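Least squares has a closed-form solution for one predictor. The slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is a = ȳ − b·x̄. The Python sketch below applies these formulas to a few hypothetical points. The function name `fit_simple_linear_regression` is made up for illustration. In R, `lm(y ~ x)` estimates the same least squares coefficients.

```python
def fit_simple_linear_regression(x, y):
    """Least squares fit of y ≈ a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The least squares line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_simple_linear_regression(x, y)
print(f"y_hat = {a:.2f} + {b:.2f} * x")
print(f"prediction at x = 6: {a + b * 6:.2f}")
```

For these points the fitted line is ŷ = 2.2 + 0.6x, so the predicted value at x = 6 is 5.8.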

This is the simplest case and it rests on several assumptions, but basically it works like this: the least squares method chooses the line that minimizes the sum of the squared residuals (the differences between the fitted and actual values). We can then measure how well the line fits the data using R² (the coefficient of determination).
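R² compares the sum of squared residuals with the total variation of y around its mean: R² = 1 − SS_res / SS_tot. The Python sketch below assumes the hypothetical points x = 1..5, y = 2, 4, 5, 4, 5, whose least squares line is ŷ = 2.2 + 0.6x. The helper name `r_squared` is made up for illustration.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residuals are the differences between actual and fitted values.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(actual, fitted))
    # Total variation of the actual values around their mean.
    ss_tot = sum((yi - mean_actual) ** 2 for yi in actual)
    return 1 - ss_res / ss_tot


y = [2, 4, 5, 4, 5]
# Fitted values from the least squares line y_hat = 2.2 + 0.6 * x for x = 1..5.
fitted = [2.2 + 0.6 * xi for xi in [1, 2, 3, 4, 5]]
print(f"R^2 = {r_squared(y, fitted):.3f}")
```

Here SS_res = 2.4 and SS_tot = 6, so R² = 0.6. This means the line explains 60% of the variation in y. A value closer to 1 indicates a tighter fit.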

In the following posts we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using the “lm” and “glm” functions in R.

 

Related posts:

Machine Learning with SQL Server & R : Part 01 – What, Why and How

 

 

Thanks!!!