Machine Learning with SQL Server & R : Part 03 – The R Language Basics

The “R” language is a great free tool for analysing data. We can write R code in RStudio or R Tools for Visual Studio, and we can also call it from SQL Server 2016 or later through the Launchpad service. See the blog post “R-Services and SQL Server 2016” for more detail.

Here we are going to look at some very basic statistical commands for analysing data. Although the code is very basic, it is a good place to start, and it tells you a lot about the data.

Before you start, download and install either Visual Studio’s “R Tools” or RStudio. You can download the open R engine from here and RStudio from here. For R Tools, you need to download Visual Studio from here and add R Tools to it.

Load Sample Data:

We are going to use the built-in dataset named “mtcars” that ships with R. It can be loaded using the code below, and we can view the data using the “View()” function.

#Load mtcars data
data(mtcars)

#View data
View(mtcars)


Summary:

The summary() function shows important information about a dataset or about a particular column of it. Here, we analyse the mpg column of the dataset. See the “Machine Learning with SQL Server & R : Part 02 – The Basic Math” post for the theory behind these statistics.
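A minimal sketch of the call described above, assuming base R and the built-in mtcars dataset. The variable name mpg_summary is only illustrative.

```r
# mtcars ships with base R, so no extra package is needed
data(mtcars)

# Minimum, quartiles, median, mean and maximum of the mpg column
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   10.40   15.43   19.20   20.09   22.80   33.90
```

Calling summary(mtcars) instead reports the same statistics for every column at once.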

We can clearly see its mean, median, maximum, minimum and quartiles. This gives you a quick check of whether the data meets your requirements. If it has an unusual minimum or maximum, or a very large mean, it may need some transformation first.



Variance:

The var() function shows the variance of your data points, that is, how tightly they are clustered around the mean. See the “Deviation & Variance (σ² or s²)” section of the “Machine Learning with SQL Server & R : Part 02 – The Basic Math” blog post for the theory.
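As a rough sketch, the variance of the mpg column can be computed as follows. The manual calculation is added here only to show what var() does; the variable names are illustrative.

```r
# Sample variance of the mpg column (var() divides by n - 1)
mpg_var <- var(mtcars$mpg)
print(mpg_var)   # about 36.32

# The same value computed by hand
n <- length(mtcars$mpg)
manual_var <- sum((mtcars$mpg - mean(mtcars$mpg))^2) / (n - 1)
print(manual_var)
```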


Standard Deviation:

The sd() function shows the standard deviation of the data. See the “Standard Deviation (σ or s)” section of the “Machine Learning with SQL Server & R : Part 02 – The Basic Math” blog post for the theory.
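A short sketch of the standard deviation call, again on the mpg column of mtcars (variable names are illustrative):

```r
# Sample standard deviation of the mpg column
mpg_sd <- sd(mtcars$mpg)
print(mpg_sd)   # about 6.03

# sd() is simply the square root of var()
print(sqrt(var(mtcars$mpg)))
```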



Histogram:

A histogram gives a nice visualization of the data grouped into bins. You can see that hist() has grouped the values into ranges and shows the count in each one.
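One possible way to draw the histogram for the mpg column is sketched below. The title and axis label are arbitrary choices, not taken from the original post.

```r
# Draw the histogram; hist() also returns the bins it used
mpg_hist <- hist(mtcars$mpg,
                 main = "Histogram of mpg",
                 xlab = "Miles per gallon")

# Bin boundaries and the number of cars that fall into each bin
print(mpg_hist$breaks)
print(mpg_hist$counts)
```

The breaks argument of hist() can be used if you want more or fewer bins than the default.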



Correlation:

The cor() function gives the correlation between each pair of columns. It can also be plotted, as we will see in a later post.
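A minimal sketch of the correlation call, assuming the default Pearson method. Rounding to two decimals is only for readability.

```r
# Correlation matrix for every pair of columns in mtcars
cor_matrix <- cor(mtcars)
print(round(cor_matrix, 2))

# Correlation between two specific columns
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)
print(mpg_wt_cor)   # about -0.87: heavier cars tend to have lower mpg
```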



Class:

The class() function gives the class of the selected object; here, for example, “mtcars” is a data.frame.
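For illustration, class() can be called on the whole dataset or on a single column:

```r
# Class of the whole dataset
dataset_class <- class(mtcars)
print(dataset_class)   # "data.frame"

# Class of a single column
column_class <- class(mtcars$mpg)
print(column_class)    # "numeric"
```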



It shows the data type of the selected column or variable.
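The original command is not shown here, so the following is only an assumption about what was used: typeof() reports the storage type of a single column or variable, and str() lists the types of all columns.

```r
# Assumed call: storage type of one column
mpg_type <- typeof(mtcars$mpg)
print(mpg_type)   # "double"

# Compact overview of every column and its type
str(mtcars)
```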


Plotting Simple Graph:

Using the plot command, we can draw a scatter plot between two columns. It supports a very wide range of options; check it here.

 plot(mtcars$mpg ~ mtcars$wt)



Here we have seen some very basic R commands for a first-level analysis of data. In later posts we will see more complex code to visualize the data and predict output.


Thanks 🙂

#data-science, #ml, #r, #r-in-sql-server-2016, #regression-analysis, #machine-learning

Machine Learning with SQL Server & R : Part 02 – The Basic Math

Hello Friends!!!

We introduced Machine Learning in the previous post, “What, Why and How”. Now we will discuss some of the basic mathematics used in machine learning. It is very important to understand the underlying mathematics in order to draw meaningful conclusions.

The following topics are used by ML either directly or indirectly. They are very basic but play a deep role in Machine Learning.


  • Probability.
  • Mean or Average.
  • Deviation & Variance.
  • Standard Deviation.
  • Regression Analysis.

Let’s have a brief introduction to each of these.


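Probability:

The chance or likelihood of an event is called its probability. It lies between 0.0 and 1.0, where 0.0 means the event is impossible and 1.0 means it is certain. It can be calculated using the formula below.

Probability of an event = (number of favourable outcomes) / (total number of possible outcomes)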

Suppose we have a sack containing 12 balls (5 Red, 4 Black and 3 Green), and we want to calculate the probability for the following 4 scenarios:

  1. What is the probability of drawing a Black ball from the sack?
  2. What is the probability of drawing Black AND Red, meaning the ball has both colors?
  3. What is the probability of drawing Black OR Red, meaning the ball could be either of these?
  4. What is the probability of not drawing Black?
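The four scenarios can be worked through with a small sketch. It assumes a single ball is drawn at random and that every ball has exactly one color; the variable names are illustrative.

```r
# Contents of the sack from the example above
balls <- c(red = 5, black = 4, green = 3)
total <- sum(balls)                       # 12 balls

# 1. Black: favourable outcomes / total outcomes
p_black <- balls[["black"]] / total       # 4/12

# 2. Black AND Red on the same ball: impossible when each ball has one color
p_black_and_red <- 0

# 3. Black OR Red: the two events cannot happen together, so the counts add up
p_black_or_red <- (balls[["black"]] + balls[["red"]]) / total   # 9/12

# 4. NOT Black: complement rule
p_not_black <- 1 - p_black                # 8/12

print(c(black = p_black, black_and_red = p_black_and_red,
        black_or_red = p_black_or_red, not_black = p_not_black))
```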



Mean or Average:

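It is the central value of the data points: the sum of all the values divided by the number of values. It gives a very general idea about the data and can be calculated using the formula below.

Mean = (sum of all data points) / (number of data points)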
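A quick sketch of the calculation in R, using hypothetical sample values chosen only for illustration:

```r
# Hypothetical sample data
values <- c(2, 4, 6, 8, 10)

# Mean by the formula: sum of the values divided by how many there are
manual_mean <- sum(values) / length(values)
print(manual_mean)    # 6

# The built-in function gives the same result
builtin_mean <- mean(values)
print(builtin_mean)   # 6
```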




Deviation & Variance (σ² or s²):

“Deviation” is the distance of each data point from the mean, and Variance is the average of the squared deviations. It tells us how the data points are scattered around the mean, but a clearer picture is given by the Standard Deviation.
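The definitions above can be sketched with a few hypothetical values. Note that R's var() computes the sample variance (dividing by n - 1), which is why both versions are shown.

```r
# Hypothetical sample data
values <- c(2, 4, 6, 8, 10)

# Deviation of each point from the mean
deviations <- values - mean(values)
print(deviations)              # -4 -2  0  2  4

# Population variance: average of the squared deviations
population_variance <- mean(deviations^2)
print(population_variance)     # 8

# Sample variance: divide by n - 1 instead of n, as var() does
sample_variance <- sum(deviations^2) / (length(values) - 1)
print(sample_variance)         # 10
```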

Standard Deviation (σ or s) :

“Standard Deviation” is the square root of the Variance. A smaller Standard Deviation means the data points are clustered close to the mean, and a bigger Standard Deviation denotes more scattered data points.

There are various ways to denote “Standard Deviation”, but mostly we use small sigma (“σ”) or “s”, where “σ” denotes the standard deviation of a population and “s” stands for the standard deviation of a sample.
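The difference between “s” and “σ” can be sketched as follows, using hypothetical values. R's sd() returns the sample version; the population version is computed by hand here.

```r
# Hypothetical sample data
values <- c(2, 4, 6, 8, 10)
n <- length(values)

# s: standard deviation of a sample (divides by n - 1)
sample_sd <- sd(values)
print(sample_sd)        # about 3.16

# sigma: standard deviation of a population (divides by n)
population_sd <- sqrt(sum((values - mean(values))^2) / n)
print(population_sd)    # about 2.83
```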

Regression Analysis:

“Regression Analysis” is a statistical method for establishing the relationship between variables. Suppose you are studying a rain prediction model and you have data such as “wind speed” and “temperature”. Using regression analysis, you can establish the relationship between the event (whether it will rain or not) and its independent variables.

There are various kinds of regression analysis, such as “Linear Regression”, “Polynomial Regression”, “Logistic Regression” and many more, depending on the kind of study you are doing.

Below is an example of simple linear regression using the least squares method to fit a regression (prediction) line. Here we are trying to predict the value of y from the value of x.
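The original worked example is not available here, so the following is a stand-in sketch with hypothetical x and y values. It computes the least squares slope and intercept by hand and compares them with R's lm() function.

```r
# Hypothetical data points
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Least squares estimates for the line y = intercept + slope * x
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)
print(c(intercept = intercept, slope = slope))   # 2.2 and 0.6

# lm() fits the same line
fit <- lm(y ~ x)
print(coef(fit))

# Predict y for a new x value using the fitted line
predicted_y <- intercept + slope * 6
print(predicted_y)                               # 5.8
```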

We saw the simplest example, and it involves many assumptions, but basically it works like this. The least squares method fits the line by minimizing the sum of squared residuals (the differences between the fitted values and the actual values), and R² (the coefficient of determination) then tells us how well the fitted line explains the data.
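As a rough illustration of residuals and R², here is a sketch on hypothetical data. The values are made up for the example and are not from the original post.

```r
# Hypothetical data points
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit the least squares line
fit <- lm(y ~ x)

# Residuals: actual value minus fitted value
residuals_y <- resid(fit)
print(residuals_y)

# R squared = 1 - (sum of squared residuals / total sum of squares)
ss_residual <- sum(residuals_y^2)        # 2.4
ss_total <- sum((y - mean(y))^2)         # 6
r_squared <- 1 - ss_residual / ss_total
print(r_squared)                         # 0.6

# The same value as reported by summary()
print(summary(fit)$r.squared)
```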

In the following posts we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using the “lm” and “glm” functions in R Tools.


Related posts:

Machine Learning with SQL Server & R : Part 01 – What, Why and How








Machine Learning with SQL Server & R : Part 01 – What, Why and How

Hello Friends!!!

Machine Learning (ML) is very fascinating nowadays. But what is ML? Why is it required? How can a machine learn? What kind of output will it provide? These kinds of questions come to mind when we think about Machine Learning, right!!!

In this blog post we are going to discuss Machine Learning very briefly, and in later posts we will look at its applications and coding. This series will help those who have just started, or are planning to start, understanding Machine Learning technology, and it will be a good refresher for those who are already working in it.

What: What is Machine Learning?

Every one of us is good at something, like riding a bike, playing some game or cooking; correct!

Now, hold on and think: why are you good at those things? Because you know them very well. You know what could happen and how you are supposed to react to any consequence.

Imagine a kid learning to walk. He falls initially, but as time passes he walks very well. The reason is obvious: he learnt how to walk or, in other words, he learnt how not to fall. This is called “Learning” from past experience (or “Data” in computer science): comparing it with the current outcome and continuing to improve on your own. Now, just replace the kid with a small robot and somehow program it to learn from its past data and apply that to the next occurrence; it will react the same way as the kid. THAT IS MACHINE LEARNING.

Machine Learning is a technique which enables a program to learn from available data and from future occurrences, and to act accordingly. AI (Artificial Intelligence), ML (Machine Learning), NLP (Natural Language Processing), NN (Neural Network) and Deep Learning are branches of Data Science, with marginal differences in terms of their input and output complexity.

Example: When your mail service provider separates junk mail for you, or when you see sales advertisements on your screen based on your last search, these and many more are good examples of machine learning algorithms.

Machine Learning is broadly divided into two categories: supervised learning and unsupervised learning.

Supervised Learning: In supervised learning the desired output is known, for example whether an occurrence will happen or not, or how likely it is. Classification problems and regression problems are good examples of supervised learning.

Unsupervised Learning: In unsupervised learning the desired output is not known in advance. Clustering problems and association problems are good examples of unsupervised learning.
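To make the distinction concrete, here is a small sketch in R using the built-in mtcars dataset. The choice of columns, the linear model and the number of clusters are arbitrary assumptions for illustration only.

```r
data(mtcars)

# Supervised: the target (mpg) is known for every row, so the model learns to predict it
supervised_fit <- lm(mpg ~ wt, data = mtcars)
print(coef(supervised_fit))

# Unsupervised: there is no target column; k-means only groups similar rows together
set.seed(42)   # k-means starts from random centres, so fix the seed for repeatability
features <- scale(mtcars[, c("mpg", "wt", "hp")])
clusters <- kmeans(features, centers = 3)
print(table(clusters$cluster))
```

In the first case we can check the predictions against the known mpg values. In the second case there is no known answer to check against, only the groups the algorithm has found.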

Why: Why is ML trending now?

Now you might have the idea that ML is a “Data” hungry program: if there is no data, there is no machine learning (unsupervised learning is an exception only in that it does not need labelled data). We now have huge data assets in the form of structured data (from transaction systems, warehouses etc.) or unstructured data (from social media, online shopping platforms etc.), so business houses, politicians, banks and others want to look at their data and convert it into meaningful output for better decision making. This makes it a very favorable time to use ML or AI services.

How: How can we use ML as a tool?

There are many tools available to explore data using Machine Learning, like R, Python, Matlab etc., but we are going to look at the “Machine Learning Services” of SQL Server 2017 (R). By using the machine learning services of SQL Server 2016 and above, we can process the entire data set in the model. We are going to use R Tools in Visual Studio 2017. You can refer to the “R-Services in SQL SERVER 2016” post for an introduction.
