R is a great free tool for analysing data. We can program in R using RStudio or Visual Studio's R Tools, and we can also call it from SQL Server 2016 or later through the Launchpad service. See the blog post “R-Services and SQL Server 2016” for more detail.
Here we are going to look at some very basic statistical commands for analysing data. The code below is simple, but it is a good place to start and it tells you a lot about the data.
Before you start, you should download and install either Visual Studio's “R Tools” or RStudio. You can download the open R engine from here and RStudio from here. For R Tools, you need to download Visual Studio from here and add R Tools to it.
Load Sample Data:
We are going to use the built-in dataset named “mtcars” that ships with R. It can be loaded using the command below, and we can view the data using the “View()” function.
# Load mtcars data
data("mtcars")
# View data
View(mtcars)
The summary function shows key information about a dataset or about a particular column of it. Here, we analyse the mpg column of the dataset. See the blog post “Machine Learning with SQL Server & R : Part 02 – The Basic Math” for the theory behind this code.
We can clearly see its mean, median, maximum, minimum and quartiles. This gives you a quick check of whether the data meets your requirements. If it has an unusual minimum or maximum value, or a very large mean, it may need some transformation before you can use it.
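The summary described above can be reproduced with a short sketch like the one below. It assumes that base R's summary() applied to the mpg column is the call being described; the variable name mpg_summary is only illustrative.

```r
# Assumption: summary() on the mpg column is the call described in the text.
data("mtcars")

# Minimum, quartiles, median, mean and maximum of miles per gallon
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
```

Calling summary(mtcars) instead would print the same statistics for every column of the dataset.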
Variance shows how spread out your data points are around the mean: a small variance means the points are tightly clustered, a large one means they are widely scattered. See the “Deviation & Variance (σ² or s²)” section of the blog post “Machine Learning with SQL Server & R : Part 02 – The Basic Math” for the theory.
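As a minimal sketch, the variance of the mpg column can be computed as follows. It assumes base R's var() is the intended function; var() returns the sample variance (it divides by n - 1), and the name mpg_var is only illustrative.

```r
# Assumption: var() on the mpg column is the call described in the text.
data("mtcars")

# Sample variance of miles per gallon (denominator n - 1)
mpg_var <- var(mtcars$mpg)
print(mpg_var)
```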
Standard deviation is the square root of the variance, so it measures the spread of the data in the same units as the data itself. See the “Deviation & Variance (σ² or s²)” section of the blog post “Machine Learning with SQL Server & R : Part 02 – The Basic Math” for the theory.
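The standard deviation can be sketched the same way. This assumes base R's sd() is the intended function; the last line simply checks that it matches the square root of var(). The name mpg_sd is only illustrative.

```r
# Assumption: sd() on the mpg column is the call described in the text.
data("mtcars")

# Sample standard deviation of miles per gallon
mpg_sd <- sd(mtcars$mpg)
print(mpg_sd)

# The standard deviation is the square root of the variance
print(all.equal(mpg_sd, sqrt(var(mtcars$mpg))))
```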
A histogram gives a nice visualization of the data grouped into bins. You can see that it divides the values into ranges and shows the count of values in each range.
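A histogram of the mpg column could be drawn roughly as below. This is a sketch that assumes base R's hist() is the intended function; the title, axis label and the name mpg_hist are illustrative choices, not taken from the original post.

```r
# Assumption: hist() on the mpg column is the call described in the text.
data("mtcars")

# Draw the histogram and keep the returned object to inspect the bins
mpg_hist <- hist(mtcars$mpg,
                 main = "Histogram of mpg",
                 xlab = "Miles per gallon")

# Bin boundaries and the number of cars falling into each bin
print(mpg_hist$breaks)
print(mpg_hist$counts)
```

The breaks argument of hist() can be used to ask for a different number of bins if the default grouping is too coarse.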
Correlation shows how strongly each pair of columns is related. It can also be plotted, which we will see in a later post.
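The pairwise correlations can be sketched as follows, assuming base R's cor() applied to the whole data frame is the intended call. cor() uses the Pearson coefficient by default; rounding to two decimals is only for readability, and the name cor_matrix is illustrative.

```r
# Assumption: cor() on the full mtcars data frame is the call described in the text.
data("mtcars")

# Pearson correlation between every pair of columns
cor_matrix <- cor(mtcars)
print(round(cor_matrix, 2))

# Correlation between a single pair of columns, e.g. mpg and weight
print(cor(mtcars$mpg, mtcars$wt))
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.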
It gives the class of the selected object. Here, for example, “mtcars” is a data.frame.
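A minimal sketch of this check, assuming base R's class() is the intended function (the name df_class is only illustrative):

```r
# Assumption: class() on the dataset is the call described in the text.
data("mtcars")

# Class of the whole dataset
df_class <- class(mtcars)
print(df_class)  # "data.frame"
```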
It shows the data type of the selected column or variable.
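The original command is not shown here, so the sketch below is a guess at what was meant: it assumes typeof() is the intended function and shows class() next to it for comparison. The names col_type and col_class are illustrative.

```r
# Assumption: typeof() on a column is the call described in the text.
data("mtcars")

# Internal storage type of the mpg column
col_type <- typeof(mtcars$mpg)
print(col_type)   # "double"

# class() reports the higher-level class of the same column
col_class <- class(mtcars$mpg)
print(col_class)  # "numeric"
```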
Plotting Simple Graph:
Using the plot command, we can draw a scatter plot of one column against another. The command supports a very wide range of options; check them here.
plot(mtcars$mpg ~ mtcars$wt)
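As an illustration of a few of those options, the same scatter plot can be given a title, axis labels and filled points. The sketch below uses only standard arguments of base R's plot(); the label text is an illustrative choice and not part of the original post.

```r
data("mtcars")

# Same scatter plot of mpg against weight, written with the data argument
# and with a title, axis labels and filled points added
plot(mpg ~ wt, data = mtcars,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch  = 19)
```

Writing the formula as mpg ~ wt with data = mtcars produces the same plot as mtcars$mpg ~ mtcars$wt, but keeps the default axis labels shorter.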
Here we have seen some very basic R commands for a first-pass analysis of data. In a later post we will look at more complex code to visualize the data and predict an output.