# Machine Learning with SQL Server & R : Part 03 – The R Language Basics

The “R” language is a great free tool to analyse the data. we can do R programming using RStudio, Visual R tools and also call it from SQL Server 2016 or later version using launchpad service. you can check blog “R-Services and SQL Server 2016” for more detail.

Here we are going to see very basic statistical commands to analyse the data. Though below codes are very basic but very good to start with. It tells tons of information about the data.

But before you start, you should download and install either visual studio’s “R tools” OR RStudio. You can download open R engine  from here and RStudio from here. For RTools you need to download visual studio from here and add Rtools in it.

We are going to use built-in dataset named “mtcars” from R itself. it can be loaded using below query. we can view the data using “View()” function.

```
data("mtcars")

#View data
View(mtcars)
```

### Summary:

Summary function shows very important information about the dataset or particular column of the dataset. Here, we are trying to analyse mpg column data in the given dataset. You can check “Machine Learning with SQL Server & R : Part 02 – The Basic Math” blog for detail theory about below code.

We can clearly see it’s mean, median, max, min and different quartile stats. You will have a quick glance of the data whether it is as per your requirement or not. if it has unusual min or max value or data has very big mean then it requires some transformation to meet your requirement.

```
summary(mtcars\$mpg)
```

### Variance:

It shows the variance of your data points; which means how tightly closed your data points are. you can check “Deviation & Variance (σ2 or s2)” section of “Machine Learning with SQL Server & R : Part 02 – The Basic Math” blog post for its theory.

```

var(mtcars\$mpg)

```

### Standard Deviation:

It shows standard deviation of data. you can check “Deviation & Variance (σ2 or s2)” section of “Machine Learning with SQL Server & R : Part 02 – The Basic Math” blog post for its theory.

```
sd(mtcars\$mpg)
```

### Histogram:

Histogram gives very nice visualization of data in bins format. you can see it has categorized the data and showing it’s count.

```
hist(mtcars\$mpg)
```

### Correlation:

It gives correlation between each column. it can also be plotted, we will see in later post.

```
cor(mtcars)
```

### Class:

It give class of selected text like here “mtcars” is a data.frame.

```
class(mtcars)
```

### Typeof:

It shows data type of selected column or variable.

```
typeof(mtcars\$mpg)
```

### Plotting Simple Graph:

using plot  command we can plot a scatter plot between two columns. it has very wide range, check it here.

```
plot(mtcars\$mpg ~ mtcars\$wt)
```

Here we have seen very basic commands of R language to analyse data in primary level. in later post we will see some more complex code to visualize and predict the output.

Thanks 🙂

# Power BI: How to open multi layer JSON file into Power BI

Hello Friends,

Below are the steps to open multi layer JSON file into Power BI.

Note:  if you open from files system then use JSON file datasource or if you open from any url then you can use WEB and rest of the steps will be the same.

Step 01: Open “Get Data” and click “More…

Step 02: Select File>>JSON files option.

Step 03: Select file.

Step 04:  You will see document levels and “Record” link which holds data in it. Go to second level’s “Record” section and right click and select “Drill Down“.

Step 05: Right click on first “Record”  and click on “Into Table“.

Step 06: Now you will able to see your data just you need to expand the columns by clicking button on “Value” column. {I used to clear “use original column name as prefix” option as it adds prefix before you column’s name}

Step 07:  Click OK and you will able to you final dataset.

# Machine Learning with SQL Server & R : Part 02 – The Basic Math

Hello Friends!!!

We have discussed about introduction of Machine Learning in previous post  “What, Why and How”  now we will discuss about some basic level mathematics used in machine learning. It’s very important to understand the underlying mathematics for meaningful conclusion.

Following topics used by ML either directly or indirectly. its very basic but plays very deep role in Machine Learning.

• Probability.
• Mean or Average.
• Deviation & Variance.
• Standard Deviation.
• Regression Analysis.

Let’s have some brief introduction of these.

### Probability:

Chance or likelihood of any event is called probability. It lies between 0.0 to 1.0 where 1.0 is highly likely. It can be calculated using below formula.

Suppose, we have a sack contains 12 balls ( 5 Red, 4 Black and 3 Green) and we try to calculate probability of following 4 scenarios

1. What is the probability to get black from the sack?
2. What is the probability to get Black AND Red. Means the ball should have both the colors?
3. What is the probability to get Black OR Red. Means the ball could be either of these?
4. What is the probability not to get Black?

### Mean or Average:

It is the middle value of the data points. It gives a very general idea about the data and can be calculated using below formula

### Deviation & Variance (σ2 or s2):

“Deviation” is the distance of each data point to its mean and Variance is squired average of deviation. It tells about how data points are scattered around its mean but the clear picture will given by Standard Deviation.

### Standard Deviation (σ or s) :

“Standard Deviation” is square root of the Variance. A smaller Standard Deviation means data are accumulated around its mean and bigger Standard Deviation denotes more scattered data points.

there are various ways to denote “Standard Deviation” but mostly we use small sigma (“σ”) or “S” whereas “σ” denotes “Standard Deviation of population” and “S” stands for ” standard deviation of sample

### Regression Analysis:

“Regression Analysis” is a statistical method to establish relationship between the variables. Suppose you are studying about rain prediction model and you have some data like “wind speed” and “Temperature” etc. then using regression analysis you can establish relationship between the event (Rain will happen or not) and its independent variables.

There are various kind of regression analysis like “Linear Regression”, “Polynomial Regression”, “Logistic Regression” and many more according to what kind of study you are doing.

Below is an example of simple linear regression using least square method to fit a regression or prediction line. Here we are trying to predict the value of y on the behavior of value x.

We saw the simplest example and it has many assumption in it but basically it work like this. likewise, we can fit the model using R2 method where we try to reduce the square of residuals (the difference between fit value and actual value).

In following post we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using “lm” and “glm” function in RTools.

Related posts:

Machine Learning with SQL Server & R : Part 01 – What, Why and How

Thanks!!!

# Machine Learning with SQL Server & R : Part 01 – What, Why and How

Hello Friends!!!

Machine Learning (ML) is very fascinating nowadays. But what is ML? why it is required? how a machine can learn? what kind of output it will provide? these kind of questions comes in our mind when we think about Machine Learning, right!!!

In this blog post we are going to discuss very briefly about Machine Learning and in later posts we will see about its application and coding. This series will help those individuals who has just started OR planning to start understanding Machine Learning Technology and it will be a good refresher for those who are already working in it.

### What: What is Machine Learning?

Everyone of us are good at something like riding a bike, playing some game, cooking etc.; correct!

Now, hold-on and think, why you are good at those things? Because you know it very well. You know what could happen and what kind of reaction you supposed to do for any consequences.

Imagine, a kid learning to walk. he falls initially but as time passes, he walks very well. And the reason is very obvious that he learnt how to walk OR in other words he learnt how not to fall. This is called “Learning” from your past experience (or “Data” in computer science) and match with it’s current output and keep doing by itself. Now, just replace the kid with a small robot and somehow program it to learn from it’s past data and apply to the next occurrence, it will react same as the kid. THAT IS MACHINE LEARNING.

Machine Learning is a technique which enables a program to learn from available data and it’s future occurrences and act according to that. AI (Artificial Intelligence), ML (Machine Learning), NLP (Natural Language Processing), (NN) Neural Network and Deep Learning; are branches of Data Science with marginal differences in terms of its input & output complexity.

Example: When your mail service provider separates junk mails for you, when you see sales advertisement on your screen as per your last search and many more are very good example of machine learning algorithms.

Machine Learning broadly divided into two categories like supervised learning and unsupervised learning.

Supervised Learning: In supervised learning the output is desirable like whether any occurrence will happen or not OR what is the likelihood of any occurrence. Classification problems  and  regression problems are very good example of supervised learning.

Unsupervised Learning: In unsupervised learning the output is not desirable. Clustering problems and association problems are good example of unsupervised learning.

### Why: Why ML is trending now?

Now, you  might be having the idea that ML is a “Data” hunger program. if no data then no machine learning ( except it is not comes under unsupervised learning). As we have a huge data assets with us in form of structured data ( form transaction systems, warehouses etc.) or unstructured data ( from social media, online shopping platform etc.) so now business houses, politicians, banks  and others wants see their data and convert into some meaningful output for better decision making. It becomes very favorable condition to use ML or AI services to make better decisions.

### How: How we can use ML as a tool?

There are many tools available to explore data using Machine Learning like R, Python, Matlab etc. but we are going to look “Machine Learning Services” of SQL SERVER 2017 (R). By using SQL Server 2016 and above machine learning services we can process the entire data set in the model. We are going to use RTool in Visual Studio 2017. You can refer R-Services in SQL SERVER 2016 post for its introduction part.

# Temporal table in SQL Server 2016

Hello Friends,

System-versioned temporal table OR temporal table is a new type of user table with data history. It maintains historical data like SCD-2 (Slowly Changing Dimension type2) .

This concept has introduced in ANSI SQL 2011 and incorporated by MS SQL Server in SQL Server 2016 onward. It’s a very great feature for auditing purpose.

### Facts of Temporal Table:

Below mentioned features are tested and verified.

1. It has two parts system-versioned temporal table and History table.
2. Temporal table maintains only data level history.
3. PRIMARY KEY is compulsory while defining system versioning.
4. History table is read only.
5. We can create Indexes/statistics in history table for better query performance.
6. We cannot add column, alter datatype, add trigger, add constraint, add check constraint etc. in history table. More precisely, anything which requires schema changes or data restrictions are not allowed.
7. It requires 2 additional columns with datetime2 datatype to store time-stamps (UTC date-time). Also, it doesn’t affect any DML operation as it is hidden for such statements.
8. If user defined history table name already exists then it will become history table with old data.
9. If we add column in existing temporal table then it will maintain history for it without any additional setting.
10. We can drop column from temporal table even if it has SYSTEM_VERSIONING = ON and it loses the history for that column as well.

### How to create Temporal Table:

Using below CREATE TABLE statement, we can create temporal table.

```CREATE TABLE dbo.Test_TemporalTable
(
ID INT IDENTITY PRIMARY KEY
,FName VARCHAR(50)
,FromDateTime DATETIME2 GENERATED ALWAYS AS ROW START
,EndDateTime DATETIME2 GENERATED ALWAYS AS ROW END
,PERIOD FOR SYSTEM_TIME (FromDateTime,EndDateTime)
)
WITH (SYSTEM_VERSIONING = ON)
```

FromDateTime: Data history start date time.

EndDateTime : Data history end datetime.

SYSTEM_VERSIONG = ON: signifies that system versioning has enabled.

You can see on Object Explorer; “system-versioned” tag with temporal table. Which defines it as temporal table.

### Where is the history table?

You can find the history table just beneath the temporal table in object explore with “History” tag. Also it has similar structure as temporal table but without primary key. Above CREATE TABLE statement has no user -defined name for history table so it will be create with prefix MSSQL_ and suffix object_id. In case, if the auto generated history table already exists then SQL Server will add a random number after it.

See below snap:

### Insert/Update/Delete on system-versioned temporal table:

We can perform Insert/Update/Delete as like normal table. The additional columns are hidden for such operations.

In below SQL, we will create a temporal table with user defined History table and Insert/update/delete data on it.

```-- Creating system-versioned table
GO
CREATE TABLE dbo.Test_TemporalTable
(
ID INT IDENTITY PRIMARY KEY
,FName VARCHAR(50)
,FromDateTime DATETIME2 GENERATED ALWAYS AS ROW START
,EndDateTime DATETIME2 GENERATED ALWAYS AS ROW END
,PERIOD FOR SYSTEM_TIME (FromDateTime,EndDateTime)
)

-- Enable system versioning
ALTER TABLE Test_TemporalTable SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Test_TemporalTable_History))

-- Inserting test data into temporal table

INSERT INTO Test_TemporalTable (FName) VALUES ('Data_01'),('Data_02')

-- Currently no data into History table
SELECT *
FROM Test_TemporalTable

SELECT *
FROM [dbo].[Test_TemporalTable_History]

```

After inserting data into temporal table; we can find data only in temporal table whereas History table has no data in it because it has no change till now.

Now, let’s perform some DML operation in temporal table

```-- DML operation
-- Update ID=1
UPDATE Test_TemporalTable
SET FName = 'Data_01_Updated'
WHERE ID = 1

-- Delete ID=2
DELETE Test_TemporalTable
WHERE ID = 2

-- Insert New data
INSERT INTO Test_TemporalTable (FName) VALUES ('Data_03')

-- Now History table has historical data whereas temporal table has current data.

SELECT *
FROM Test_TemporalTable

SELECT *
FROM [dbo].[Test_TemporalTable_History]

```

Now, we can see the data in history table. The time-stamps are signifying the data validation period.

### How to query from temporal table:

We can see history data along with current data using below query.

```SELECT *
FROM dbo.Test_TemporalTable
FOR SYSTEM_TIME BETWEEN '2017-01-01 00:00:00' AND '9999-12-31 23:59:59.9999999'
ORDER BY ID,  FromDateTime

```

Here, a new clause “FOR  SYSTEM_TIME” has introduced in SQL 2016 to filter history time-stamp. You can apply normal WHERE clause after it.

### How to drop temporal table:

To drop temporal table we need to OFF system versioning first then it will allow to drop the temporal table. See below code:

```ALTER TABLE Test_TemporalTable SET (SYSTEM_VERSIONING = OFF)
GO
DROP TABLE dbo.Test_TemporalTable
```

# DBA with PowerShell : How to get service status using PowerShell in SQL Server query

Hello friends,

This is second post of “DBA with PowerShell” series. In this series we try to fetch some useful information of SQL Instance environment

Last post “DBA with PowerShell : Get perfmon counters using PowerShell” was about fetching critical information like CPU, Memory, IO utilization and insert it  into SQL Database so we can generate meaningful reports. In this post, we will see how to fetch list of services and its status .It is very helpful in case when we need to monitor any specific service status across multiple servers.

You need to pass the machine name or just simply put “.” for local machine’s services status.

```
DECLARE @MachineName VARCHAR(128) = '.'  --# Put Machine Name e.g. Mac01 OR put &quot;.&quot; for local machine
, @QueryStr VARCHAR(4000)

DECLARE @Service AS TABLE (QutPut VARCHAR(MAX))

-- Build Query to get service info
SET @QueryStr = 'PowerShell.exe -c &quot;\$a = 1; Get-WmiObject -ComputerName '+@MachineName+'  Win32_Service | Where-Object {\$_.Name -like ''*''} | SELECT SystemName, (@{Name=''Service Name''; Exp=''Name''}), (@{Name=''LogOn As''; Exp=''StartName''}), State| ForEach-Object {New-Object psObject -Property @{''SrNo''=\$a;''SystemName''= [String]\$a + ''^''+ \$_.SystemName; ''Service Name'' = [String]\$a + ''^''+\$_.''Service Name''; ''LogOn As'' = [String]\$a + ''^''+ \$_.''LogOn As''; ''State'' = [String]\$a + ''^''+ \$_.State};\$a ++}| Sort-Object SrNo |Format-List -Property  SystemName, ''Service Name'', ''LogOn As'', State &quot;'

-- Insert data into a table variable
INSERT INTO @Service
EXEC xp_cmdshell @QueryStr

-- Deleteing unwanted data
DELETE @Service WHERE QutPut IS NULL

-- Converting into presentable format
SELECT [Tag],[SystemName], [Service Name] ,[LogOn As],[State]
FROM (
, LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX('^',QutPut)+1,len(QutPut)))) Val
, SUBSTRING(LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX(':',QutPut)+1,len(QutPut)))),1,
CHARINDEX('^',LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX(':',QutPut)+1,len(QutPut)))))-1) Tag
FROM @Service
)Pvt
PIVOT
(
MAX(Val) FOR Head IN ([SystemName], [Service Name] ,[LogOn As],[State])
) Dt

```

Cheers!

# R-Services in SQL SERVER 2016

Hello Friends, SQL Server always comes with exciting features in every new version upgrades. Finally; I have been fortunate enough to start working in SQL SERVER 2016.

In this blog, We will see about integration of “R” language in SQL SERVER.I have gone through multiple MSDN chapters, white papers and learning videos about R-Services, R-language, statistical mathematics including regression analysis and found that its really awesome to have “R” with SQL SERVER.

As we all know, now a day’s data analytics is very exciting stuff to work on and “R” is a very powerful tool to do statistical analysis. The integration of “R” with SQL SERVER is a very powerful combo in terms of data processing and statistical analysis on heavy data.

I would like to break this topic in following points:

1. Introduction of R language.
2. What is Statistical analysis?
3. How to call “R” scripts in SQL Server 2016.
• SP_EXECUTE_EXTERNAL_SCRIPT
• Different flavors of “R” in SQL Server 2016

R language : “R” is an open source, statistical programming software, which is used for statistical analysis. It can be download from software repository CRAN. “R” is widely used by data scientists for data analysis and also have very wide range of user group. “R” version used in SQL SERVER 2016 is R-Open and called R-Services in SQL SERVER.

Statistical Analysis: Statistical analysis is a mathematical way for data sampling & data analysis. Using it we try to build relationship between dependent and independent variables and find a valid pattern so we can predict the future consequences.

Dependent variable: For which we are concerned about OR we wants to predict.
Independent variable: On which dependent variable depends.

for e.g.  Consider a simple data-set of sales and purchase as follows

\$10,000   | \$5000                    | \$3000
\$20,000   | \$1000                    | \$4000

Here; “Revenue” depends upon “Production_Cost” & “Advertisement_Cost”. So “Revenue” is dependent variable and “Production_Cost” & “Advertisement_Cost” are independent variable. In other words we can say that we can control the independent variable but we can’t control dependent variable; that’s why we wants to predict it…Right!!!

Launchpad Service: Its a new service introduced in SQL Server 2016 to launch external scripts. As of now its “R” and most probably “python” in SQL 2017. You can see a new service in SQL Configuration Manager called “SQL Server Launchpad (<Instance Name>)

It has introduced to reduce main SQL engine work load while executing external scripts from SSMS. Some external commands can be executed using xp_cmdshell like powershell scripts and shares the resource of SQL SERVER. So Microsoft has introduced new Launchpad Services to execute such external scripts.

SP_EXECUTE_EXTERNAL_SCRIPT: Sp_Execute_External_Script system stored procedure helps to execute external scripts (currently, R-Services scripts) from SSMS. It can be enabled using “sp_configure ‘external scripts enabled’, 1;”. A sample code is as follows:

```
EXEC   sp_execute_external_script @language = N'R'
, @script = N'<R Script>'
, @input_data_1 = N'<SELECT Query from database>'
, @output_data_1_name = N'output dataset name'
WITH RESULT SETS ((<OUTPUT dataset''s column name and its datatype>));
```

Flavors of “R-Services” in SQL SERVER 2016: Its has 2 server components and 1 client component as follows

1. R Services (In-Database) : Specially designed for SQL SERVER data-sources.
2. Microsoft R Server (Standalone) : It supports multiple data-sources along with SQL SERVER.
3. Microsoft R Client: Its client environment to build code.

Note: You can install “R Services (In-Database)” and “Microsoft R Server (Standalone)”  in same instance and it will work as well but Microsoft recommends, not to install both features in a single instance.

I have done some practical with R-Services (In-database)  and its really amazing; will share in my future posts.

Hope! you like this introduction of R-Services in SQL 2016.

Thank you!!!