# Machine Learning with SQL Server & R : Part 02 – The Basic Math

Hello Friends!!!

We have discussed about introduction of Machine Learning in previous post  “What, Why and How”  now we will discuss about some basic level mathematics used in machine learning. It’s very important to understand the underlying mathematics for meaningful conclusion.

Following topics used by ML either directly or indirectly. its very basic but plays very deep role in Machine Learning.

• Probability.
• Mean or Average.
• Deviation & Variance.
• Standard Deviation.
• Regression Analysis.

Let’s have some brief introduction of these.

### Probability:

Chance or likelihood of any event is called probability. It lies between 0.0 to 1.0 where 1.0 is highly likely. It can be calculated using below formula.

Suppose, we have a sack contains 12 balls ( 5 Red, 4 Black and 3 Green) and we try to calculate probability of following 4 scenarios

1. What is the probability to get black from the sack?
2. What is the probability to get Black AND Red. Means the ball should have both the colors?
3. What is the probability to get Black OR Red. Means the ball could be either of these?
4. What is the probability not to get Black?

### Mean or Average:

It is the middle value of the data points. It gives a very general idea about the data and can be calculated using below formula

### Deviation & Variance (σ2 or s2):

“Deviation” is the distance of each data point to its mean and Variance is squired average of deviation. It tells about how data points are scattered around its mean but the clear picture will given by Standard Deviation.

### Standard Deviation (σ or s) :

“Standard Deviation” is square root of the Variance. A smaller Standard Deviation means data are accumulated around its mean and bigger Standard Deviation denotes more scattered data points.

there are various ways to denote “Standard Deviation” but mostly we use small sigma (“σ”) or “S” whereas “σ” denotes “Standard Deviation of population” and “S” stands for ” standard deviation of sample

### Regression Analysis:

“Regression Analysis” is a statistical method to establish relationship between the variables. Suppose you are studying about rain prediction model and you have some data like “wind speed” and “Temperature” etc. then using regression analysis you can establish relationship between the event (Rain will happen or not) and its independent variables.

There are various kind of regression analysis like “Linear Regression”, “Polynomial Regression”, “Logistic Regression” and many more according to what kind of study you are doing.

Below is an example of simple linear regression using least square method to fit a regression or prediction line. Here we are trying to predict the value of y on the behavior of value x.

We saw the simplest example and it has many assumption in it but basically it work like this. likewise, we can fit the model using R2 method where we try to reduce the square of residuals (the difference between fit value and actual value).

In following post we will see how to implement “Simple Regression”, “Multiple Regression”, “Logistic Regression” and more using “lm” and “glm” function in RTools.

Related posts:

Machine Learning with SQL Server & R : Part 01 – What, Why and How

Thanks!!!

# Machine Learning with SQL Server & R : Part 01 – What, Why and How

Hello Friends!!!

Machine Learning (ML) is very fascinating nowadays. But what is ML? why it is required? how a machine can learn? what kind of output it will provide? these kind of questions comes in our mind when we think about Machine Learning, right!!!

In this blog post we are going to discuss very briefly about Machine Learning and in later posts we will see about its application and coding. This series will help those individuals who has just started OR planning to start understanding Machine Learning Technology and it will be a good refresher for those who are already working in it.

### What: What is Machine Learning?

Everyone of us are good at something like riding a bike, playing some game, cooking etc.; correct!

Now, hold-on and think, why you are good at those things? Because you know it very well. You know what could happen and what kind of reaction you supposed to do for any consequences.

Imagine, a kid learning to walk. he falls initially but as time passes, he walks very well. And the reason is very obvious that he learnt how to walk OR in other words he learnt how not to fall. This is called “Learning” from your past experience (or “Data” in computer science) and match with it’s current output and keep doing by itself. Now, just replace the kid with a small robot and somehow program it to learn from it’s past data and apply to the next occurrence, it will react same as the kid. THAT IS MACHINE LEARNING.

Machine Learning is a technique which enables a program to learn from available data and it’s future occurrences and act according to that. AI (Artificial Intelligence), ML (Machine Learning), NLP (Natural Language Processing), (NN) Neural Network and Deep Learning; are branches of Data Science with marginal differences in terms of its input & output complexity.

Example: When your mail service provider separates junk mails for you, when you see sales advertisement on your screen as per your last search and many more are very good example of machine learning algorithms.

Machine Learning broadly divided into two categories like supervised learning and unsupervised learning.

Supervised Learning: In supervised learning the output is desirable like whether any occurrence will happen or not OR what is the likelihood of any occurrence. Classification problems  and  regression problems are very good example of supervised learning.

Unsupervised Learning: In unsupervised learning the output is not desirable. Clustering problems and association problems are good example of unsupervised learning.

### Why: Why ML is trending now?

Now, you  might be having the idea that ML is a “Data” hunger program. if no data then no machine learning ( except it is not comes under unsupervised learning). As we have a huge data assets with us in form of structured data ( form transaction systems, warehouses etc.) or unstructured data ( from social media, online shopping platform etc.) so now business houses, politicians, banks  and others wants see their data and convert into some meaningful output for better decision making. It becomes very favorable condition to use ML or AI services to make better decisions.

### How: How we can use ML as a tool?

There are many tools available to explore data using Machine Learning like R, Python, Matlab etc. but we are going to look “Machine Learning Services” of SQL SERVER 2017 (R). By using SQL Server 2016 and above machine learning services we can process the entire data set in the model. We are going to use RTool in Visual Studio 2017. You can refer R-Services in SQL SERVER 2016 post for its introduction part.

# Temporal table in SQL Server 2016

Hello Friends,

System-versioned temporal table OR temporal table is a new type of user table with data history. It maintains historical data like SCD-2 (Slowly Changing Dimension type2) .

This concept has introduced in ANSI SQL 2011 and incorporated by MS SQL Server in SQL Server 2016 onward. It’s a very great feature for auditing purpose.

### Facts of Temporal Table:

Below mentioned features are tested and verified.

1. It has two parts system-versioned temporal table and History table.
2. Temporal table maintains only data level history.
3. PRIMARY KEY is compulsory while defining system versioning.
4. History table is read only.
5. We can create Indexes/statistics in history table for better query performance.
6. We cannot add column, alter datatype, add trigger, add constraint, add check constraint etc. in history table. More precisely, anything which requires schema changes or data restrictions are not allowed.
7. It requires 2 additional columns with datetime2 datatype to store time-stamps (UTC date-time). Also, it doesn’t affect any DML operation as it is hidden for such statements.
8. If user defined history table name already exists then it will become history table with old data.
9. If we add column in existing temporal table then it will maintain history for it without any additional setting.
10. We can drop column from temporal table even if it has SYSTEM_VERSIONING = ON and it loses the history for that column as well.

### How to create Temporal Table:

Using below CREATE TABLE statement, we can create temporal table.

```CREATE TABLE dbo.Test_TemporalTable
(
ID INT IDENTITY PRIMARY KEY
,FName VARCHAR(50)
,FromDateTime DATETIME2 GENERATED ALWAYS AS ROW START
,EndDateTime DATETIME2 GENERATED ALWAYS AS ROW END
,PERIOD FOR SYSTEM_TIME (FromDateTime,EndDateTime)
)
WITH (SYSTEM_VERSIONING = ON)
```

FromDateTime: Data history start date time.

EndDateTime : Data history end datetime.

SYSTEM_VERSIONG = ON: signifies that system versioning has enabled.

You can see on Object Explorer; “system-versioned” tag with temporal table. Which defines it as temporal table.

### Where is the history table?

You can find the history table just beneath the temporal table in object explore with “History” tag. Also it has similar structure as temporal table but without primary key. Above CREATE TABLE statement has no user -defined name for history table so it will be create with prefix MSSQL_ and suffix object_id. In case, if the auto generated history table already exists then SQL Server will add a random number after it.

See below snap:

### Insert/Update/Delete on system-versioned temporal table:

We can perform Insert/Update/Delete as like normal table. The additional columns are hidden for such operations.

In below SQL, we will create a temporal table with user defined History table and Insert/update/delete data on it.

```-- Creating system-versioned table
GO
CREATE TABLE dbo.Test_TemporalTable
(
ID INT IDENTITY PRIMARY KEY
,FName VARCHAR(50)
,FromDateTime DATETIME2 GENERATED ALWAYS AS ROW START
,EndDateTime DATETIME2 GENERATED ALWAYS AS ROW END
,PERIOD FOR SYSTEM_TIME (FromDateTime,EndDateTime)
)

-- Enable system versioning
ALTER TABLE Test_TemporalTable SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Test_TemporalTable_History))

-- Inserting test data into temporal table

INSERT INTO Test_TemporalTable (FName) VALUES ('Data_01'),('Data_02')

-- Currently no data into History table
SELECT *
FROM Test_TemporalTable

SELECT *
FROM [dbo].[Test_TemporalTable_History]

```

After inserting data into temporal table; we can find data only in temporal table whereas History table has no data in it because it has no change till now.

Now, let’s perform some DML operation in temporal table

```-- DML operation
-- Update ID=1
UPDATE Test_TemporalTable
SET FName = 'Data_01_Updated'
WHERE ID = 1

-- Delete ID=2
DELETE Test_TemporalTable
WHERE ID = 2

-- Insert New data
INSERT INTO Test_TemporalTable (FName) VALUES ('Data_03')

-- Now History table has historical data whereas temporal table has current data.

SELECT *
FROM Test_TemporalTable

SELECT *
FROM [dbo].[Test_TemporalTable_History]

```

Now, we can see the data in history table. The time-stamps are signifying the data validation period.

### How to query from temporal table:

We can see history data along with current data using below query.

```SELECT *
FROM dbo.Test_TemporalTable
FOR SYSTEM_TIME BETWEEN '2017-01-01 00:00:00' AND '9999-12-31 23:59:59.9999999'
ORDER BY ID,  FromDateTime

```

Here, a new clause “FOR  SYSTEM_TIME” has introduced in SQL 2016 to filter history time-stamp. You can apply normal WHERE clause after it.

### How to drop temporal table:

To drop temporal table we need to OFF system versioning first then it will allow to drop the temporal table. See below code:

```ALTER TABLE Test_TemporalTable SET (SYSTEM_VERSIONING = OFF)
GO
DROP TABLE dbo.Test_TemporalTable
```

Read in More Detail…

# DBA with PowerShell : How to get service status using PowerShell in SQL Server query

Hello friends,

This is second post of “DBA with PowerShell” series. In this series we try to fetch some useful information of SQL Instance environment

Last post “DBA with PowerShell : Get perfmon counters using PowerShell” was about fetching critical information like CPU, Memory, IO utilization and insert it  into SQL Database so we can generate meaningful reports. In this post, we will see how to fetch list of services and its status .It is very helpful in case when we need to monitor any specific service status across multiple servers.

You need to pass the machine name or just simply put “.” for local machine’s services status.

```
DECLARE @MachineName VARCHAR(128) = '.'  --# Put Machine Name e.g. Mac01 OR put &quot;.&quot; for local machine
, @QueryStr VARCHAR(4000)

DECLARE @Service AS TABLE (QutPut VARCHAR(MAX))

-- Build Query to get service info
SET @QueryStr = 'PowerShell.exe -c &quot;\$a = 1; Get-WmiObject -ComputerName '+@MachineName+'  Win32_Service | Where-Object {\$_.Name -like ''*''} | SELECT SystemName, (@{Name=''Service Name''; Exp=''Name''}), (@{Name=''LogOn As''; Exp=''StartName''}), State| ForEach-Object {New-Object psObject -Property @{''SrNo''=\$a;''SystemName''= [String]\$a + ''^''+ \$_.SystemName; ''Service Name'' = [String]\$a + ''^''+\$_.''Service Name''; ''LogOn As'' = [String]\$a + ''^''+ \$_.''LogOn As''; ''State'' = [String]\$a + ''^''+ \$_.State};\$a ++}| Sort-Object SrNo |Format-List -Property  SystemName, ''Service Name'', ''LogOn As'', State &quot;'

-- Insert data into a table variable
INSERT INTO @Service
EXEC xp_cmdshell @QueryStr

-- Deleteing unwanted data
DELETE @Service WHERE QutPut IS NULL

-- Converting into presentable format
SELECT [Tag],[SystemName], [Service Name] ,[LogOn As],[State]
FROM (
SELECT LTRIM(RTRIM(SUBSTRING(QutPut,1,CHARINDEX(':',QutPut)-1))) Head
, LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX('^',QutPut)+1,len(QutPut)))) Val
, SUBSTRING(LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX(':',QutPut)+1,len(QutPut)))),1,
CHARINDEX('^',LTRIM(RTRIM(SUBSTRING(QutPut,CHARINDEX(':',QutPut)+1,len(QutPut)))))-1) Tag
FROM @Service
)Pvt
PIVOT
(
MAX(Val) FOR Head IN ([SystemName], [Service Name] ,[LogOn As],[State])
) Dt

```

Cheers!

# R-Services in SQL SERVER 2016

Hello Friends, SQL Server always comes with exciting features in every new version upgrades. Finally; I have been fortunate enough to start working in SQL SERVER 2016.

In this blog, We will see about integration of “R” language in SQL SERVER.I have gone through multiple MSDN chapters, white papers and learning videos about R-Services, R-language, statistical mathematics including regression analysis and found that its really awesome to have “R” with SQL SERVER.

As we all know, now a day’s data analytics is very exciting stuff to work on and “R” is a very powerful tool to do statistical analysis. The integration of “R” with SQL SERVER is a very powerful combo in terms of data processing and statistical analysis on heavy data.

I would like to break this topic in following points:

1. Introduction of R language.
2. What is Statistical analysis?
3. How to call “R” scripts in SQL Server 2016.
• Launchpad Service.
• SP_EXECUTE_EXTERNAL_SCRIPT
• Different flavors of “R” in SQL Server 2016

R language : “R” is an open source, statistical programming software, which is used for statistical analysis. It can be download from software repository CRAN. “R” is widely used by data scientists for data analysis and also have very wide range of user group. “R” version used in SQL SERVER 2016 is R-Open and called R-Services in SQL SERVER.

Statistical Analysis: Statistical analysis is a mathematical way for data sampling & data analysis. Using it we try to build relationship between dependent and independent variables and find a valid pattern so we can predict the future consequences.

Dependent variable: For which we are concerned about OR we wants to predict.
Independent variable: On which dependent variable depends.

for e.g.  Consider a simple data-set of sales and purchase as follows

Revenue   | Production_Cost | Advertisement_Cost
\$10,000   | \$5000                    | \$3000
\$20,000   | \$1000                    | \$4000

Here; “Revenue” depends upon “Production_Cost” & “Advertisement_Cost”. So “Revenue” is dependent variable and “Production_Cost” & “Advertisement_Cost” are independent variable. In other words we can say that we can control the independent variable but we can’t control dependent variable; that’s why we wants to predict it…Right!!!

Launchpad Service: Its a new service introduced in SQL Server 2016 to launch external scripts. As of now its “R” and most probably “python” in SQL 2017. You can see a new service in SQL Configuration Manager called “SQL Server Launchpad (<Instance Name>)

It has introduced to reduce main SQL engine work load while executing external scripts from SSMS. Some external commands can be executed using xp_cmdshell like powershell scripts and shares the resource of SQL SERVER. So Microsoft has introduced new Launchpad Services to execute such external scripts.

SP_EXECUTE_EXTERNAL_SCRIPT: Sp_Execute_External_Script system stored procedure helps to execute external scripts (currently, R-Services scripts) from SSMS. It can be enabled using “sp_configure ‘external scripts enabled’, 1;”. A sample code is as follows:

```
EXEC   sp_execute_external_script @language = N'R'
, @script = N'<R Script>'
, @input_data_1 = N'<SELECT Query from database>'
, @output_data_1_name = N'output dataset name'
WITH RESULT SETS ((<OUTPUT dataset''s column name and its datatype>));
```

Flavors of “R-Services” in SQL SERVER 2016: Its has 2 server components and 1 client component as follows

1. R Services (In-Database) : Specially designed for SQL SERVER data-sources.
2. Microsoft R Server (Standalone) : It supports multiple data-sources along with SQL SERVER.
3. Microsoft R Client: Its client environment to build code.

Note: You can install “R Services (In-Database)” and “Microsoft R Server (Standalone)”  in same instance and it will work as well but Microsoft recommends, not to install both features in a single instance.

I have done some practical with R-Services (In-database)  and its really amazing; will share in my future posts.

Hope! you like this introduction of R-Services in SQL 2016.

Thank you!!!

# List down folders underlying file info using SQL query

Sometimes we need to list down file/folder’s information from a folder like filename, extension, created date and modified date etc.

The below query can give you the list of all content’s info from given folder path.

SQL Query:

```DECLARE @FolderPath VARCHAR(MAX) = ''

IF OBJECT_ID('Tempdb..#FileDetails') IS NOT NULL
DROP TABLE #FileDetails

CREATE TABLE #FileDetails (FileDetails VARCHAR(MAX))

INSERT INTO #FileDetails (FileDetails)
EXEC ('EXEC XP_CMDSHELL ''powershell.exe -c "Get-ChildItem '''''+@FolderPath+''''' | SELECT Name,CreationTime,LastWriteTime,Extension|foreach{\$_.Name+''''@''''+\$_.CreationTime+''''|''''+\$_.LastWriteTime+''''#''''+\$_.Extension}"''')

SELECT SUBSTRING(FileDetails,0,CHARINDEX('@',FileDetails)) AS [File Name]
, CONVERT(DATETIME,SUBSTRING(FileDetails,CHARINDEX('@',FileDetails)+1,19)) AS [Creation Date Time]
, CONVERT(DATETIME,SUBSTRING(FileDetails,CHARINDEX('|',FileDetails)+1,19)) AS [Modified Date Time]
, ISNULL(NULLIF(SUBSTRING(FileDetails,CHARINDEX('#',FileDetails)+1,LEN(FileDetails)),''),'Directory') AS [Extension]
FROM #FileDetails
WHERE FileDetails IS NOT NULL
```

Output Content:

 Column Name Description Remarks File Name Name of content File name, Folder name etc. Creation Date Time File/Folder creation date-time Modified Date Time Last Modified date-time Extension File extension like.. .sql, .xlxs etc.

#filedetails

# How to recover Tempdb system database step by step

Hello Friends,

We all know Tempdb is very important component of SQL SERVER Instance and there are some incidents where things goes crazy due to Tempdb.

I have some crazy experiences with Tempdb like giving wrong path OR forget to specify extension while shifting Tempdb database or Somehow Tempdb got crashed so the SQL Instance couldn’t able to work and will give  error like “Path is not found for Tempdb database” or “Specified path is not found”.

It’s bit difficult to recover Tempdb back on track in this situation. But using command prompt we can do that.

You can follow the below steps to recover Tempdb database if SQL Instance is not able to bring up because not able to create Tempdb.

Step 01: STOP ALL Services of sql instance.
Step 02: Open cmd and execute below command:
For Default SQL Instance:
`       NET START MSSQLSERVER /f /T3608 -- to bypass system db recovery. `
For Named Instance:
`       NET START MSSQL\$/f/T3608 `

Step 03: Open SQLCMD using below command in cmd prompt:
```SQLCMD -S GO```

Step 04: Execute below code to change tempdb path in SQLCMD.
``` ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev' , FILENAME = '.mdf' ) GO ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog' , FILENAME = '.ldf' ) GO```

Step 05: Exit from SQLCMD USING EXIT.

Step 06: Stop SQL Service:
``` NET STOP MSSQLSERVER /f /T3608 OR NET STOP MSSQL\$```

Step 07: Start services from config manager.

Then Enjoy!!!

Hope this post help someone….