PageViews Last Month

Saturday, 23 November 2019

SQL , Lift up your heads for the cloud - Azure SQL

In my journey with Azure SQL Server I am often asked.

1) Why should I move to cloud ?
2) Which model should I go for ?
3) What do I need to do if my SQL server is on cloud ?

While all the answers are out there in many cloud forums and websites people often tend to forget the basics. They are What,When,Where, How and obviously the big Y.So lets address them one by one.

Why Should I move to Cloud?

Was any Y ever answered in a one liner ?
Weren't they always in the answer in brief section ? 😏

What : Cloud in literal terms is a layer, a carrier of your services.
It can provide you Hardware to access if you like (Infrastructure as a service/ IAAS),
It can provide you a mix of both Hardware and Software to some extent (Platform as a service/PAAS)
or It can provide only the Software for you to work with (Software as a Service/SAAS)

When : You can choose to move to cloud after having discussed with your stakeholders and decision makers keeping in mind your end customers. It always helps to do your homework before the TESTS and not during the TESTS (Yes your DEV, UAT, SIT,Pre-Prod and Prod servers/apps) . And by that I mean familiarizing yourself with the pros and cons of the technology you will choose or propose to your clients.

Where : As a consumer of Azure services myself I would say Microsoft Azure. But you all must be  aware, that cloud is everywhere. So you can choose from lot of vendors once you decide the model. Your vendor SLA's and prices essentially governs your choice.

How : There are a variety of tools you can choose from, references to which are at the end of this section. I want to ensure I help you in your homework.

Why : The key lies in the amount of control you would like to retain for the hardware, software, applications and most importantly the peace of mind.Not just for your clients but your bosses,co-workers, employees and you. Here are a few points which highlight the why's 
  • One should move to cloud if one can see the cloud and you are visionary (That's my personal thought I know some may think and see differently 😉 No awards or perks for anyone who wishes to debate) 
  • One should choose cloud if their hardware and software needs monitoring and control 24*7. 
  • One should move to cloud as you never know, when calamity strikes. 
  • And if and when it does you cannot be expected to think of your office and clients instead of your own safety and your loved ones well-being.
  • One should move to cloud if you intend to be happy, along with making your clients happy and your workers happy.
I'm hoping after this you will be advancing to cloud. Let me help you in understanding the next steps

Which model should I go for?

The extent of management and administration will vary based on the model you choose (IAAS/PAAS/SAAS). This will also matter in the amount of labor hours and cost an organization saves in genera. The saved resources can be utilized to focus on upgrading and up scaling to something bigger, better and faster efficiently.

You may have come across the below diagram many a times
To make simpler than it already is here is how you choose

IAAS -> Infrastructure as a Service
1) Do you want to be in complete control of your entire machine (CPU, Memory, RAM, Disks, I/O)
2)Do you want to be in complete charge of what goes in and out of the machine (CPU, Memory, RAM, Disks, I/O) ?
3) Are you not OK sharing any Software or controllable Hardware?
If the answer is yes. Your model is IAAS. 
  • Clearly since only you can go in and out of the machine only you can install or uninstall products thus managing the applications deployed on the machine. 
  • The provider is responsible to ensure at all times, you have the Machine up and running.
  • You will not loose your Configuration of the Machine

PAAS -> Platform as a Service
1) Do you want to be in control of the platform but not pay for the whole machine (CPU,Memory,RAM,Disks,I/O) ?
2) Do you want someone to ensure your platform is up and running all the time. Irrespective of your machine,where precisely and how they get deployed?
3) Are you OK with sharing the machines as long as your applications, its design and data are accessible to only to you ?
If the answer is yes. Your model is PAAS. 
  • Clearly you are on a shared hardware you may not be be able to set the CPU,Memory,RAM,I/O directly. 
  • But through a careful choice of options you can control the amount of the hardware you can select from. 
  • The provider is responsible to ensure your Platform is up and running. Even if the underlying Machine breaks down. 
  • You will not loose your Design and Data for the applications

SAAS -> Software as a Service
1) Do you want to be in control of the service but not pay for the whole machine or the products installed on it (CPU,Memory,RAM,I/O + Applications)
2) Are you OK sharing but you want to ensure the services are up all the time ?
If the answer is yes. Your model is SAAS. 
  • Clearly you are on a shared hardware you may not be be able to set the CPU,Memory,RAM,I/O directly. 
  • Neither can you choose the way the platform is designed or the way data is stored.
  • The provider is responsible to ensure your Services are running at all times. Even if the underlying Machine breaks down or the Platform changes.
  • You will not loose your access to the Service. 

What do I need to do if my SQL server is on cloud?


You need to do precisely what you are not paying for 
And be in control of what your cloud provider does for you.

IAAS: If you paying only for the Machine, you manage the platform & applications on it
(Provider will also have access to ensure continuity of your access to the machine)
PAAS: If you are paying for the Platform, you manage the design of your application, data and who access it. 
(Provider will also have access to ensure continuity of your access to the platform)
SAAS: If you are paying for the Services, you manage on the end users who access/avail it
(Provider will also have access to ensure continuity of your access to the services)

Enough talking lets get doing :)So here are the links for how to and a further reads

When referencing the below links ensure you are checking for the latest update dates. You cannot expect things not to change in the Azure world. Ensure you check when the page was last modified before you implement the steps and procedures. Never forget to thank the ones you find helpful

What goes around comes around. Lets spread a little appreciation shall we? Feel free to provide feedback's, there are people listening.

IAAS / PAAS SQL Server

Documents
Videos

IAAS SQL Server

Documents
Videos
https://channel9.msdn.com/Series/SQL-Server-in-Windows-Azure-Virtual-Machines


PAAS SQL Server

Back for More with SQL, Azure, Machine Learning, Big Data!!!!

Let me start with an apology for being missing so long :) Guess was trying to find where technology takes me as I started my journey with the pool of products that Microsoft had to offer me!!! Every now and then however I would refer to my own blogs because the RAM of a human brain has limited capacity, it can only store this much....

I bet you all would agree ,with the technology ever so evolving I need to bring my feet back to the ground where it all began.
Learning, sharing, teaching, updating and repeating. 
Thanks Surabhi for being my pillar and ensuring I never stop. 
With every year you kept me going and still keep me going..
So what's the update I bring you for this blog ? SQL and more with Kruti will have upcoming series based on the following 
1) More about SQL , lift up your heads for cloud
2) Bringing out magic from data in cloud.
3) The science of making machines learn ?

I'm super excited as I start again. Hope you are too !!!!

Sunday, 10 April 2016

Unix Basic Commands

In similar context to my previous blog I'm providing a list of UNIX basic commands which you may find handy if your new to the UNIX world.

UNIX Help Command
  • man <command>
  • man man –gives details of the man command
UNIX File Operations Commands

Piping operators for I/O redirection
  • >file -Make file the standard output
  • <file -Make file the standard input
  • >>file -Make file standard output & append its contents
  • <<word -Take shell input upto the first line containing the word or EOF
  • Cmd1 | cmd2 –Take the output of cmd1 & supply it as input for cmd2.
Viewing a file 
  • Find /dir1 /dir2 –name A* -print – Finds all the files beginning with A in the directories dir1 & dir2 & prints them
  • Cat <file>
  • Touch <filename> -Create an empty file
  • Cat file1 file2> file3 - File 3 if existing will get overwritten with contents of file 1 & then file 2.File 3 if non-existent will be created with the above contents.
  • >>file3- Contents of file3 will be appended instead of overwritten.
Copying a file
  • Cp source_file1 destination_file2 - Copies the source file1 to destination file2
  • Cp file1 file2 file 3 /etc/bin/dump - Copies the list of files (1,2,3) in current directory to the folder /etc/bin/dump
  • Cp /etc/tmp/file1 /etc/dump/file1
Removing a file (In windows context-deleting a file)
  • Rm file1 
  • Rm –I file1 - You are asked confirmation to delete the file.
  • Rm –r dir1 - Recursive directory content deletion
Moving a file (In windows context-renaming a file)
  • Mv file_old file_new – Renames file_old with the new name file_new
  • Mv dir_oldname dir_newname –Renames directory in similar fashion
Listing files
  • Ls – Lists files in the alphabetical order
  • Ls –l - Long listing of files
  • Ls –A –Show hidden files as well
Permissions Weight

Permissions  Weight
Read (r)           4
Write (w)        2
Execute(x)       1

Assigning permissions to a file
  • Chmod 741 myfile – Grants owner (4+2+1) all permissions on my file. Grants group of the owner (4) read permission on the file.Grants rest of the users (1) execute permission on the file 
  • Chmod go+rw u=r file1- Add the read & write priveledge to owner & group users. Grants only read permission to the rest of the users to file1 
  • syntax(chmod [o,g,u] [+/-/=] [permissions] file
-Read permission required for a directory to view the files it contains.
-Execute permission required for a target directory to change the current directory to the target directory.

Commands for directories
  • Mkdir dir_name – Creates the directory in the current working directory
  • Pwd - Returns the current working directory
  • Rmdir dir_name- Removes the directory provided its empty

UNIX File Manipulation Commands

Metacharacters used for manipulation operations
  •  ?- symbol signifies a single character
  • *- Inpresence or absence of any number of character which exists at the place of *.
  • [abcd]*- matches records starting with the range (in this case a,b,c,d)
  • ![abcd]*-matches records not starting with the above range.
Commands for viewing contents of the files
  • Wc -lwc file1 -Counts the number of lines, words & characters in file1
  • nl file1 -Add line numbers to the file1
  • Cat file1 -Concatenates & displays files 
  • Head -15 file1 -Displays the first 15 lines of file1
  • Tail -15 file1 -Displays the last 15 lines of file1
  • Uniq file1 -Reports repeated lines of file1
Viewing modified contents
  • Sort file1 file2 file3 -Sorts contents in all the 3 files alphabetically and displays output in terminal.
  • Sort –m –oresults file1 file2 -Merges output for already sorted files (file1,file2) and sends output to file results.
  • Sort –r +1 -3 file3 - Reverse sort starting at the 2nd field (+1), ending at 3rd field(-3) for File3.
  • Cut –f 2,3 –d “:” file1 -Cut picks the 2nd & 3rd field for the file1 delimited by the : (colon) symbol & displays output at the terminal.
  • GREP (globally search an expression & print it)
  • Grep ‘search this pattern’ –i-n file1 file2. Searches for the pattern in quotes for file1 & file2.   -n will include the entire line to be displayed. –i makes it case sensitive.
  • Who |tee logfile| sort - Tee is used in place where the output is to be redirected to a file as well as supplied as an input to the next command. Here tee command will send the output of ‘who’ command to logfile as well as supply the same output to the ‘sort’command & display the sorted results in terminal.

UNIX Terminal Users commands

  • Logname –Prints the logname of the user
  • Uname –Prints the name of the unix system
  • Uname –m - Prints the machine model ( eg: i686)
  • Uname-r -Prints the kernel version
  • Hostname -Returns hostname of the machine
  • Hostname <machinename> -Returns the ip of the machine.
  • /sbin /inconfig – as good as ipconfig for windows
  • Rsh <host> <command> -Execute command on host.
  • Ssh <host> -Start terminal session on host
  • telnet <host> -start terminal session on host
  • Sync -Forces the buffer cache & the page cache to write all the dirty pages to the disk.
  • Shutdown/shu <mode> <delay>- Does a graceful shutdown of the system, modes -r for reboot,-h for halting, delay accepts number in seconds to wait till shutdown.Else mention ‘now’ to send a warning message to all users about shutdown.
  • Dmesg -View boot messages
  • Cat /etc/passwd -Lists information for all the users on the system
  • Printenv /enc- prints all the environment variables

UNIX process related commands
  • Ps -a -Lists all the processes; from all the users (-a option) that are running at the current instance. 
  • Ps –f –t tty3 – Shows a full listing of all the processes that are running from terminal (tty3).
  • Ps –u user1 –Shows all the processes that are running under the user (user1).
  • Daemon processes are constantly running in the background.
  • Sort file1 > file2& - this command will be forced to run in the background due to the ‘&’ symbol at the end & will return the PID number when you hit enter.
· Note: use the PID to track the completion of the process. The output should always be redirected to a file.
  • Nohub sort file1> file2 –the command is executed in the background & guaranteed execution completion even if we logout.NOHUB stands for no hang ups
  • At <option> time – Runs the job specified by <option> at the time. Once you hit enter after the ‘at’ command you may enter the job definition. Once you hit Ctrl+d after giving the job details you will receive the job id number. Options –l is used for listing of the jobs & -r for removing of the jobs
  • At now+1 day <cmdfile - now the ‘at’ command will execute the commands in the cmdfile after a day.
  • /users/lib/cron/at.allow file contains the list of users who are permitted to execute ‘at’command. Similarly at.deny file contains list of users who are not permitted to execute the ‘at’ command
  • Batch <option> time –Runs the command when the load average drops below 8,optionally after the set Time.
  • Crontab cmdfile –executes the job script in cmdfile at the schedule mentioned in the cmdfile. Options –l is used for listing of the jobs & -r for removing of the jobs
Time format : Minute |Hour| DayofMonth|Month of the year|Day of the Week|Command
Eg : cat >cmdfile
30 9 1 * * echo “work hard”
Ctrl +d
Crontab will echo the message at 9.30AM (30 9 1 * *) on first of every month (30 9 1 * *)
‘*’ denotes all possible values. Sunday is represented as ‘0’ in the ‘Day of week’
  • /users/spool/cron/crontabs stores the lists of crons scheduled 
  • /users/lib/cron/cron.allow file contains the list of users who are permitted to execute ‘crontab’command. Similarly cron.deny file contains list of users who are not permitted to execute the ‘crontab’ command

UNIX important directories
  • /etc/shadow –Contains the encrypted passwords for the user. Only viewable by the root user.
  • /var/long - Location for most of the system log files
  • /etc/inittab – Location of system configuration files
  • /etc/brc -Executed at startup by init
  • /etc/hosts –List of hosts that the system is aware about.
  • /usr/lib/cron/log –Log of cron processing
  • /usr/spool/cron/crontabs –Event list for cron
  • /etc/mnttab -List of mounted file systems
  • /etc/rc -Startup shell script executed by init
  • /etc/profile –Custom shell script executed by init
  • /etc/sysctl.conf –Compiled in kernel parameters

UNIX file system related commands
  • mount <option> <dev> <mount_point> -Mounts a file system on device <dev> at <mount_point> with the <options> specified.
  • unmount <dev>/<mount_point> -Unmounts the file system or device.

UNIX system monitoring commands
  • lsmod -Show currently loaded modules
  • insmod -Load a module
  • rmmod –Unload a module
  • ipcs <option> - Option –m shows the current usage of shared memory, -s usage of semaphores
  • du -Reports directory-wise disk usage
  • df -Reports directory-wise disk free space
  • dfspace -Reports free disk-space in terms of MB.
  • sar <option> <interval> <count> - Reports the system activity, option –q shows the CPU 
  • queue, -u shows the CPU utilization, -d shows the device activity.
  • vmstats <interval> <count> -Shows the virtual memory stats
  • iostats <option><interval><count> -shows the i/o statistics, option –d for device activity. -c        
  • for CPU activity, -x for extended disk activity.

Friday, 8 April 2016

VI Editor Basic Commands

A few months back I came across an instance of using VI Editor. Trust me being a SQL DBA it was quite a task to remember the editing commands. So I made myself a list of commands that came in handy. Off course you may find amples of site providing the same info. Am sharing across my list hoping it may be helpful.

Basics 
  • VI stands for visual editor. 
  • Can handle text files. 
  • Its case-sensitive. Needs to be dealt with care as no error messages appear like while executing SQL queries or UNIX commands. 

VI editor modes of operation

-> Command Mode

In this mode all the keys pressed by the user are considered as executable commands. Once the Vi editor is invoked it enters into the command mode. To return to command mode from any of the below mode press ‘ESC’ key.


-> Insert Mode


This mode permits insertion of new text, editing of the existing text & replacing of the existing text for the file.
To enter in the insert mode by selecting any of the below keys.

I, i, A, a, O, o, R, r, C, c

-> Ex Command Mode

This mode permits the user to give commands at the command line (the bottom line of the vi editor screen).
The command line is used to display messages & commands.
All block commands are executed in this mode
Any commands proceeded with the : (colon) symbol are given in this mode.


Invoking vi editor
$vi file1

The vi editor shows the full screen view of the file. If the file isn’t long enough to fill the screen; vi editor shows tildes(~) on the blank lines beyond the EOF.


Moving the cursor
  • h-moves the cursor one character left
  • 2h-moves the cursor 2 characters left
  • l-moves the cursor one character right
  • 3l-moves the cursor 3 characters right
  • j-moves the cursor one character below
  • 4j- moves the cursor 4 characters below
  • k-moves the cursor one character above
  • 5k-moves the cursor 5 characters above
  • w- moves cursor one word forward
  • b- moves the cursor one word backward
  • e- end of the current word
  • 0- moves cursor at the beginning of the current line
  • $-moves cursor at the end of the current line
  • +-moves cursor below the beginning of the next line.
  • --moves the cursor above the beginning of the previous lines
  • H- go to the first line on the screen
  • M-go to the middle line on the screen
  • L- go to the last line on the screen
  • G-go to the last line of the file
Scrolling the screen 
  • Ctrl+f – Scroll forward one window
  • Ctrl+b – Scroll backward one window
Inserting text w.r.t cursor position
  •  i- Inserts text before the current position
  •  I-Inserts text at the beginning of the file
  •  a- Inserts text after the current position
  • A- Inserts text at the end of the file 
  • o -Inserts a blank line below the current position 
  • O-Inserts a blank line above the current position 
  • c –Change current object 
  • C-Change from current position till end of line. 
  • r –Replace character at current position 
  • R- Replace all characters until <ESC> is pressed. 
Deleting text w.r.t cursor position 

  • dw –Delete current word 
  • dd –Delete current line 
  • d0- Delete from current position to the beginning of the line 
  • d$ -Delete from current position to the end of line 
  • x- Deletes the character directly under the current position 
  • <n>x- Deletes n characters 
  • <n>dw –Deletes n words 
  • <n>dd- Deletes n lines 
  • J – Join the EOL character. Join the current & next line. 
  • <n>J –Join the next n lines 
Undo change w.r.t cursor position
  • u –Undo the effect of the last command 
  • U –Undo all changes to the current line since the cursor was moved to this line. 
  • ~ - Changes the character in the current position from upper to lower & vice-versa 
  • :sh –Temporarily returns to the shell to perform some shell commands. Type ‘exit’ to return to the vi editor. 
Searching patters w.r.t cursor position
  • /<string> -Search forward to the next occurrence of the string 
  • ?<string> -Search backward to the next occurrence of the string 
  • ^<string>- Search for all the lines which begin with the string 
  • <string>$-Search for all the lines which end with the string 
  • \<<string>-Search for all the words which begin with the string 
  • <string>\>-Search for all the words which end with the string
You may use metacharacters to represent the <string>

Quitting Vi editor
  •  :q! –Will terminate the file whether or not the changes made in the buffer were written
  •  :wq –Write all changes & quit editor.
  •  :w file1 –Write all changes to file1 & quit editor
  •  :q –Quits editor if the changes made were written to a file

Block commands in vi editor

First press the ‘Esc’ key to enter the command mode.
Then the ‘:’ key to begin with block commands.

To display line numbers enter the command ‘set number’ after following the above commands.
To turn off the numbering type command ‘set nonumber’ after following the above commands.

  • :4,12d -Lines 4 to 12 should be deleted from the current.
  • :5 mo 6 -Moves line 5 after line 6
  • :5,7 mo 9 -Moves lines 5 to 7 after line 9
  • :10 co 11 -Copies line 10 after line 11
  • :10-15 co 16 –Copies line 10 to 15 after line 16
  • :21,31 w file1- Writes lines 21 to 31 to file1
  • :21,31 w>>file1 –Appends file1 with lines 21 to 31

Thursday, 7 April 2016

Implementing Full-Text Indexes

In my previous blog we saw the Full-Text Index architecture. 

http://sqlandmorewithkruti.blogspot.in/2016/04/all-about-searching-full-text-indexes_7.html

In this blog we shall see its implementation. Before you proceed you need to ensure that this feature is added to your instance and the full-text services are running.





We shall start by creating a full text catalog. If multiple filegroups are available we may specify the one on which it has to be created by the FileGroup clause If multiple filegroups are available we may specify the one on which it has to be created by the FileGroup clause

Through T-SQL


USE [FULLTEXT]
GO
CREATE FULLTEXT CATALOG [FullTextCatalog]
ON FILEGROUP [PRIMARY]
WITH ACCENT_SENSITIVITY = ON
AS DEFAULT
AUTHORIZATION [dbo]
GO

Through Wizard


























Next we shall create the Full Text Index on our books table
GO
CREATE FULLTEXT INDEX ON [dbo].[Books]
( [Author] LANGUAGE English,
  [Title] LANGUAGE English,
  [Synopsis] LANGUAGE English
  )
KEY INDEX [pk_id] ON ([FullTextCatalog])
WITH (CHANGE_TRACKING AUTO, STOPLIST=NewStopList)
GO

Steps for creating the Index via the Wizard are as follows











Unlike Standard indexes we aren’t giving it any name. As SQL server allows only one Full-Text index per table the naming is taken care by SQL server. The Key Index clause specifies the unique non-null column which is mandatory when create full-text indexes.
















In the previous blog I’ve mentioned how population of the index depends on the type of Change_Tracking option selected.
When Auto is selected the changes are automatically tracked and updated by SQL server for the full-text indexed columns. 
When Manual is select the changes are tracked but the indexes aren’t updated unless the ALTER FULLTEXT INDEX  START UPDATE POPULATION is fired. This may be called through a SQL agent job as well. There is one more option which is not commonly used. 
It’s the Change_Tracking Off option when the changes are not tracked at all by SQL server. It’s available when creating full-text indexes in static tables.

























Here im creating my own StopList. This option is available under Database-> Storage -> Full Text StopList. I shall be building it from the system stop list and adding the word “describes” to StopList.






























Now we are ready with the index let’s start the querying


https://msdn.microsoft.com/en-us/library/hh245284.aspx
https://msdn.microsoft.com/en-us/library/cc879306.aspx

So coming back to the same example that we discussed in the previous blog I can now search for all the books that contain the character “Jeremy” in the synopsis as follows

SELECT * FROM [FULLTEXT].[dbo].[Books] where contains(Synopsis,N'Jeremy')

The Contains Predicate will search for the exact words or phrases that I pass. It also searches for proximity words & inflecting forms of the word. If you want more matches in terms of meaning of the words and not just the exact words we use the FreeText Predicate. It also allows for pattern matching with wildcards.


SELECT * FROM [FULLTEXT].[dbo].[Books] where freetext(*,'"new York*"')

This will result for all those records will also contain the records like new Yorker, new york in any of the columns

I could go on for providing further examples but these are very well described in the below links. 

Highly recommend that you go through these examples prior to implementing it in queries.

Books 

https://books.google.co.in/books?id=PgqEEPWRBzAC&pg=PA173&lpg=PA173&dq=full+text+indexing+sql+server+books&source=bl&ots=shtKT9rsl1&sig=C9yRRb4jbLFzcOLqIaAPWWK11gE&hl=en&sa=X&ved=0ahUKEwi2trLHr_7LAhVDto4KHUC1CKwQ6AEIUDAI#v=onepage&q=full%20text%20indexing%20sql%20server%20books&f=false

All about Searching - Full-Text Indexes

SQL server has the capability to store text-based data in the form of char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max). When querying the data stored in such columns the “LIKE” clause would be highly used for pattern matching.
Consider the below table

CREATE TABLE [dbo].[Books](
       [id] [int] IDENTITY(1,1) NOT NULL,
       [Author] [varchar](20) NULL,
       [Title] [varchar](100) NULL,
       [Published_yr] [date] NOT NULL,
       [Synopsis] [nvarchar](max) NULL,
 CONSTRAINT [pk_id] PRIMARY KEY CLUSTERED
(
       [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO 

SQL script for the table
dbo.Books.sql
It would be easier if we were filtering the rows based on authors something like this

SELECT * FROM BOOKS WHERE Author LIKE 'Nicholas%'

However consider the scenario where I wish to check if Jeremy is one of the characters in the books based on the synopsis given. My query would look something like this
SELECT * FROM BOOKS WHERE Synopsis LIKE '%Jeremy%'

It will work fine doing a table scans for the pattern match. However consider if this was a library management system consisting of millions of records. In such a scenario performing a table scan over a table is not a wise option. For such situations SQL server provides a functionality called full-text search. SQL Server's full-text search engine gives you fast queries and advanced pattern matching in an enterprise environment. It works against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English.  

To understand its architecture and how it works we need to familiar ourselves with its terminologies.
https://technet.microsoft.com/en-us/library/ms142581(v=sql.105).aspx


Full-Text Terminologies


 1) Full-Text Indexes

A special token-based index built and maintained by the Full Text Engine for SQL server. It is used to track occurrences of words or word-forms in columns containing unstructured text. Full-Text index structures are different from normal B-tree index structures. It’s an inverted, stacked, compressed index structure based on individual tokens from the text being indexed. Only one Full-Text index can be created per table. It does require a unique key to be set for each row for the given table. It helps to keep this key as small as possible. This could be a primary key. This structure is elaborated in the following link

https://technet.microsoft.com/en-us/library/ms142505(v=sql.105).aspx

2) Full-Text Catalogs

It’s a logical container for Full-Text indexes. A database can have one or more full-text catalogs. The indexes are aggregated as a collection in terms of catalog purely for administration and maintenance purpose. It is used to set common property values for all the indexes it stores.

3) Word Stemmers

For any language like English we have certain words which are considered as base or root words. From these new-words or its conjugates can be formed. Consider the word “Eat” from this we can derive words like “eating, will eat, shall eat, ate, etc” these are its derived forms. Such words are called stemmers. Identifying the word stemmers comes in handy for pattern searching while using full-text indexes.

4) Word Breakers

Any characters which delimit sentences or phrases for the language are called word breakers (Spaces Excluded). They are particular for each supported language in SQL server. Once identified further action can be taken in building the FTS index or processing the query.

5) StopWords and StopList

Every language has certain words like conjunctions, pro-nouns which provide no meaningful benefit when used in search criteria’s. For example words like “the, are, and, to,etc” are commonly discarded for full-text indexes. These words are called stopwords. By excluding these words the Full-Text indexes become more efficient. The indexes however do take into account the position of such words when encountered. As the name suggests StopList is a list of Stop words. SQL server lets you create & customize your own StopLists.
https://technet.microsoft.com/en-us/library/ms142551(v=sql.105).aspx

6) Population / Crawl Process

Unlike standard SQL Server indexes that are automatically maintained during data modifications full-text indexes aren’t repopulated. Creating and maintaining a full-text index involves populating the index by using a process called a population (also known as a crawl). There are 3 modes of population 

 A) Full Population - As the name suggests builds full-text indexes for all the rows of the table. Similar to that of a standard index rebuild operation. The default behavior of SQL server is to populate a new full-text index fully as soon as it is created.

B) Change Tracking-Based Population /Update Population  This uses change tracking functionality of SQL server. SQL server maintains a table which tracks modifications made to the index after its initial full population. Based on CHANGE_TRACKING property of the index whether set to AUTO or MANUAL the indexes are populated.Default behavior is Automatic Change_Tracking.

C) Incremental Timestamp-Based Population - This requires that the indexed table has a column of timestamp data type. If such a column does not exists SQL server will treat it as a full population operation. If the column exists SQL Server will update the modified rows based on the value of the time-stamped column.
https://technet.microsoft.com/en-us/library/ms142575(v=sql.105).aspx

7) Thesaurus 

Thesaurus comprises of synonyms or word expansions for any given word. Each language has its own set of defined synonyms. This is an XML file stored in file system. This broadens our search criteria to find similar words at querying time. 

Thesaurus matching occurs only for CONTAINS and CONTAINSTABLE queries that specify the FORMSOF THESAURUS clause and for FREETEXT and FREETEXTABLE queries. The default location for the thesaurus files is as follows

<SQL_Server_data_files_path>\<VersionSpecificInstance>\MSSQL\FTDATA\

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\FTData


Beginning in SQL Server 2008, full-text search supports more than 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi. Once can check the list of supported full-text languages using the following query


SELECT * FROM sys.fulltext_languages

There are a list of system tables corresponding to fulltext which you can view for each of the above described components


















Full-Text Index Architecture















1) Client application submits a query containing Full-Text search contents

2) Query is passed to the SQL server engine comprising of  SQL Server Query Processor and Full-Text Engine along with various other components

3) The Full-Text searches are sent to Full-Text Engine during both the stage Compilation and Execution. SQL Server Query Processor is responsible for parsing, binding, optimization and execution.The Full-Text Engine receives the Full-Text part of the queries from the query processor. It works in conjunction with the SQL server query processor.

4) In order to full the requests a Full-Text engine calls for the Indexer. The indexer is responsible for evaluating the StopList and populating Full-Text Indexes which form a part of the database from SQL server 2008 onwards. Prior to this they were stored separately on file systems. StopList will also contain stopwords which are ignored during query evaluation.

5) Thesaurus is used during Full-Text query compilation and execution. Its external to the SQL server database. This file is also used by the Filer Daemon Launcher Service which is again external to the SQLservr.Exe

6) Filter Daemon Manager is responsible for monitoring the Filer Daemon Launcher Service  or Host Service.Its external to the SQL Server Engine.Inside SQL server engine the user  tables are evaluated during full-text index population or crawl process by the indexer. This  data is also used by the Filter Daemon Manager process which passes this information to  the Host Service.

7) The Filter Daemon Host is responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input: It comprises of

a) The protocol handler which is responsible for pulling SQL server data from memory and passing the database data to the filter.
b) The filter extracts chunks of data from the input provided by protocol handler. This is dependent on the document type. The embedded formatting is removed by the filter retaining only the textual contents and their position information. This result-set is then passed to the work-breakers. 
c)The word-breakers delimit the result-set. These word-breakers are also passed to the indexer by the Filter Daemon Launcher Service. Just like the stoplist the word-breaker are ignored 


8) During indexing process the indexer uses all of the above result-set    namely 
  • Tokenized daata from full-text crawl threads 
  • Delimited result-set from the Filter Daemon Host Service
  • Stop-list words that are to be excluded 
It then creates index-able words to inverted index fragments. All these fragments are then merged to form the full-text indexes for the tables

9) During the querying process the Full-Text Engine uses the full-text indexes generated by the indexer. The word-breakers and thesaurus are first referenced to generate all forms of query predicate. The query processor then looks-up for all these forms in the full-text indexes. The Full-Text Engine is also responsible for the optimization process much like a standard query optimization process to retrieve the data.

10) The SQL query execution results and Full-Text query execution results are combined and the final result-set is sent across the client.

In the next blog we shall see how to create Full-Text Catalogs,Indexes and how to use full-text index queries.