Creating CSV Files Using BCP and Stored Procedures

This article focuses on the use of the Bulk Copy Program (BCP) to create CSV files. Although it is possible to create a CSV file using DTS or SSIS, using BCP is often simpler and more efficient.

I use master..sysobjects as an example table to extract.

Create a simple CSV file

The simplest way to copy data from a database table to file is to use the basic BCP command:

The basic format for the BCP command for creating a CSV file is as follows:

BCP <table> out <filename> <switches>

The switches used here are:

  • -c Output in ASCII with the default field terminator (tab) and row terminator (crlf)
  • -t override the field terminator with “,”
  • -T use a trusted connection. Note that U -P may be used for username/password
  • -S connect to this server to execute the command

Note that, like DTS/SSIS, BCP is a client utility, hence you need to supply the connection information.

For transfer of data between SQL servers, in place of -c, use -n or -N for native data format (-N = Unicode). This is much faster and avoids data conversion problems. Please refer to the previous BOL link for the complete format of the BCP command.

As BCP is a command line utility it is executed from T-SQL using xp_cmdshell. Create a directory called BCP on your c: drive and execute:

Other field and row delimiters

Often, character data includes commas which will be interpreted in the file as a field terminator. A simple way to cater for this is to use a different field delimiter which does not appear in the data. Commonly used characters are pipe (|):

And caret (^):

The terminators are not limited to a single character, so if necessary you can use (|^):

Note that this will increase the size of the file and so slow down the import/export. Another way to cater for embedded commas is to “quote encapsulate” character data – more about that later.

It is unusual to need to change the row terminator from the default of crlf (carriage return, line feed) but occasionally you will need the cr or lf on its own.

To do this use the hex value cr = 13 = 0x0Dlf = 10 = 0x0A. If you can’t remember these values they are easily obtained, as follows:

To use these in the BCP command for the row terminator the -r switch is used:

When the resulting file is opened in notepad the row terminators will not end the line – the character should appear as a non-ASCII character (a little oblong). If opened or copied to query analyser (or management studio) these will be interpreted as line breaks and the file will be more readable.

The terminator characters can be encapsulated with double quotes in the command to allow for space. I like to do this for anything other than a single character delimiter. The following example gives a crlf row terminator and | ^ field terminator.

Formatting the extracted data

If you do not require all the columns/rows, or you wish to alter the data, a simple method is to create a view. This can be used in the BCP command in the same way as a table. The following view allows you to extract just the name, the formatted create date and time and then order the results according the date created (most recent first):

Now we can quote encapsulate the extracted data by including the formatting in the view:

Note:
Quote encapsulation can also be performed by use of a format file.

Complete control – stored procedures

If you examined the BCP syntax in BOL, you might have noticed that it is possible to extract from a query by using the queryout keyword in place of out.

So, for example, an equivalent but neater version of the previous code extract would place the ORDERBY clause in the BCP statement rather than the view:

Of course, strictly speaking, the view was not necessary at all as the query could have been included in the BCP statement but that can get difficult to maintain.

The queryout method allows you to BCP from the result of a stored procedure, which opens up a lot of possibilities and offers a lot of control over the file format. For anything other than a simple table extract I would tend to use this method rather than a view. I would also format each line within the stored procedure. This means that the formatting can be tested independently from the file creation.

Employing this technique, the above extract becomes:

More complex formatting

We can now change the format of the extracted data to anything we require. For example, to include column names at the top and a rowcount at the end:

This stored procedure will format the data as required, but when it is run via the BCP command it receives an error:

“Invalid object name ‘#a’”

Using the profiler you can see that when using queryout, the query is called twice. BCP tries to get the result set format by calling the stored procedure with fmtonly. This does not execute the stored procedure but returns the format of the resultset. Unfortunately, it does not work if the stored procedure creates and uses a temp table. This problem can be circumvented by including a set fmtonly offcommand in the execution:

Note that the column terminator is not used since the resultset is a single column.

Editor’s Note
This was the only piece of code that I failed to get working. The stored procedure executed fine outside the BCP, but when I tried to run the above command I received an error:

SQLState = HY010, NativeError = 0
Error = [Microsoft][SQL Native Client]Function sequence error
NULL

The author was unable to replicate this error. Is anyone aware of what might be causing this on my machine?
End Editor’s Note

Look at this execution using the profiler and you will see set fmtonly off followed by set fmtonly on. There is a potential problem with this though: it means that the stored procedure is executed twice, once for the format and once to extract the data – due to the fmtonly setting, both these calls will produce a resultset. This needs to be taken into account when considering how long the process will take and its impact on your system.

As the stored procedure is executed twice it should not contain any updates of permanent data. Particularly if data is flagged as exported then the flagging will be performed on the first call and no data will be extracted for the second call. All in all, this method should be used with caution and comments (warnings) placed in the stored procedure for the aid of future maintenance.

To get round all of these problems, use the stored procedure to insert into a table then, from the BCP, extract from that table. This gives you added features that can be valuable on systems that are performing a lot of extracts:

  • It gives a record of the data extracted – if the file is lost it can be recreated.
  • The data extracted can be presented to a user for viewing from the database.
  • If there are problems with the file, the support can see the data without needing access to the file.

For this process, the format stored procedure, s_bcpMasterSysobjects, will insert into the table using an identifier and the BCP will extract using the same identifier. This process can be controlled be (a control) stored procedure which would allocate the identifier and pass it to the format stored procedure which inserted the extracted data into a table with that identifier. It then calls another stored procedure or in-line code to create files (via BCP) for each identifier which is not marked as complete. The identifier is flagged as complete after the file creation. In this way the file creation becomes independent of the extract of the data.

This identifier can be passed to both processes from a controlling stored procedure or the format stored procedure can get the next available identifier and the extract can flag the identifier as extracted – so it extracts everything that has not been extracted, allowing the export to be independent of the extract.

Which technique you use will depend on your system. The following code demonstrates the use of a controlling stored procedure:

Now the data is extracted via:

You can view the data extracted via:

Alternatively, you can use the following, which will also give the date that the data was extracted:

Note:
For the extract I have given three sort columns Seq1, Seq2, Seq3 which are all order ascending. This means that the extracting stored procedure needs to place the values in ascending order (see the date manipulation). This could have ascending and descending sort columns or the use could be dependent on the extract type.

For a system which is performing a lot of small exports I will usually implement this process, and a similar process for imports.

Extracting all tables from a database

This is usually performed to transfer data between servers/databases and so should use the native data format. The easiest way is to create the BCP statements from a query then execute them:

The result of this will be a series of BCP statements of the form:

which will extract all tables in the database. To import just change the “out” to “in”.

Summary

We have seen how to use BCP to perform a simple extract of a table to a CSV file, how to perform slightly more complex extracts using a view, extracting using a stored procedure to format the data and a system which allows the data to be extracted and saved in a table and viewed/exported from there.

Advertisements

SQL: How to find out who truncated your table in SQL Server

Someone truncated your table. You would like to get your data back, but mostly you really just want to know who did it.

Getting the data back can be as easy as restoring a backup. But having a backup does not tell you who the culprit is.

This information can however be gleaned from the SQL Server Transaction Log. If your database is in simple recovery model you’d better be quick about it, so that you can get to the information before the log gets reused.

The following query will show you who the culprit is based on data in the active transaction log:

SELECT [Current LSN],[Operation],[Context], [Transaction ID],[Transaction Name],SUSER_SNAME ([Transaction SID])[Culprit]

FROM fn_dblog (NULL,NULL)

WHERE [Transaction Name]=’TRUNCATE TABLE’

Unfortunately it only shows you that the user truncated atable but not exactly which table. So how can we determine which table this particular action relates to?

We know that SQL Server can create multiple entries in the transaction log for even the simplest operation. So we can look at all the entries with the same Transaction ID to determine exactly what happened.

SELECT[Current LSN],[Operation],[Context],[Transaction ID],[Transaction Name],SUSER_SNAME ([Transaction SID])[Culprit],[Lock Information]

FROM fn_dblog (NULL,NULL)

WHERE[Transaction ID]= (Select[Transaction ID]FROMfn_dblog(null,null)WHERE[Transaction Name]=’TRUNCATE TABLE’)

GO

Here I am including the Lock Information column. Since we know that in order to truncate a table a Sch-M (Schema Modification) lock will be taken, we can use that to determine which object it was taken on.

In this case the object id is 885578193. We can then use the following to obtain the name of the table which was truncated in that transaction.

SELECT OBJECT_NAME(885578193)

If the data is no longer available in the online transaction log, and you are using the FULL recovery model and have log backups, you can use the same method with the fn_dump_dblog function instead. This will allow you to not only read the online transaction log but also log backups.

******The Correct Query : WHERE[Transaction ID] IN (Select[Transaction ID]FROM fn_dblog (null,null)WHERE[Transaction Name]=’TRUNCATE TABLE’)

SQL Query: How to retrieve the Table columns from a stored procedure

SELECT
name AS ColumnName
,system_type_name AS DataType
,CASE
WHEN is_nullable=0 THEN ‘NOT NULL’
WHEN is_nullable=1 THEN ‘NULL’
END AS IS_NULLABLE
,column_ordinal AS OrdinalPosition
,(‘Col_’ + CAST(column_ordinal AS varchar(4))) AS Alias
FROM sys.dm_exec_describe_first_result_set
(N'[dbo.Registration]’, null, 0) ;

 

SQL SERVER – How to retrieve the metadata of a stored procedure

In my earlier articles, I wrote about sys.dm_exec_describe_first_result_set
(A new dynamic management view shipped with SQL Server 2012), it is very handy when you need to know the metadata of the first possible result set of any Transact SQL. Today, I came across an issue when I tried to get the metadata of a stored procedure in earlier version of SQL Server and wanted to use the list of columns for further manipulations. However, I could not find a proper solution like sys.dm_exec_describe_first_result_set in the earlier version of SQL Server. So I started developing the solution.

Given below are the solution.

SOLUTION 1 : For SQL Server 2012 and above using sys.dm_exec_describe_first_result_set

1
2
3
4
5
USE AdventureWorks2012;
GO
SELECT FROM sys.dm_exec_describe_first_result_set
(N'[dbo].[uspGetManagerEmployees] 16', null, 0) ;
 --OUTPUT

metadata of stored procedure.1.1

SOLUTION 2 : For SQL Server 2005 and above using OPENROWSET

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
USE AdventureWorks2012
GO
--DROP TABLE #temp1
GO
SELECT * INTO  #temp1
FROM OPENROWSET('SQLNCLI', 'Server=(local);uid=sa;pwd=test',
'EXEC AdventureWorks2012.[dbo].[uspGetManagerEmployees] 16');
GO
SELECT * FROM tempdb.sys.columns
WHERE object_id=object_id('tempdb..#temp1')
GO
--View the column name / metadata of the
--above created temporary table.
--OUTPUT

metadata of stored procedure.1.2

SQL Query: Schedule multiple jobs to run over night

I had about 40 sql agent jobs. I had to schedule them to run one time only over night. It was a bit of hectic to open every job, change the ownership, modify the schedule time, then change the ownership back to the service account. In order to minimize the work effort, I have written a script based on CURSOR. Where I will have a control table, it will define the job name, schedule name, schedule start date, schedule start time. I will read this from the control table and the change the schedule accordingly.

 

USE msdb;USE msdb;go
/*create the control table and insert some dummy data.In order to make this script work, there has to be job exist defined in the control table.*/
CREATE TABLE [ETL].[SqlJobControl]( [SqlJobControlID] [int] IDENTITY(1,1) NOT NULL, [job_id] [int] NULL, [job_name] [nvarchar](200) NULL, [schedule_name] [nvarchar](50) NULL, [job_description] [nvarchar](100) NULL, [active_start_date] [int] NULL, [active_start_time] [int] NULL, [owner_login_name_user] [nvarchar](50) NULL, [owner_login_name_service_account] [nvarchar](50) NULL, [enabled] [bit] NULL, CONSTRAINT [PK__SqlJobCo__D32BB6AA03DE9823] PRIMARY KEY CLUSTERED ( [SqlJobControlID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
GOSET IDENTITY_INSERT [ETL].[SqlJobControl] ON
INSERT [ETL].[SqlJobControl] ([SqlJobControlID], [job_id], [job_name], [schedule_name], [job_description], [active_start_date], [active_start_time], [owner_login_name_user], [owner_login_name_service_account], [enabled]) VALUES (1, NULL, N’Developer_Test_Job1′, N’OneOff’, N’Dummy job’, 20170613, 175500, N’blue\ahmedma’, N’ELC\srvc.DWHSQLAgent’, 1)INSERT [ETL].[SqlJobControl] ([SqlJobControlID], [job_id], [job_name], [schedule_name], [job_description], [active_start_date], [active_start_time], [owner_login_name_user], [owner_login_name_service_account], [enabled]) VALUES (2, NULL, N’Developer_Test_Job2′, N’OneOff’, N’Dummy job’, 20170614, 180000, N’blue\ahmedma’, N’ELC\srvc.DWHSQLAgent’, 1)SET IDENTITY_INSERT [ETL].[SqlJobControl] OFF;
DECLARE  @job_id int, @job_name nvarchar(200), @schedule_name nvarchar(50), @job_description nvarchar(100), @active_start_date int, @active_start_time int, @owner_login_name_user nvarchar(50), @owner_login_name_service_account nvarchar(50), @enabled bit
;
DECLARE @oneoff_sqljob_cursor AS CURSOR;
SET @oneoff_sqljob_cursor = CURSOR FAST_FORWARD FOR SELECT  [job_id] ,[job_name] ,[schedule_name] ,[job_description] ,[active_start_date] ,[active_start_time] ,[owner_login_name_user] ,[owner_login_name_service_account] ,[enabled] FROM [UNIFY2].[ETL].[SqlJobControl] WHERE [enabled]=1;
OPEN @oneoff_sqljob_cursor;FETCH NEXT FROM @oneoff_sqljob_cursor INTO  @job_id ,@job_name ,@schedule_name ,@job_description ,@active_start_date ,@active_start_time ,@owner_login_name_user ,@owner_login_name_service_account ,@enabled ;
WHILE @@FETCH_STATUS = 0BEGIN EXEC [dbo].[sp_update_job_for_developers]   @job_name=@job_name,  @owner_login_name=@owner_login_name_user; EXEC msdb.dbo.sp_detach_schedule  @job_name=@job_name,  @schedule_name=@schedule_name, @delete_unused_schedule=1 ;
EXEC msdb.dbo.sp_add_jobschedule  @job_name=@job_name,  @name=@schedule_name,  @enabled=1,  @freq_type=1,  @freq_interval=1,  @freq_subday_type=0,  @freq_subday_interval=0,  @freq_relative_interval=0,  @freq_recurrence_factor=1,  @active_start_date=@active_start_date,  @active_start_time=@active_start_time ; EXEC [dbo].[sp_update_job_for_developers]   @job_name=@job_name,  @owner_login_name=@owner_login_name_service_account; FETCH NEXT FROM @oneoff_sqljob_cursor INTO  @job_id ,@job_name ,@schedule_name ,@job_description ,@active_start_date ,@active_start_time ,@owner_login_name_user ,@owner_login_name_service_account ,@enabled ;END

SQL Theory: How to Store (and Retrieve) Non-English Characters (e.g. Hindi, Czech, Arabic etc.) in SQL Server

If you have to store and retrieve characters of any other language besides English in SQL Server, you must do the following –

  1. Use a Unicode compatible data type for the table column. NVACHAR, NCHAR, NTEXT are the datatypes in SQL Server that can be used for storing non-English characters.
  2. Precede the Unicode data values with an N (capital letter) to let the SQL Server know that the following data is from Unicode character set. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
  3. The N should be used even in the WHERE clause.

REFERENCE: Microsoft Support KB 239530
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server
http://support.microsoft.com/kb/239530

If the correct data-type is not used or the data is not preceded with an N, SQL Server will save the data to the table as ‘?’ or another garbled character.

The following scripts demonstrate saving and retrieving multi-lingual data to and from SQL Server. I have used Google Translate to get the characters of other languages. I left out far-east languages like Japanese and Chinese from the following example on purpose because those languages have a few other considerations that I’ll save for another blog post.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
DROP TABLE dbo.unicodeData;
GO
CREATE TABLE dbo.unicodeData
( languageUsed VARCHAR(50)
, unicodeData NVARCHAR(200)
, nonUnicodeData VARCHAR(200) -- same data in a normal VARCHAR column for comparison
, comments VARCHAR(100)
);
GO
INSERT INTO dbo.unicodeData
( languageUsed
, unicodeData
, nonUnicodeData
, comments)
VALUES
('English'
, N'This is an example'
, N'This is an example'
, NULL)
,('Hindi'
, N'यह एक उदाहरण है.'
, N'यह एक उदाहरण है.'
, 'Using the preceding N in both strings but VARCHAR is still a ?')
,('Hindi'
, 'यह एक उदाहरण है.'
, 'यह एक उदाहरण है.'
, 'Not using the preceding N in both strings so both are a ?')
,('Kannada'
, N'ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.'
, N'ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.'
, NULL)
,('Arabic'
, N'هذا مثال على ذلك.'
, N'هذا مثال على ذلك.'
, NULL)
,('Czech'
, N'To je příklad.'
, N'To je příklad.'
, NULL);
GO
SELECT *
FROM dbo.unicodeData;
GO
-- Example of using N' in the WHERE clause
SELECT *
FROM dbo.unicodeData
WHERE unicodeData like N'%एक%';
Unicode Results
Unicode Results

Further Reading: