Wednesday, December 3, 2008

A legitimate use for temporary tables

Temp tables are those monstrosities that start with # in SQL Server. They exist in a netherworld in between your database and the TempDB database, and they only exist for the current connection...unless you're crazy enough to use a ## temp table, then they're in a global TempDB netherworld...then they mysteriously break your code.

In general, these things are rightfully maligned because they usually just cause way more trouble than they're worth, especially when an error occurs in the database that somehow ends up with the temp table not getting DROPped. Conventional wisdom says to use table-valued variables instead...and yet, if you're trying to load half-a-million rows into a table variable, you're hosed.

Here's the scenario where the evil little buggers actually work: bulk loading a flat file into a relational structure through TSQL because SSIS is (still and always will be) crap:

  • Create an XML Format File to describe your flat file to the DB server

  • Create a temp table wiht columns that you like from your file

  • INSERT INTO #EvilTempMonster ... FROM OPENROWSET(BULK 'file name goes here' FORMATFILE='format file name goes here')

  • Place indexes on your temp table

  • Load the relational structures to your heart's content.

  • Drop the temp table



Do your bulk load outside of a transaction (you aren't going to modify your file from inside SQL), but do all of your relational INSERT/UPDATE stuff in a transaction, wrapped in a TRY...CATCH block, and on catch, ROLLBACK.

It should look something like this:

DECLARE @ErrorMessage nvarchar(MAX)

--Just make sure your temp table can't somehow
-- screw with you and ruin your day.
IF object_id('tempdb..#MyFlatFile') IS NULL
DROP TABLE #MyFlatFile

CREATE TABLE #MyFlatFile(...columns rock...)

--Mmmm, much better than BULK INSERT.
--Even if you have to make this dynamic SQL to
-- vary the file name and format file name.
INSERT INTO #MyFlatFile(...columns again...)
SELECT ...flat file column list...
FROM OPENROWSET(BULK 'filename.txt' FORMATFILE='formatfile.xml')

CREATE CLUSTERED INDEX IX_ImportantMatchField
ON #MyFlatFile (ImportantId)

BEGIN TRY
BEGIN TRANSACTION

INSERT INTO TableA
FROM #MyFlatFile
WHERE #MyFlatFile.ImportantId NOT IN (SELECT DISTINCT MyImportantId FROM TableA)

...Other inserts/updates...

COMMIT
END TRY
BEGIN CATCH
ROLLBACK
SET @ErrorMessage = ERROR_MESSAGE()
END CATCH

DROP TABLE #MyFlatFile

IF @ErrorMessage IS NOT NULL
RAISERROR(@ErrorMessage, 16, 1)

At least, that's one way to do it. You could just do the DDL and bulk load inside the transaction too, and that would be easier on the programmer, I'm just unsure of the performance implications.

With my script, I can load half-a-million rows into 3 different related tables, reducing out duplication, scrubbing out double-quotes (thanks Excel), inserting new rows and updating existing rows in under 3 minutes on what is a pretty "slow" development server.

Why go through all of this when SSIS has a pretty graphical designer? My IS packages would not execute in the context of my web server, running under a least-privileged-user account (whoops!). Furthermore, I could not use transactions within the IS packages because of permissions problems involving the Distributed Transaction Coordinator. WTF is it using that for? SSIS COM at the core. Awww, yeah! That's what I'm talkin' about!

Has anybody else found a use for temp tables that made you move them from the "Evil-Never Use" category to the "Correct tool for some circumstances" category?