Problem: I was doing an INSERT/SELECT via an export of several thousand rows, and needed to pull data from a table for two columns, but use a direct character string (zip code, in this case) for the third. Sounds straight-forward, except that a corresponding record/zip code did not necessarily exist in the table I was selecting rows from. So, if there was no record, it would return an empty recordset and wouldn’t insert. So, essentially, I needed a fake recordset.
In Oracle, as many other things, this is actually pretty easy. Oracle has its “DUAL” command, which essentially creates this empty recordset. In SQL Server, you don’t use a FROM and can return a recordset, but I wanted the columns from the table to be optional but could not use a left join as you can’t join a table on to nothing.
The first thought was to simply bring in another table that I knew had a record, then join on that. However, this table included almost every US zip code (including PO Box-only zips, military zips, overseas zips, etc.), so it was an enormous recordset and, not being the manager of the data, I hadn’t added an index, so its performance was horrendous. When I added this to the ColdFusion script, what was taking roughly 35 seconds now timed out after 120 seconds. I could increase the timeout, but over two minutes is somewhat unreasonable.
What I ended up doing was to just do a direct insert, then do an UPDATE/SELECT after all of the inserts were done. Not ideal, but it actually ended up being pretty fast. But that got my curious: what are other ways to do this in SQL Server, and how would they work out from a performance persepctive?
So, enter SQL’s nice Client Statistics window to tell me all about it. I did three different cases of running the base select query (the insert would be more, of course, but I didn’t want to do it) and checked the execution time of each.
- The join: As I said, the table I have to actually select from has a row count of about 41,000 rows. Not huge, but not quick either, particularly with no index. This query ended up taking 266 milliseconds.
SELECT a.zip,b.state,b.county FROM bigZipCodeTable a LEFT JOIN slightlySmallerZipCodeTable b ON b.zip=a.zip WHERE zip='85701'
- The table variable: This was not technically an option in ColdFusion scripting, since it doesn’t support this, but I had the idea to create a pseudo-temporary table using SQL’s table variable and a quick T-SQL script. I created the table definition in the declaration with one column for the zip code, then inserted a single record with the zip code I wanted to pull data for. I then selected from that and joined the table that I wanted to pull data from. Surprisingly (to me, at least), it actually ended up taking almost twice as long: 469 ms. So that’s out.
DECLARE @tbl TABLE(zip VARCHAR(5));
INSERT INTO @tbl VALUES('85701');
SELECT a.zip,b.state,b.county
FROM @tbl a
LEFT JOIN slightlySmallerZipCodeTable b ON b.zip=a.zip
- My third idea was to create a false recordset using a sub-query in the FROM clause. In my experience with sub-queries, I know their inefficiency in the WHERE clause and thought that, surely, this would not do well. But, you know what, it did okay. 156 ms, which beats out the join, but not by enough to actually let me use it in the script. It would add up, but not enough to save me from the script taking much longer than it should.
SELECT a.zip,b.state,b.county FROM (SELECT '85701' AS zip) a LEFT JOIN slightlySmallerZipCodeTable b ON b.zip=a.zip
So there ya go. I hope that helps someone out.