This kit contains the test script, testscript.pl, that I used to compile performance data in the main article about arrays in SQL. The kit also contains the test procedures that I used, as well as files with the functions that appears in the main article. The kit does not contain the test data, which is a separate download (6 MB).
This text gives you instructions on how to run and modify the test script. I give you a walkthrough of the script so that you can run your own test procedures if you like.
Note: Below I refer to line numbers. If would happen to change the script, there is a risk that the numbers in the article fall out of sync. The same would of course happen if you make changes yourself. (You may want to keep a backup copy as reference.) Take the line numbers as approximate.
You need to have the client-connectivity part of SQL Server installed. The script uses DB-Library which does not come with the MDAC.
You should have a tempdb of at least 25 MB. When I initially ran the tests on my workstations which had tempdbs smaller than 10 MB, I occasionally got some deviating methods for some methods, but these disappeared when I enlarged tempdb.
You don't need to know Perl for simple changes to the test script. For more complex changes, Perl knowledge is a prerequisite, though.
If you just want to test the methods that I discuss in my article on your local computer, then say from a command-line window:
perl testscript.pl
You can use the > operator to redirect the output to a file:
perl testscript.pl > result.txt
This runs each method in the article (except REALSLOW and SLOW$LIKE) 10 times for the string sizes Small, Medium, Large, X-Large and XX-Large. When the script has run all tests, it prints the result to standard output. You also find data for each individual run in the table listtest..timings.
If you want to run against a remote server, run more than 10 times, use different string sizes etc, continue to read on.
Note: the script employs the simplest form of error handling: as soon there is an error, it aborts execution.
As shipped the test script connects to the local SQL Server with Windows authentication. The actual connection takes place on line 12:
my $X = sql_init();
To specify another server, provide the server name as a first parameter. To use SQL authentication you need to do two things: 1) Put a comment character (this is # in Perl) on line 7:
# MSSQL::DBlib::DBSETLSECURE();
2) Pass user and password as the second and third arguments to sql_init
.
Here is how it would look with a different server and SQL authentication:
my $X = sql_init('SOMESERVER', 'sa', 'password');
Note: The script creates a database, so you must connect with a user with permissions to do that.
The script looks for a database called listtest (change this on line 20), and creates it if it does not exist. If the database exists, the script does not load any test data, or any SQL code. Thus, if you have changed any SQL code that the script loads, you need to drop the database before you run the script anew.
The script checks the current directory for files with names ending in .SQLFUN and .TESTSP. The SQLFUN files should contain functions, tables and procedures that some methods may need. This kit includes these SQLFUN files:
The functions for fixed-length intput and the Numbers table. | |
The functions for the iterative method. | |
The procedures for the SQL7 version of the iterative method. | |
The Numbers table and the functions for the table-of-numbers method | |
The function for the UNION method. |
The TESTSP files are supposed to contains the test procedure that the test script calls. The name of such a file should conform to the patter: METHOD_datatype_optype.TESTSP, where:
METHOD | The name of the method. Permitted characters are uppercase A-Z, dollars and digits. The first character must be a letter. |
datatype | Either Str or Int, depending on what sort of input list the test procedure processes. |
optype | Either UNPACK or JOIN. UNPACK means that the test procedure unpacks the input list and returns its contents as a result set with a column of the same data type as the input. JOIN means that the test procedure joins the input list with the test table, and returns a result set with a column of the opposite data type to the input list. |
The contents of a TESTSP file should be a test procedure with this interface:
CREATE PROCEDURE METHOD_datatype_optype_test @str text, @tookms int OUTPUT AS
@str is the input list, and @tookms is the execution time of the test in milliseconds. Notice that the procedure name should end in _test. For more details on how the test procedures should work see the main article. Also, use the test procedures that comes in the kit as guidance.
Note: The parameter @str must be of the type text. You cannot use varchar longer than 255 characters, nor can you use ntext. This is because the test script uses DB-Library which does not support the new and augmented data types in SQL7 and SQL2000.
The kit comes with all test procedures that I used for my tests. In the folder testsp-not-in-use you find the procedures for REALSLOW and SLOW$LIKE (to save you from running it by accident). Move test procedures forth and back to testsp-not-is-use depending on what you want to test. Notice that you cannot run EXEC$B without also running EXEC$A. If you only remove EXEC$A from the test, EXEC$B will become EXEC$A. (Refer to the main article for an explanation.)
Note that if you change a test procedure or a SQLFUN file, you need to drop the listtest database, so that the new definition is loaded!
On lines 71 to 122 the script creates these tables and loads usrdictwords.
usrdictwords is the table that holds the test data. timings is the table where the test script writes the result for each test run. Please refer to the script (lines 72-82) for the table definition. The test script also defines a stored procedure to load timings. The script bulk loads usrdictwords from the file usrdictwords.bcp.
If the database existed when the script started, the script truncates timings, but that's all.
On line 151, the scripts sets the seed to the random generator with a constant. Therefore the script will always use the same series of random numbers, and therefore generate the same test strings. This is intentional, as this permits you run a method separately, and still use the same test strings.
Line 154 is one that you may want to play with.
my @strsizes = (237, 2456, 7950, 23997, 120000);
This line controls which string sizes the script will use. The numbers 237, 2456, 7950, 23997 and 120000 are the ones called Small, Medium, Large, X-Large and XX-Large in the main article. Keep in mind that the test script stops adding to a string when the string has exceeded the length. Thus, an actual test string for size 7950 will be somewhere between 7951 and 7985 characters long. (The longest string in the test data is 31 characters, and there is some random spacing.)
Line 157 is also one you may want to change.
my $no_of_tests = 10;
This is simply how many times we will run each test. A tip is that only set this to 100, once you know that your methods work well. When testing the first time, you might even prefer a low value such as 2.
Uncomment lines 160-161 if you want all calls that the test script produces to be logged to the file sql.log.
#open T ">sql.log"; #$X->{logHandle} = \*T;
This happens on lines 168 to 216. Unless you want to revamp the test completely, just leave this code as-is. I'll explain it very briefly. The script constructs two strings $teststr and $testnumstr. The first is a list of strings, the second is a list of numbers. The strings and the numbers are taken from usrdictwords and each string and number maps to each other in the table. The script also compiles two arrays @testwords and @testnums which the script later uses to validate the result sets. On lines 198 to 216, the script selects which test string and which array to use for the test in the variable $testsp.
If you think the random spacing is corny, look at lines 174 and 175.
Not all methods can handle the generated strings as-is, and on lines 219 to 278 there are various adaptations for the methods. Here is a short summary:
Lines | Methods | What |
---|---|---|
ITER, ITER$PROC | Replace commas with space for the list of numbers. | |
TBLNUL$IL, INSERT, REALSLOW, SLOW$LIKE | Remove blanks between separators. | |
EXEC$A/B, UNION, INSERT | For a list of strings, enclose strings in quotes. | |
XMLATTR | Change strings into attribute-centred XML. | |
XMLELEM | Change strings into element-centred XML. | |
TBLNUM$IL | Skip, if string size > 7950. | |
UNION, INSERT, REALSLOW, SLOW$LIKE | Skip, if string size > 3000. | |
REALSLOW | Skip, if test run 3 or later. | |
FIX$SINGLE, FIX$MULTI, FIX$ITER | Construct a string with fixed-length elements. |
If you have a method that you want to test and that needs adaptations, you should be able to figure out how to change the code, if you use any of the adaptations above, or some simple variations of them. If you need some completely different sort of adaptations, you need some Perl knowledge.
This happens on lines 281-309. First the script determines how to invoke the stored procedure, through RPC or command-line batch. This is determined by the variables $test_rpc and test_cmdtext, which are initiliazed on lines 21 and 22 in the script. The script is shipped to test RPC only. If you test both, even runs will be with command text, and even ones will be with RPC.
The actual invocation is then either on line 296 or lines 299-306. On line 308 there is a call to verify_return (lines 337-362) which verifies that the test routine returned the expected data. The script does not stop if the verification fails, but it does print the number of differences and the first three differences. It also prints output if the test procedure returns an unexpected number of items.
The value of the verification should not be underestimated. The performance numbers are not relevant if your method does not produce the correct result.
On lines 316-324 there is a query that retrieves data from timings. The loop on lines 328-334 together with the report definition on lines 336-384 prints the output. Notice that the the report does not include the column calltype, so if you are testing both call types, you cannot tell which is which in the output. But you could copy the query from the script and run it in Query Analyzer.