Service Broker–WAITFOR and Activated Queues

We’ve implemented Service Broker to handle a small portion of the transactions we want to catch and apply to our Operational Data Store. Mostly we wanted to make sure we accounted for actual DELETE operations and handled those records correctly. We can’t use Change Data Capture on our source systems at this time, and Service Broker seemed to fit the bill. We still handle the remaining inserts and updates through SSIS.

While running Service Broker, we noticed a pretty constant CPU hum on the receiving server. That seemed odd, and we had a lot of trouble tracking it down. Regular Profiler traces didn’t show any running T-SQL, and “normal” Service Broker traces weren’t showing much, either. We just saw SQL Server running at a pretty constant 20% CPU even though nothing seemed to be happening.

I want to give a public thank you to Mark Hill (twitter) for doing a little extra digging and catching the root cause of this. In our tinkering with Service Broker, queues, activation, stored procs, and such, we missed some very important information along the way.

We had written a stored procedure to run as the “Activated” proc on the receiving side of our Service Broker queue. Inside the stored procedure, we included a WHILE loop to process anything that was in the queue; if nothing was there, it exited. That seemed pretty simple – if something comes into the queue, process it and stop when nothing is left. End of story, right?
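Roughly, that first version had the shape sketched below. All object names here are illustrative, not our actual objects – and note there’s no WAITFOR anywhere:

```sql
-- Simplified shape of the original loop: drain the queue, exit when empty.
-- Queue name is illustrative.
DECLARE @message_body VARBINARY(MAX);

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION;

    RECEIVE TOP (1) @message_body = message_body   -- returns at once if empty
    FROM dbo.TargetQueue;

    IF (@@ROWCOUNT = 0)
    BEGIN
        ROLLBACK TRANSACTION;
        BREAK;   -- nothing left: end of story, right?
    END;

    -- ... process the message here ...
    COMMIT TRANSACTION;
END;
```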

Sadly, this is where we missed a small, but important, fact about Service Broker. If you have an activated stored procedure that doesn’t use the WAITFOR command, Service Broker will execute that stored proc as many times as possible looking for something to process. To confirm it, we turned off activation on the queue and watched the CPU drop to nothing; we re-enabled activation and watched it shoot right back up. We then tweaked the stored procedure and added a WAITFOR command with a timeout of 60,000 ms (60 seconds). After doing this, we saw the stored procedure process everything in the queue and then go idle, which was exactly the intended behavior in the first place.
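The revised pattern looks something like this minimal sketch: block on the queue with WAITFOR (RECEIVE ...), TIMEOUT, and exit when the wait comes back empty. The queue, proc, and handler names are illustrative:

```sql
CREATE PROCEDURE dbo.ProcessTargetQueue
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @conversation_handle UNIQUEIDENTIFIER,
            @message_type_name   SYSNAME,
            @message_body        VARBINARY(MAX);

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION;

        -- Block for up to 60 seconds waiting for a message. Without the
        -- WAITFOR, an empty queue returns immediately and activation keeps
        -- re-launching the proc, which is where our CPU was going.
        WAITFOR (
            RECEIVE TOP (1)
                    @conversation_handle = conversation_handle,
                    @message_type_name   = message_type_name,
                    @message_body        = message_body
            FROM    dbo.TargetQueue
        ), TIMEOUT 60000;

        IF (@@ROWCOUNT = 0)
        BEGIN
            -- Nothing arrived before the timeout: exit and let activation
            -- fire the proc again when new messages show up.
            ROLLBACK TRANSACTION;
            BREAK;
        END;

        IF (@message_type_name = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
            END CONVERSATION @conversation_handle;
        ELSE
            -- Hypothetical handler standing in for our real processing,
            -- e.g. applying a captured DELETE to the ODS.
            EXEC dbo.ApplyDeleteFromMessage @message_body;

        COMMIT TRANSACTION;
    END;
END;
```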

I’m not going to pretend to be an expert on Service Broker. We had tried to code with the intention of being able to use this stored procedure either as an activated stored procedure or as one called from an external process to work through the queue. While that may be possible, it was unnecessary in our actual usage. Adding a WAITFOR command around our queue processing eliminated the extra, non-essential CPU usage and stopped Service Broker from executing the stored procedure for no reason.
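For completeness, wiring the proc up for activation looks roughly like this (names and reader count illustrative); the STATUS flag is also the switch we toggled while chasing the CPU usage:

```sql
-- Names are illustrative. STATUS = OFF is what made the CPU drop to nothing
-- during diagnosis; STATUS = ON brought it right back until we added WAITFOR.
ALTER QUEUE dbo.TargetQueue
WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessTargetQueue,
    MAX_QUEUE_READERS = 1,
    EXECUTE AS OWNER
);
```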

Resources

Pro SQL Server 2008 Service Broker (Amazon | Apress)

WAITFOR (Also see this performance article – it’s the little details that get you)

Data Loads

One of the things I really like about SQL Server Integration Services is that it can load large data sets quickly, especially if you need to look up values along the way. However, there are times when things behave in strange ways that seem hard to explain. We were recently trying to load about 64 million rows into a warehouse-type database and kept hitting a bottleneck once we got past around 30-35 million rows. No matter what I tried, the load would start at about 2 million rows per minute and gradually slow until it bottomed out around 10,000 rows per second. I found several helpful posts, including some pointers to check the DefaultBufferMaxRows and DefaultBufferSize settings on the data flow. Tweaking these had helped someone with a very similar problem on a memory-bound machine. I tried that and saw my initial load times improve, but the load still slowed to a crawl at around the same point.

I learned more about looping through (aka “shredding”) a recordset of CustomerID values to work through that set. I set up a For Loop to process 5 million records at a time, and I checked for processes running against the table. Nothing seemed to help. I had posted information on Twitter, had a co-worker looking at the package with me, and even got some replies from Paul Randal. (Thank you for the assist, Paul – sorry I wasn’t checking my tweets more regularly at the time.)

In the process above, I struggled to set the CustomerID range dynamically in my source query and eventually resorted to treating the SQL command as an expression and hacking my variables into the WHERE clause. It worked, but it definitely felt awkward. I am more than open to suggestions about how to use SSIS variables inside of a For Loop as part of the OLE DB Source command. I also learned a little more about watching variable values – like needing to have a breakpoint set before they’ll be available. And my co-worker pointed me to a wonderful discussion showing that DTExecUI runs only in 32-bit mode. I’d been using that as a quick/easy way to get my parameters plugged in and to keep an eye on the general progress of the load.
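For what it’s worth, the OLE DB Source also accepts parameter markers, which might be a cleaner way to feed the loop variables than building the whole statement as an expression. A sketch, where User::LoopStart and User::LoopEnd are hypothetical package variables incremented by the For Loop and mapped on the source’s Parameters page:

```sql
-- Data access mode: "SQL command". The ? markers map to package variables
-- on the Parameters page (Parameter0 -> User::LoopStart,
-- Parameter1 -> User::LoopEnd; both names hypothetical).
SELECT  CustomerID,
        SomeColumn              -- column list illustrative
FROM    dbo.SourceTable         -- table name illustrative
WHERE   CustomerID >= ?
  AND   CustomerID <  ?;
```

And since DTExecUI is 32-bit only, one way around that limitation is to run the same package through the 64-bit dtexec.exe instead while still passing parameters from the command line.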

About the same time that Paul suggested checking my index fragmentation levels, I remembered seeing a very similar behavior pattern several years ago when a co-worker was trying to update a column in the middle of a somewhat wide clustered index on a multi-million row table. The server thrashed all over the place, trying to re-order millions of rows while that command ran. I checked my table and, sure enough, there was a wide clustered index on it that would cause exactly this behavior. I don’t know why I didn’t check it before, other than assuming that I’d just created a test table with no indexes on it. In retrospect, that was poor planning on my part. In the future, I’ll remember to check for clustered indexes before I try to load millions of (unsorted) rows into a new table.
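The check itself is quick; something along these lines (table name illustrative) shows both what indexes exist and the fragmentation numbers Paul suggested looking at:

```sql
-- Table name is illustrative. What indexes are on the target table?
EXEC sp_helpindex 'dbo.WarehouseTable';

-- And how fragmented are they? (sys.dm_db_index_physical_stats, SQL 2005+)
SELECT  index_id,
        avg_fragmentation_in_percent,
        page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.WarehouseTable'), NULL, NULL, 'LIMITED');
```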

It was a somewhat humbling experience, mostly because one of the basic things I know about loading lots of data was also something I completely ignored while doing all of my troubleshooting. Still, a little humility is a good thing, and I’ve re-learned several programming techniques in the process. The best part was seeing a 6-hour load process reduced to 30 minutes, including dropping and re-creating the clustered index. That’s going to make a noticeable difference in our nightly processes.
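The drop-and-recreate pattern itself is nothing exotic; a sketch, with hypothetical table and index names:

```sql
-- Hypothetical names. Drop the clustered index so the bulk load lands in a
-- heap instead of constantly re-ordering rows mid-load.
DROP INDEX CIX_WarehouseTable ON dbo.WarehouseTable;

-- ... run the SSIS load here ...

-- Rebuild the clustered index once, after all the rows are in.
CREATE CLUSTERED INDEX CIX_WarehouseTable
    ON dbo.WarehouseTable (CustomerID);   -- key columns illustrative
```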