Monday, March 2, 2009

Using SQL Service Broker in .NET applications

I've been doing some work with SQL Service Broker lately and I thought I'd share my experience with it. I was interested in the possibility of using NServiceBus or any other popular Enterprise Service Bus implementation for .NET (MassTransit, RhinoServiceBus) with an underlying transport other than MSMQ, and I thought SQL Service Broker could be a good candidate for the replacement.

This implementation exercise shouldn't be a very complicated task given how extensible a framework like NServiceBus is (you just need to implement an interface called ITransport to start using your own transport layer with the bus), but unfortunately there is, as of today, no managed API for Service Broker in the .NET Framework, so things get a bit trickier. Actually, while working on a custom API for Service Broker I've hit some limitations/design decisions which, in my opinion, make SQL Service Broker not the best option for the transport layer of an Enterprise Service Bus framework.


Distributed Transactions

I don't want to get into the details of how SQL Service Broker works, but if you have ever used it, you know that messages are retrieved from queues with the T-SQL statement RECEIVE, which is a non-blocking call; you can combine it with the WAITFOR command to get a blocking effect, so you avoid polling the queue for new incoming messages. The main problem with the WAITFOR instruction is that it sets a transaction savepoint, which makes the current transaction unsuitable for enlistment; in other words, the message handling from the queue cannot take part in a distributed transaction.
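For reference, since there is no managed API you issue the WAITFOR(RECEIVE ...) batch from .NET through a plain SqlCommand. A minimal sketch (the queue name and connection string are just placeholders of mine, not anything Service Broker mandates) could look like this:

using System;
using System.Data.SqlClient;

// Minimal sketch: blocking receive from a Service Broker queue via WAITFOR(RECEIVE).
// "dbo.TargetQueue" and the connection string are illustrative placeholders.
static byte[] ReceiveMessage(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        using (var command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandTimeout = 0; // let WAITFOR block until a message arrives
            command.CommandText =
                @"WAITFOR (
                      RECEIVE TOP(1) message_body
                      FROM dbo.TargetQueue
                  );";

            byte[] body = null;
            using (var reader = command.ExecuteReader())
            {
                if (reader.Read())
                    body = reader["message_body"] as byte[];
            }

            transaction.Commit();
            return body;
        }
    }
}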

So what's the big deal? Actually, I read on the MSDN forums that the Service Broker team wants to discourage the use of distributed transactions because of their performance hit. Besides, you might argue that you can get by without distributed transactions by keeping all the work related to message handling in the same database, so there is no need for enlistment. I'm afraid this covers some scenarios but falls short in others; you have to bear in mind that:
  • Not every transactional resource is a SQL Server instance; it doesn't even need to be a SQL engine at all.
  • Even if all your transactional resources are SQL Server instances, for scalability reasons you might have to keep the queues in a different instance from the one holding your mainstream/business data.
  • For security reasons you might have to use different SQL user accounts to handle messages and to handle application data.

For all these scenarios distributed transactions are needed to provide reliable message handling and avoid situations like repeated or lost messages... Can you imagine a banking system where messages that increase/decrease an account balance could be handled twice or not handled at all?
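To make it concrete, this is the kind of pattern you would want: one ambient transaction spanning both the receive from the queue database and the update against the business database. The sketch below is purely illustrative (connection strings, queue and table names are mine); note that it has to use a plain RECEIVE, i.e. polling, because a WAITFOR would refuse to enlist:

using System.Data.SqlClient;
using System.Transactions;

// Illustrative only: receive and business update enlisted in one distributed
// transaction. Queue/table names and connection strings are placeholders.
static void HandleOneMessage(string queueDbConnectionString, string businessDbConnectionString)
{
    using (var scope = new TransactionScope())
    {
        byte[] body;
        using (var queueConnection = new SqlConnection(queueDbConnectionString))
        {
            queueConnection.Open(); // enlists in the ambient transaction
            using (var receive = queueConnection.CreateCommand())
            {
                // plain RECEIVE (no WAITFOR), so this is effectively polling
                receive.CommandText = "RECEIVE TOP(1) message_body FROM dbo.TargetQueue;";
                body = receive.ExecuteScalar() as byte[];
            }
        }

        if (body != null)
        {
            using (var businessConnection = new SqlConnection(businessDbConnectionString))
            {
                businessConnection.Open(); // second resource, escalates to MSDTC
                using (var update = businessConnection.CreateCommand())
                {
                    update.CommandText =
                        "UPDATE dbo.Accounts SET Balance = Balance + @amount WHERE Id = @id;";
                    update.Parameters.AddWithValue("@amount", 100m);
                    update.Parameters.AddWithValue("@id", 1);
                    update.ExecuteNonQuery();
                }
            }
        }

        scope.Complete(); // both resources commit together, or neither does
    }
}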

There is an alternative provided by Service Broker that could help to get around this limitation (apart from polling), called External Activation, which in a few words is a sort of SQL trigger where custom logic can be placed to be executed upon message arrival at a queue. The trick is then to use a second queue to store notifications of incoming messages: the consumer application reads and waits on this event queue outside any transaction, and when a notification arrives it can go to the main queue and receive the real message within a transaction, without needing WAITFOR. I think this overcomplicates things, as you have to handle two queues for what you could do with one, and you have to make sure that both queues stay in sync, so that, for example, you don't end up with messages outstanding in the main queue and consumers available to handle them, but the messages never getting served because the event queue is empty.
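For what it's worth, the two-queue pattern would look roughly like this; ExecuteBlockingReceive, ExecuteReceive and ProcessMessage are hypothetical helpers of mine (the first two just run the given T-SQL through a SqlCommand, as in the earlier sketch), and the queue names are made up:

// Rough sketch of the two-queue workaround. The event queue is drained outside
// any transaction just to learn that "something arrived"; the real message is
// then received without WAITFOR inside a (potentially distributed) transaction.
while (true)
{
    // 1. Block on the notification/event queue; no ambient transaction needed here.
    ExecuteBlockingReceive(eventQueueConnectionString,
        "WAITFOR (RECEIVE TOP(1) message_body FROM dbo.EventQueue);");

    // 2. Handle the real message transactionally, with a plain RECEIVE (no WAITFOR).
    using (var scope = new TransactionScope())
    {
        byte[] body = ExecuteReceive(mainQueueConnectionString,
            "RECEIVE TOP(1) message_body FROM dbo.MainQueue;");
        if (body != null)
            ProcessMessage(body); // business work enlists in the same transaction
        scope.Complete();
    }
}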


Poison messages handling

An aspect that always has to be taken into account while developing a messaging solution is poison message handling and the message replay strategy. A valid strategy would be: on an error during message handling, retry up to n times, and once the message goes over the limit, send it to a dead/error message queue.

But what do we do when there is an error while handling the message, for example when an external subsystem needed to process the message is down for some time? A straightforward answer would be to roll back the current transaction, so any inconsistent business work is undone and the current message goes back to the queue to be handled again later, when the subsystem has hopefully come back (you can keep counters at message level for the retries in order to identify poison messages).
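A hedged sketch of that retry strategy, where ProcessMessage, ClearRetryCount, IncrementRetryCount and MoveToErrorQueue are hypothetical helpers of my own rather than any real API:

// Sketch of the naive retry strategy described above: roll back on failure and
// track per-message retries so that a message over the limit gets dead-lettered.
const int MaxRetries = 5;

static void Handle(Guid messageId, byte[] body)
{
    try
    {
        ProcessMessage(body);
        ClearRetryCount(messageId);
    }
    catch (Exception)
    {
        int attempts = IncrementRetryCount(messageId);
        if (attempts >= MaxRetries)
            MoveToErrorQueue(messageId, body); // dead-letter the poison message
        else
            throw; // let the transaction roll back so the message returns to the queue
    }
}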

Well, this strategy clashes with the way SQL Service Broker deals with poison messages: a queue is disabled after 5 consecutive transaction rollbacks involving a RECEIVE... Once the queue is disabled it cannot be used for processing until a user with administrator rights enables it again. This automatic poison message detection is not message bound, it can't be turned off, and the number of consecutive rollbacks can't even be changed!
To sum up, on one hand we should always try to commit the transaction to avoid triggering this behavior, but on the other hand we don't want to leave our business data in an inconsistent state, which leads us to split the transaction in two: one for the message and another for the business logic; pretty much we are back to square one, with the risk of repeated/lost messages...

There are a couple of ways around that behavior though. One of them is another sort of trigger, where custom logic can be placed to be executed upon the queue-disabled event. The other one is to follow the strategy explained here, which in a few words consists of creating a transaction savepoint right after the RECEIVE (bye bye distributed transactions again) and rolling back to that savepoint in case of error, so technically the RECEIVE statement itself is never rolled back.
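A minimal sketch of that savepoint trick from ADO.NET could look like this, assuming a local (non-distributed) SqlTransaction; ProcessMessage and the queue name are placeholders of mine:

using System;
using System.Data.SqlClient;

// Sketch of the savepoint workaround: only the processing after the savepoint is
// rolled back on error, so the RECEIVE itself gets committed and the consecutive
// rollback counter that disables the queue never trips. Savepoints require a
// local transaction, so no distributed transactions here.
static void ReceiveWithSavepoint(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            var receive = connection.CreateCommand();
            receive.Transaction = transaction;
            receive.CommandText = "RECEIVE TOP(1) message_body FROM dbo.TargetQueue;";
            var body = receive.ExecuteScalar() as byte[];

            transaction.Save("AfterReceive"); // transaction savepoint
            try
            {
                if (body != null)
                    ProcessMessage(body); // hypothetical business handler
            }
            catch (Exception)
            {
                // Undo only the processing; the RECEIVE survives the rollback.
                transaction.Rollback("AfterReceive");
                // here you could increase a retry counter or end the conversation with an error
            }

            transaction.Commit();
        }
    }
}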


Wrapping up

Despite the issues above, I think this has been a good experience, as I've got to know SQL Service Broker better and I've had the chance to dig a bit into the NServiceBus internals and understand a bit more about how it works under the hood.
The general impression I get of SQL Service Broker is that it can be a good solution for applications where the message handling business logic lives in the database, but for .NET applications I'd rather go for MSMQ queues which, on top of having an API in the .NET Framework, don't suffer from the problems I've talked about.

I'm no expert in SQL Service Broker though, so if I'm missing something, don't hesitate to say so, I'd appreciate it!



5 comments:

  1. Great post. I am considering SSB for an upcoming project and one area I am concerned about is poison/dead letter msg handling. Appreciate the info.

  2. There is a working transport for SQL Service Broker in rhino-esb here: http://github.com/CoreyKaylor/rhino-esb/tree/servicebroker

  3. Javi,
    Thank you for sharing your experience (and pain). IMHO the problem with duplicated messages is solvable as long as you generate a unique message ID and store the processed ones, so when a duplicated message arrives it's just a simple sanity check. Obviously it's a separate piece of infrastructure, and it's your headache to archive processed messages in a timely fashion (day/week/month).

    I might be old-fashioned, but my idea of using SSSB was to delegate something that should not be part of ANY transaction in the first place (like logging).
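    For what it's worth, a minimal sketch of that sanity check (table and column names are made up, types come from System.Data.SqlClient):

    // Hypothetical dedup check: dbo.ProcessedMessages has a primary key on MessageId,
    // so inserting an already-seen ID fails and the message is treated as a duplicate.
    static bool TryMarkAsProcessed(SqlConnection connection, SqlTransaction transaction, Guid messageId)
    {
        using (var command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText =
                "insert into dbo.ProcessedMessages (MessageId, ProcessedOn) values (@id, getutcdate());";
            command.Parameters.AddWithValue("@id", messageId);
            try
            {
                command.ExecuteNonQuery();
                return true;  // first time we see this message
            }
            catch (SqlException)
            {
                return false; // primary key violation, i.e. a duplicate: skip handling
            }
        }
    }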

  4. The SQL Server 2008 R2 queue definition contains an option for handling poison messages:

    CREATE QUEUE ...
    POISON_MESSAGE_HANDLING (STATUS = { ON | OFF })

    A simplified poison message detection algorithm with this option set to POISON_MESSAGE_HANDLING (STATUS = OFF):

    set xact_abort off;

    -- dbo.target_queue is a placeholder queue name
    declare @conversation_handle uniqueidentifier, @message_body varbinary(max);

    begin transaction;
    begin try

        -- receive the next message
        waitfor (
            receive top(1)
                @conversation_handle = conversation_handle,
                @message_body = message_body
            from dbo.target_queue
        ), timeout 1000;

        if exists (select * from dbo.poisoned_messages
                   where [conversation_handle] = @conversation_handle)
        begin
            -- process poison message
            exec process_poisoned_message;
            commit transaction;
        end
        else
        begin
            exec process_regular_messages;
            commit transaction;
        end;

    end try
    begin catch

        -- 1. rollback uncommittable transaction
        if (xact_state() = -1) rollback transaction;

        -- 2. no transaction left: remember the conversation as poisoned
        if (xact_state() = 0)
            insert dbo.poisoned_messages([conversation_handle]) values(@conversation_handle);

        -- 3. failed message processing
        if (xact_state() = 1)
        begin
            end conversation @conversation_handle with error ...;
            commit transaction;
        end;

    end catch;

  5. Hi,

    In my service bus project ("RockBus", http://rockbus.codeplex.com), I combine SSB and WCF by using the SSB binding for WCF (ssbwcf.codeplex.com).

    HTH,
    Bram.
