Monday, March 26, 2012

Once-only delivery (high availability)?

Hi There

I was wondering if someone could elaborate (or provide a link) on how once-only delivery works for Service Broker.

For example, you have DB1 that sends messages to DB2. DB1 suffers corruption at 1pm, so you restore the database to 12:30pm. The restored database has all the messages that were in this instance at 12:30, but between 12:30 and 1pm several messages were sent successfully. These messages are still in the restored DB1 queue, so do they get sent again?

How does Service Broker ensure that these messages are not processed again? The only thing I can think of is that the initiator or target keeps a complete history of messages processed. But surely this "table" would get huge and slow down Service Broker if it had to check this table every time a message is sent or received.

I cannot find much in BOL on this topic, or maybe I just cannot find the right topic.

Thanx

This question has no easy answer. In a typical database application you deal with the potential data loss of a 'back-in-time' recovery by inspecting the application data and comparing it with some other record (web logs, or a paper trail like invoices or receipts), and you can probably recreate the lost data. If the data is critical for your business, you take steps to ensure such an event cannot happen (e.g. you run with synchronous database mirroring, or you enforce some hardware disk mirroring).

In a Service Broker application the situation is more complex, because you are now faced with state that is distributed between the services involved. As you noticed, a back-in-time restore will put some conversations out of sync. Unless you roll the other database back in time as well (by restoring to an appropriate LSN), you cannot continue those conversations. The most difficult problem is when DB1 has sent a message between 12:30 and 1, and DB2 has received that message and acted upon it (e.g. has printed and mailed a paper invoice). If you restore DB1 to 12:30, DB2 now has a message that, as far as DB1 knows, was never sent. There is no possible 'history table' that can recover this situation.

How to act in such situations really depends on the business meaning of each conversation. After a restore, the DB1 broker will be disabled. One would have to inspect the state of the pending conversations in DB1 and compare them with the state in DB2 to see whether each conversation can be allowed to continue, or whether it has to be errored or even completely cleaned up. One possible action is to error out all restored conversations in DB1 (using ALTER DATABASE ... SET ERROR_BROKER_CONVERSATIONS) and then enable the broker in DB1 again. This way any conversation between DB1 and DB2 that was still pending will be errored (and the application has to deal with the error appropriately). Any conversation that was started by DB1 between 12:30 and 1 and now exists only on DB2 will eventually time out and error (and this is another reason why one should use the LIFETIME option in BEGIN DIALOG).
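
As a rough illustration of the steps above, here is a minimal T-SQL sketch. The database name DB1 comes from the question; the service and contract names and the one-hour lifetime are made up for the example, so adjust them to your own setup. Treat it as a starting point to adapt, not as a tested recovery procedure.

```sql
-- Sketch only. DB1 is the restored database from the example; the
-- service/contract names further down are hypothetical placeholders.

USE DB1;

-- 1. See which conversations the restored database still considers open,
--    so they can be compared against the state in DB2.
SELECT conversation_handle,
       conversation_id,
       is_initiator,
       far_service,
       state_desc,
       lifetime
FROM sys.conversation_endpoints
WHERE state_desc <> N'CLOSED';

USE master;

-- 2. End every conversation in the restored database with an error.
--    WITH ROLLBACK IMMEDIATE disconnects other sessions that would
--    otherwise block the ALTER DATABASE.
ALTER DATABASE DB1 SET ERROR_BROKER_CONVERSATIONS WITH ROLLBACK IMMEDIATE;

-- 3. Make sure message delivery is enabled again. This may already be
--    the case after step 2, hence the check.
IF EXISTS (SELECT 1
           FROM sys.databases
           WHERE name = N'DB1' AND is_broker_enabled = 0)
    ALTER DATABASE DB1 SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;

USE DB1;

-- 4. For new dialogs, set a LIFETIME (in seconds) so that a conversation
--    orphaned by a future restore eventually errors out on the far side.
DECLARE @handle UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//example/InitiatorService]   -- hypothetical name
    TO SERVICE N'//example/TargetService'       -- hypothetical name
    ON CONTRACT [//example/Contract]            -- hypothetical name
    WITH LIFETIME = 3600, ENCRYPTION = OFF;
```

Step 2 is the blunt option: it errors every conversation, including ones that might have been safe to continue. If some conversations matter individually, the query in step 1 is where the comparison with DB2 would start.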

And again, if the business loss from losing those conversations is critical, you must deploy a solution that simply does not allow this to happen (like a mirrored database solution).

HTH,
~ Remus
