This entry is incomplete and contains only rough notes.
Which protocol to pick for reliable messaging?
The following note in the MSDN documentation seems to rule out the WS* bindings for reliable messaging behind a load balancer.
Given that basicHttpBinding does not support reliable messaging, the above is enough for me to stick to TCP as the best option wherever possible.
Configure NetTcpBinding to use reliable messaging.
TODO: provide a link
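As a minimal configuration sketch of the step above, assuming the binding is declared in app.config or web.config. The binding name `reliableTcp` is a placeholder of mine, not something taken from the original setup.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "reliableTcp" is a placeholder name chosen for this sketch -->
      <binding name="reliableTcp">
        <!-- Turns on the WS-ReliableMessaging session for this binding;
             ordered="true" also preserves message order -->
        <reliableSession enabled="true" ordered="true" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

An endpoint would then opt in with `binding="netTcpBinding"` and `bindingConfiguration="reliableTcp"`, and both client and service need a matching configuration.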
The pitfall I fell into was:
Whether or not you target reliable messaging, timeouts play a vital role in your solution's reliability and availability.
openTimeout, closeTimeout, receiveTimeout, session inactivityTimeout
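To show where each of these settings lives, here is a small sketch for NetTcpBinding. The binding name and every duration are illustrative assumptions, not recommendations. Note that the first three are attributes of the binding itself, while the inactivity timeout belongs to the reliable session element.

```xml
<netTcpBinding>
  <!-- "tcpTimeouts" and all durations below are assumptions for illustration -->
  <binding name="tcpTimeouts"
           openTimeout="00:01:00"
           closeTimeout="00:01:00"
           receiveTimeout="00:10:00">
    <!-- inactivityTimeout is set on the reliable session, not on the binding -->
    <reliableSession enabled="true" inactivityTimeout="00:10:00" />
  </binding>
</netTcpBinding>
```

All values use the `hh:mm:ss` format.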
It is easy to get confused about which timeout means what; as the following links show, the naming is not exactly consistent:
The best explanation of the inactivity and receive timeouts I have seen so far is from Nicholas Allen:
One timeout setting that gave us a hard time was ChannelInitializationTimeout. By default it is set to 5 seconds. It is hard to see why a channel would take longer than that to initialize, although the following information may suggest a reason:
From the above quote it follows that if the client fails to authenticate itself within 5 seconds (the default), the server drops the connection. That is exactly what we saw in the client and server WCF logs. A likely cause is that authentication against the Kerberos server takes longer than that.
Unfortunately, you cannot set this value directly on NetTcpBinding; it can only be configured through a CustomBinding. It may be a good idea to use a CustomBinding anyway, because it generally gives you more flexibility.
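A hedged sketch of what such a CustomBinding might look like. The element stack is my assumption of a rough NetTcpBinding equivalent (reliable session, Windows stream security, binary encoding, TCP transport), and both the name `tcpCustom` and the 30-second value are placeholders; check the stack against the binding you are actually replacing.

```xml
<customBinding>
  <!-- "tcpCustom" is a placeholder name; the stack below only approximates
       a NetTcpBinding with transport security and a reliable session -->
  <binding name="tcpCustom">
    <reliableSession ordered="true" />
    <windowsStreamSecurity />
    <binaryMessageEncoding />
    <!-- The setting that NetTcpBinding does not expose;
         00:00:30 is an illustrative value, not a recommendation -->
    <tcpTransport channelInitializationTimeout="00:00:30" />
  </binding>
</customBinding>
```

Because the timeout is enforced by the side that accepts the connection, the value matters on the service's binding.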
Pay special attention to how reliability affects the number of connections you need, since reliable messaging may double the communication requirements.
Transport quotas (with default values):