I've been committed for quite a while now to have some solid study on the production debugging. As I'm hang a little bit between 2 jobs right now I finally got time to realize it.
So, here I'm to describe my progress to the friends and share some bits and pieces I found interesting.
Long time ago I started with John Robbin's book: Debugging Microsoft .NET 2.0 Applications. As of that time I was not really into WinDbg debugging (I was interested, but not really committed), so that section from the book really escaped from me. But I benefited quite a lot from some of the other advanced debugging concepts provided there.
This time, I started with series of labs that Tess accumulated on her blog: http://blogs.msdn.com/tess/pages/net-debugging-demos-information-and-setup-instructions.aspx
I made a mistake for the first lab that I looked into the code of BuggyBits web site, so it was not really a "eureka" type of troubleshooting for me.
Nevertheless, after the first lab I considered reading this book as quite beneficial:http://www.microsoft.com/downloads/details.aspx?FamilyId=1644C14D-4152-4975-B1C2-A81BDFD6C30F&displaylang=en - Production Debugging .NET applications.
To give you some taste, few interesting bits I found there:
By default, events are only logged in the application event log if the recycling occurs based on the memory limit trigger. If you want IIS to log all
proactive recycling events to the application event log, run the following command from the \Inetpub\Adminscripts directory:
cscript adsutil.vbs set w3svc/apppools/<defaultapppool>/LogEventOnRecycle 0xffffffff
You can reconfigure WAS to prevent the failed process from serving any more requests, while ensuring that the process continues to run; this is known as orphaning the failed process. An orphaned worker process is removed from the application pool, but left in its failed state for later debugging. In this case, the WAS starts another worker process so that the application can continue to serve requests. To configure WAS to orphan worker processes upon failure, run the following command from the \Inetpub\Adminscripts directory:
cscript adsutil.vbs set w3svc/apppools/orphanworkerprocess 1
This command applies this setting at the master level for all application pools. To apply this setting to a specific application pool only, run the command as follows:
cscript adsutil.vbs set w3svc/apppools/<nameofapppool>/orphanworkerprocess 1
Large Object Heap Size displays the current size, in bytes, of the large object heap. Note that this counter is updated at the end of a garbage collection, not at each allocation.
And many more! In the middle of my reading, I returned to the second lab of Tess's and progressed with labs execution, this time always looking in the debugger first, not the code :)!
Side note - let me stop for a bit to describe my lab environments:
1. Windows Vista Home Premium (IIS7) - my very own laptop
2. Windows 2003 R2 Server Enterprise Edition - evaluation edition available as a virtual image from MS downloads, running on my laptop again (ignore all of the VPC compatibility message boxes, it just works)
I thought I'd have issues with running labs on Windows Vista Home Edition, but as for my second day now I have not encountered any real problems:
- I had to set the app pool for ByggyBits to run under local system, as I didn't want to be figuring out why it can't really access the asp.net temp folder (as some are saying there is genuine limitation in Vista Home not using windows impersonation/authentication).
- I had to install IIS6 resource kit on my W2003R2 virtual image and rip tinyget out of it, as I was not able to find IIS7 resource kit or tinyget alone. IIS6 resource kit denied to install on Vista Home machine.
So at some point I dumped the W2003R2 machine and continued on Vista Home Premium.
So the second lab Tess provides the following message for the process crash in the event log:
A process serving application pool 'DefaultAppPool' terminated unexpectedly. The process id was '4592'. The process exit code was '0xe0434f4d'. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
As I have never seen before what is logged on IIS7 for the same crash conditions, I was quite pleased seeing the following helpful message:
An unhandled exception occurred and the process was terminated. Application ID: /LM/W3SVC/1/ROOT/BuggyBits Process ID: 6068 Exception: System.NullReferenceException Message: Object reference not set to an instance of an object. StackTrace: at Review.Finalize()
That provides much better info! So answering Q3 from section 1 ("Q: Can you tell from the eventlogs what it was that caused the crash?") becomes pretty possible at this stage already: It is an exception in the finalizer, it runs in the non-request thread so it will bring down the process per .NET 2.0 unhandled exceptions policy. Does anyone from the .NET team added AppDomain.CurrentDomain.UnhandledException logging handler :):)?
Compared to the walkthrough, on vista I was not able to find a re-thrown exception different from the original, if anything was re-thrown then it was the original exception in my opinion. The !analyze command gave me the most of the information (and the stack trace was in the _remoteStackTraceString field).
From the time perspective (not counting the book), it took me about 4 hours to get to this point (setting up the environment, VM, etc). Enough for a day!