Does the following situation sound familiar? From one minute to the next,
your production servers grind to a halt, terse emails are followed by
equally hectic phone calls, and the first order of business is to get back up
and running. After the dust settles, you're usually left with a pile of log
files and the task of figuring out what happened, why it happened, and
what to do to keep it from happening again.
A common first step is trying to reproduce what went wrong. More often
than not, this consumes a considerable amount of time that would be better
spent actually fixing the problem. In this first post of a blog series, I
will present a step-by-step guide to diagnosing stuck transactions within
minutes and show how a modern APM solution helps pinpoint common
production problems without first spending hours trying to reproduce them.
The Problem: Re...