I'm back in Melbourne doing some performance-tuning work this week.
Yesterday's issue ended up being a caching problem in middle-tier code. These issues are surprisingly common.
The symptoms were hundreds of thousands of calls to a particular stored proc over a period of half an hour. It's a timely reminder that when you're tracing using SQL Trace calls or Profiler, it's important to avoid filtering out calls that aren't using too many resources, until you've looked at the bigger picture. For example, the logical reads, CPU, duration, etc. on each call were close to zero. No call on its own was a problem but the overall effect of the calls was staggering.
In the end, the problem was a cache timeout value set to 60 instead of 3600. The cache was meant to be flushed each hour, not each minute and the developer responsible thought the value was meant to be in minutes, not seconds.