Hi Johny,
Unfortunately, I do not have significant experience in load testing. One of the main reasons is that load testing is really a standalone and complex task: it must be considered beforehand and planned carefully, not squeezed into one day three days before release. And none of the projects I participated in had load and performance requirements as a top priority.
Nevertheless, below are my thoughts on the topic.
Again, I think that in load testing (almost) all figures should be treated relatively, not absolutely (we are not talking about crashes, but about performance / response-time decrease), and then compared with the business requirements and the operational profile. For example, if the online shop server responds within 30 seconds, does this mean that its code is not operating? No, it does not, as the response is finally sent to the client. Will a client wait that long for a response? Probably not, so from the business point of view the server may be considered not operating. But what if the server starts to respond that slowly only when it receives 10 000 requests per second? What load is expected from the business point of view? If the expected load is only between 2 000 and 5 000 requests per second, then maybe this situation should not be treated as a failure; instead it may require some additional code that signals the client that the server is overloaded and should be waited on longer than usual.
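To make the idea concrete, here is a minimal sketch of judging a measurement against the operational profile rather than against an absolute threshold. All the figures (expected load range, acceptable response time) are hypothetical placeholders, not recommendations:

```python
# Classify a single load-test data point relative to the business
# requirements instead of treating any slow response as a failure.
# The default figures below are hypothetical.

def evaluate(measured_rps, response_time_s,
             expected_rps_max=5_000, acceptable_time_s=5.0):
    if response_time_s <= acceptable_time_s:
        return "OK"
    if measured_rps > expected_rps_max:
        # Slow, but only under a load the business does not expect:
        # arguably not a failure, but a candidate for an
        # "I am overloaded, please wait" signal to the client.
        return "OVERLOAD: consider signalling the client"
    return "FAIL: too slow within the expected load range"

print(evaluate(3_000, 2.0))     # fast enough under expected load
print(evaluate(10_000, 30.0))   # slow, but far above expected load
print(evaluate(4_000, 30.0))    # slow within the expected load range
```

The point is that the verdict depends on two inputs (load and response time) compared with the profile, never on the response time alone.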
I hope you get my idea: pure figures mean nothing in load testing unless combined with the business needs and the operational environment.
Load testing actually measures the time spent between sending a request and obtaining the response from the server. That means the server may respond slowly either because the code needs optimization or because the hardware limits were reached. This is where profiling software (like AQtime, http://www.automatedqa.com/products/aqtime/) and performance counters should be used on the server side to figure out which of the two (code or hardware) is the bottleneck.
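The measurement itself is simple in principle: a timestamp before the request and one after the full response. A minimal sketch, using a stand-in handler instead of a real HTTP call (the sleep simulates server-side work):

```python
import time

def timed_call(handler, *args):
    """Measure the time between issuing a request (calling the handler)
    and obtaining the full response, as a load tool would."""
    start = time.perf_counter()
    response = handler(*args)
    elapsed = time.perf_counter() - start
    return response, elapsed

# Stand-in for a real server call; sleeps to simulate processing time.
def slow_server(delay_s):
    time.sleep(delay_s)
    return "200 OK"

response, elapsed = timed_call(slow_server, 0.05)
print(f"{response} in {elapsed:.3f}s")
```

Note that this measurement alone cannot tell code from hardware as the cause of slowness; that is exactly why the server-side profiling mentioned above is needed.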
Likewise, you should analyse every failed request / response to find out why the server responded differently than it did when the load test was recorded. And again, the reason may be the code or the hardware.
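A first step in that analysis is usually to group the recorded responses by status code, so that each kind of unexpected response can be investigated separately. A minimal sketch with a hypothetical result list:

```python
from collections import Counter

# Hypothetical load-test results; in practice this comes from the tool's
# result file or the server's access log.
results = [
    {"url": "/cart",     "status": 200},
    {"url": "/cart",     "status": 500},
    {"url": "/checkout", "status": 200},
    {"url": "/checkout", "status": 503},
    {"url": "/checkout", "status": 503},
]

by_status = Counter(r["status"] for r in results)
failures = [r for r in results if r["status"] != 200]

print(by_status)       # how often each status code occurred
for f in failures:     # the responses worth investigating one by one
    print(f["url"], f["status"])
```

Repeated 5xx codes on the same URL, for example, point at a specific code path or resource limit to profile further.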
And only when, for a certain server load, you have no more options to upgrade the hardware (CPU, memory, faster disks, faster buses, clustering and load balancing, and so on) and the developers fail to optimize the code to make it work faster and more reliably, only then can you say that the server has reached its performance limits.
Where to look, and what for? This depends on your software and hardware. A good idea is to communicate closely with development. Sources of information may be profiling results, web server logs, system logs of the operating system, application logs, etc.
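As one example of mining those sources, here is a minimal sketch that pulls the slowest requests out of an access log. The log format (method, path, status, response time in milliseconds as the last field) is an assumption for illustration; check your own web server's log configuration:

```python
# Hypothetical access-log lines: method, path, status, response time (ms).
log_lines = [
    "GET /cart 200 45",
    "GET /checkout 200 3120",
    "POST /checkout 503 30000",
]

def parse(line):
    method, path, status, ms = line.split()
    return {"method": method, "path": path,
            "status": int(status), "ms": int(ms)}

entries = [parse(line) for line in log_lines]

# Anything over one second (arbitrary threshold), slowest first.
slow = sorted((e for e in entries if e["ms"] > 1000),
              key=lambda e: e["ms"], reverse=True)
for e in slow:
    print(e["method"], e["path"], e["ms"], "ms")
```

The same filter-and-sort approach works for system and application logs once you know their line format.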