Load testing is a broad and well-established area of IT knowledge and software development practice. Many professionals specialize in it, and there are testing gurus ready to give useful advice and even teach you the theory of the subject. Surprisingly, these gurus often disagree with each other on the most basic terms used in the field.
If you search for information on load testing, you will most probably also find articles mentioning terms such as “performance testing” and “stress testing”. Are they all just synonyms? Everybody agrees that they are not, yet different sources give different definitions for these terms.
The most confusing point is the difference between performance and load testing. Some people reasonably say that since the performance of an application can be measured without creating any load on it, load testing is a subset of performance testing. Other variants of performance testing may measure parameters that do not depend on the load at all, such as the time required to render a web page in the browser or to perform any other action on the client side.
When talking about stress testing, everyone agrees that it is a type of testing in which the server is stressed with a load above normal, and sometimes even beyond the peak estimate for the tested application. But for what purpose is this done? Some say that it is just a way to check how the server responds to a rapidly growing load.
In my opinion this mess in terminology is produced by the marketing efforts of companies selling testing tools. They want to meet the expectations of every potential customer who comes to their web site. That is why they provide similar descriptions for all three types of testing mentioned above. In other words, they do not want to lose customers who understand these terms differently. That would be a really dramatic loss, taking into account that the same tools are used for all types of load testing.
Since I am not focused on selling anything to any particular customer right now, I have the freedom to develop a theory that serves a better understanding of the subject. So, whether or not any guru likes my classification, here it is.
Load testing. I prefer to think of load testing as a blanket term for all the other types of testing that are done under emulated load. Basically, each of them can be described and distinguished from the others by specifying the following test options.
- The main goal of test execution.
- The type and volume of applied load (it may be changing throughout the test).
- What parameters are measured and monitored when the test is performed.
- Additional actions performed with the tested system during the test.
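To make these four options less abstract, here is a minimal Python sketch that writes them down as a data structure. The class name `LoadTestPlan`, its field names, and the example values are all my own assumptions for illustration; they do not come from any particular testing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadTestPlan:
    """Hypothetical container for the four test options listed above."""
    goal: str                    # the main goal of test execution
    load_profile: List[int]      # virtual users at each phase (may change during the test)
    metrics: List[str]           # parameters measured and monitored
    extra_actions: List[str] = field(default_factory=list)  # actions performed on the system


# Two illustrative plans that differ only in these options.
capacity_plan = LoadTestPlan(
    goal="find the user count at which the quality standard is no longer met",
    load_profile=[50, 100, 150, 200],
    metrics=["response_time", "requests_per_second", "error_rate"],
)

failover_plan = LoadTestPlan(
    goal="check recovery after a component failure",
    load_profile=[100, 100, 100],
    metrics=["response_time", "error_rate"],
    extra_actions=["disconnect a system component during the second phase"],
)
```

Every test type described in the rest of this article can be read as one particular way of filling in these four fields.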
Performance testing. Here I consider performance testing only as a type of load testing. I understand that it can have a wider meaning, but I want to cover only the performance testing that is done under load.
In this type of test we gradually increase the load by adding more and more virtual users to the test and check the performance parameters of the system at each test phase.
The main things we monitor are:
- Web site response time;
- Number of processed requests per second;
- Error rate.
As a result we get a graph showing the performance parameters at each load level. So we can tell, for example, what response time to expect under the estimated load. Since we also know how it changes throughout the test, we can also predict whether this parameter is stable and whether it can be improved by upgrading hardware.
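The per-level numbers behind such a graph could be computed roughly as follows. This is only a sketch: the sample format (virtual users, response time, success flag) and the assumption that every load level is held for the same number of seconds are mine, not something a specific tool guarantees.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_load(samples, phase_seconds):
    """Group request samples by virtual-user count and compute the three metrics.

    samples: iterable of (virtual_users, response_time_seconds, ok) tuples.
    phase_seconds: how long each load level was held (assumed equal for all levels).
    """
    by_level = defaultdict(list)
    for users, response_time, ok in samples:
        by_level[users].append((response_time, ok))

    summary = {}
    for users in sorted(by_level):
        rows = by_level[users]
        summary[users] = {
            "avg_response_time": mean(rt for rt, _ in rows),
            "requests_per_second": len(rows) / phase_seconds,
            "error_rate": sum(1 for _, ok in rows if not ok) / len(rows),
        }
    return summary


# Made-up samples for two load levels, each held for 2 seconds.
samples = [(10, 0.25, True), (10, 0.75, True), (20, 0.5, True), (20, 1.5, False)]
summary = summarize_by_load(samples, phase_seconds=2)
```

Plotting `summary` against the number of virtual users gives one curve per monitored parameter.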
Capacity testing. This type of test answers the most common question in load testing: how many concurrent users can the web site handle while maintaining a good response time and error rate?
Again, we add virtual users gradually, but in this case we know the performance criteria in advance and just need to check that they are met. When the performance starts to degrade significantly or simply falls below our quality standard, we conclude that the capacity limit has been reached.
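As a rough illustration of that conclusion, the sketch below walks through per-level results and returns the last user count that still met the criteria. The function name, the input format, and the threshold values are assumptions made for this example.

```python
def find_capacity(levels, max_response_time, max_error_rate):
    """Return the highest user count that still meets both criteria.

    levels: list of (virtual_users, avg_response_time, error_rate),
            ordered by increasing virtual_users.
    Returns None if even the first level fails the criteria.
    """
    capacity = None
    for users, response_time, error_rate in levels:
        if response_time > max_response_time or error_rate > max_error_rate:
            break  # criteria violated: the previous level was the limit
        capacity = users
    return capacity


# Made-up results of a gradual ramp-up.
levels = [(50, 0.4, 0.0), (100, 0.6, 0.0), (150, 1.1, 0.01), (200, 3.5, 0.08)]
capacity = find_capacity(levels, max_response_time=2.0, max_error_rate=0.02)
# capacity is 150: the 200-user level breaks both criteria
```

The real limit lies somewhere between the last passing and the first failing level, so smaller load steps give a more precise answer.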
Stress testing. Every system has a capacity limit. When the load goes beyond it, the web site can start responding very slowly and even produce errors.
The purposes of stress testing are:
- Find that limit (in this respect it is similar to the capacity test);
- Check that when it is reached, the web site handles the stress correctly: it produces graceful overload notifications and does not crash;
- Check that when the load is reduced back to the regular level, the web site returns to normal operation and regains its previous performance parameters.
In my opinion it is very important to mention the last two goals, because they are what is specific to stress testing.
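Those last two goals can be expressed as simple checks on the collected data. In the sketch below I assume that a graceful overload notification is an HTTP 503 response, that `None` marks a request that received no response at all, and that a 10% slowdown after recovery is tolerable. All three are assumptions chosen for illustration, so adjust them to your own application.

```python
def check_stress_result(baseline_rt, recovery_rt, overload_statuses,
                        acceptable=(200, 503), tolerance=0.10):
    """Evaluate the two stress-specific goals.

    baseline_rt / recovery_rt: average response time before and after the overload.
    overload_statuses: HTTP status codes seen during overload; None marks a
        request with no response (assumed here to indicate a crash or hang).
    acceptable: statuses treated as correct behaviour under overload (assumed).
    tolerance: allowed relative slowdown after recovery (assumed 10%).
    """
    graceful = all(status in acceptable for status in overload_statuses)
    recovered = recovery_rt <= baseline_rt * (1 + tolerance)
    return {"graceful_overload": graceful, "recovered": recovered}


good = check_stress_result(0.5, 0.52, [200, 503, 503])
bad = check_stress_result(0.5, 0.9, [200, None, 500])
```

Here `good` passes both checks, while `bad` fails both: some requests were dropped or ended in an internal error, and the response time stayed almost twice as high after the load went back to normal.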
Baseline testing. It is funny to write about this type of testing, but many people do, so to make the list complete I have to mention it too. By baseline testing people mean testing that is performed to establish standards for future tests.
This is a bit strange, because I would recommend establishing such standards based on your business requirements. Nevertheless, I can imagine one case where such testing is really applicable. If you already have a live web site and you know that it is working more or less acceptably (you can get a good sense of this by checking the cash in your pockets), you may perform baseline testing of that system to convert that perception into more exact parameters, such as response time. After that you will be able to compare the performance of any new version of your web site with the initial data.
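Such a comparison might look like the sketch below. I assume that the baseline and the new version are each described by an average response time per page, and that a slowdown of more than 10% counts as a regression; the page names and the threshold are purely illustrative.

```python
def find_regressions(baseline, candidate, max_slowdown=0.10):
    """Return pages whose response time grew by more than max_slowdown.

    baseline, candidate: dicts mapping page -> average response time in seconds.
    Pages missing from the candidate results are skipped.
    The result maps page -> relative slowdown (0.5 means 50% slower).
    """
    regressions = {}
    for page, base_rt in baseline.items():
        new_rt = candidate.get(page)
        if new_rt is None:
            continue
        change = (new_rt - base_rt) / base_rt
        if change > max_slowdown:
            regressions[page] = change
    return regressions


baseline = {"/": 0.5, "/search": 1.0}      # measured on the live site
candidate = {"/": 0.5, "/search": 1.5}     # measured on the new version
regressions = find_regressions(baseline, candidate)
# regressions is {"/search": 0.5}
```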
Endurance testing. This type of testing (also called “soak testing”) is used to check that the system can withstand the load for a long time or over a large number of transactions. It usually reveals various types of resource allocation problems. If there is a small memory leak in the system, it will not be evident in a quick test, but it will affect the performance after a long time.
For endurance testing it is recommended to use a periodically changing load to provoke resource reallocation. When the test is over, we should compare resource usage and performance parameters in the early stages and at the end of the test.
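Both ideas fit in a few lines of Python. The sketch below builds a simple periodic load (a square wave between two user counts) and measures how much a monitored value has drifted between the start and the end of the test. The function names, the square-wave shape, and the memory numbers are my assumptions.

```python
from statistics import mean


def periodic_load(low, high, period, steps):
    """Alternate between low and high user counts, switching every `period` steps."""
    return [high if (step // period) % 2 else low for step in range(steps)]


def relative_drift(samples, window):
    """Compare the average of the first and last `window` samples.

    Returns the relative change, so 0.4 means the value grew by 40%.
    """
    early = mean(samples[:window])
    late = mean(samples[-window:])
    return (late - early) / early


profile = periodic_load(low=50, high=100, period=2, steps=8)
# profile is [50, 50, 100, 100, 50, 50, 100, 100]

memory_mb = [100, 100, 110, 120, 130, 150]   # made-up readings taken during the test
drift = relative_drift(memory_mb, window=2)
# drift is 0.4: memory usage ended 40% above where it started
```

A drift that keeps growing with the test duration while the load pattern stays the same is a typical sign of a leak.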
Volume testing. If your application can upload files, upload the largest ones. If it has a search function, try to generate long result lists. Try to maximize the amount of processed data and the complexity of each transaction. That is volume testing.
Note that in terms of the number of virtual users the load may remain at a regular estimated level throughout the test. We should already know the expected performance parameters for such a load, so our goal is to check that they are not significantly affected by the above-mentioned changes in the test data.
Spike testing. This is a test in which the load is increased and decreased very rapidly, producing spikes. The goal is to check how the server responds to a very fast and significant load change. Depending on the web site implementation it may be reasonable to check this situation separately from other similar cases.
For example, imagine a scalable system that can allocate additional resources when the load is increased. While it can work perfectly with the high target volume, it may experience performance problems during resource allocation or just fail to do this correctly under such an extreme load change.
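A spike load profile for such a test could be generated as in the sketch below. The function name, its parameters, and the numbers are invented for this example, and a real tool will have its own way of describing the load.

```python
def spike_profile(base_users, spike_users, total_steps, spike_start, spike_length):
    """Return the number of virtual users for each time step.

    The load stays at base_users except for spike_length steps starting at
    spike_start, where it jumps straight to spike_users with no ramp-up.
    """
    spike_end = spike_start + spike_length
    return [
        spike_users if spike_start <= step < spike_end else base_users
        for step in range(total_steps)
    ]


profile = spike_profile(base_users=20, spike_users=500,
                        total_steps=8, spike_start=3, spike_length=2)
# profile is [20, 20, 20, 500, 500, 20, 20, 20]
```

The important difference from the gradual ramp-up used in the earlier tests is the missing intermediate levels: the system gets no time to allocate resources step by step.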
Configuration testing. The goal of this type of testing is to find out how a change in software or hardware configuration affects the performance parameters under load. If your system consists of several components, you can try replacing some of them and see whether the overall performance changes. This way you can eliminate bottlenecks in your system and find the optimal configuration.
Failover testing. This is a very interesting type of testing. It is performed under normal anticipated load for which we should already know the established performance parameters. After the initial “warm-up” phase we introduce an unexpected problem into the system. For example, it can be a connection problem between system components, which is easy to emulate by simply unplugging the network cable. We can also suddenly restart or disable redundant components without restarting the whole system.
In this test we monitor two things.
- How the performance parameters are affected by the introduced failure.
- What happens when the system comes back to normal conditions.
While it may be acceptable for the overall system performance to degrade temporarily for the time necessary to fix the failure, it is imperative that it fully recovers once the problem is eliminated. This is similar to stress testing, but in this case the stress is produced not by excessive load but by a temporary problem inside the tested system.
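To check this on real data, the collected samples can be split into three phases: before, during, and after the failure. The sketch below assumes that each sample is a (timestamp, response time) pair, that we know when the failure was introduced and fixed, and that a 10% tolerance defines "fully recovered". All names and numbers are illustrative.

```python
from statistics import mean


def failover_report(samples, failure_start, failure_end):
    """Average response time before, during, and after the introduced failure.

    samples: list of (timestamp_seconds, response_time_seconds).
    Each phase must contain at least one sample, otherwise mean() raises
    statistics.StatisticsError.
    """
    before = [rt for t, rt in samples if t < failure_start]
    during = [rt for t, rt in samples if failure_start <= t < failure_end]
    after = [rt for t, rt in samples if t >= failure_end]
    return {"before": mean(before), "during": mean(during), "after": mean(after)}


def fully_recovered(report, tolerance=0.10):
    """True if performance after the failure is within tolerance of the warm-up level."""
    return report["after"] <= report["before"] * (1 + tolerance)


# Made-up run: the failure lasts from second 20 to second 40.
samples = [(0, 0.5), (10, 0.5), (20, 2.0), (30, 3.0), (40, 0.5), (50, 0.5)]
report = failover_report(samples, failure_start=20, failure_end=40)
# report is {"before": 0.5, "during": 2.5, "after": 0.5}
```

In this made-up run the response time rises fivefold during the failure, which may be acceptable, and `fully_recovered(report)` returns `True`, which is the part that matters.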
Well, this is the end of my list. One more thing worth mentioning is that load testing in general is not completely separate and different from other testing practices. Some think that it only makes sense to test how an application behaves under load at the very end of the development process, just before it goes into production. This is not so. Of course, load tests should be applied only after functional testing; otherwise the results will be neither correct nor useful. However, you can integrate various types of load tests into your regular development process and use them as part of the regression testing performed on each new build or version of your web application.