
API Testing Mistake #2: Using Hard-Coded Test Data - Robert Schneider

In this episode of the 7 Common API Testing Mistakes series, Robert and I talk about the use of hard-coded data in API tests. Rob explains when it's OK to use hard-coded data and when it's better to use data generators. Watch the video to learn how to avoid ruining your tests with poor test data.




Robert Schneider

Robert D. Schneider is a Silicon Valley–based technology consultant and author, and managing partner at WiseClouds, a provider of training and professional services for NoSQL technologies and high-performance APIs. He has supplied distributed computing design/testing, database optimization, and other technical expertise to Global 500 corporations and government agencies around the world. Clients have included JP Morgan Chase & Co., VISA, Pearson Education, S.W.I.F.T., and the governments of the United States, Brazil, Malaysia, Mexico, Australia, and the United Kingdom.


Robert has written eight books and numerous articles on database platforms and other complex topics such as cloud computing, graph databases, business intelligence, security, and microservices. He is also a frequent organizer and presenter at major international industry events. Robert blogs at


Next time, we will analyze another common API mistake and share the solution with you. Stay tuned 😉 Subscribe to this blog to stay up to date with the latest software quality assurance news.


Watch the previous video in the 7 Common API Testing Mistakes series where we talk about focusing on the most typical API messages.




So, today, we're going to talk about another one of the really common mistakes we've seen over quite a number of years of doing this: using a small amount of hard-coded data in your API tests.

A very common scenario is that people write their API tests using five, maybe ten, hard-coded scenarios. Let's say it's a hotel reservation API, and you decide to test making a valid reservation, an invalid reservation, one where the card payment goes through, and one where the card payment doesn't go through, and that's about it. That's four. Now let's say your API takes 10 to 30 different input parameters, and a lot of times they do. Maybe you're a frequent traveler, and there's the room type you want, the bed type you want, and so on. If you think about the number of permutations and combinations of those, it gets very large very quickly. And yet what we see people doing all the time is using a very small amount of hard-coded data for maybe those four scenarios. And it's always the same data: John Smith, 123 Any Street, Anytown, USA, or whatever the equivalent is for your geography. And they run with that every time.
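To put a number on how quickly this grows, here is a minimal Python sketch. The field names and option values are hypothetical, invented for a made-up hotel reservation API. The count of distinct inputs is the product of each field's option count, so even five small fields give 3 × 4 × 2 × 2 × 2 = 96 combinations.

```python
from itertools import product
from math import prod

# Hypothetical options for a few inputs of a hotel reservation API.
# These names and values are illustrative assumptions, not a real API.
RESERVATION_OPTIONS = {
    "room_type": ["standard", "deluxe", "suite"],
    "bed_type": ["single", "double", "queen", "king"],
    "smoking": [True, False],
    "breakfast": [True, False],
    "payment_outcome": ["approved", "declined"],
}


def count_combinations(options: dict[str, list]) -> int:
    """Number of distinct input combinations: the product of each field's option count."""
    return prod(len(values) for values in options.values())


def all_combinations(options: dict[str, list]):
    """Yield every combination as a dict mapping field name to value."""
    names = list(options)
    for values in product(*options.values()):
        yield dict(zip(names, values))


if __name__ == "__main__":
    print(count_combinations(RESERVATION_OPTIONS))  # 3 * 4 * 2 * 2 * 2 = 96
```

Four hand-written scenarios cover at most 4 of those 96, and a real API with 10 to 30 parameters, many of them free-text or numeric, has a far larger input space than this.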


So when you do that, first of all, you're risking a lot. You're not really testing your back-end business logic: you're not exercising it, and you're not covering any corner or boundary conditions.
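Boundary conditions are one thing a fixed "John Smith" record never reaches. As an illustration only, here is a minimal Python sketch of classic boundary-value selection for an inclusive integer range. The rule that a reservation may be for 1 to 30 nights is a hypothetical assumption, not something from the interview.

```python
def boundary_cases(minimum: int, maximum: int) -> list[tuple[int, bool]]:
    """Pick values just below, at, and just above each edge of the inclusive
    range [minimum, maximum], paired with whether the API should accept them."""
    candidates = [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]
    # dict.fromkeys drops duplicates (possible for narrow ranges) but keeps order.
    unique = list(dict.fromkeys(candidates))
    return [(value, minimum <= value <= maximum) for value in unique]


if __name__ == "__main__":
    # Hypothetical rule: a reservation may be for 1 to 30 nights.
    for nights, should_accept in boundary_cases(1, 30):
        print(nights, "accept" if should_accept else "reject")
```

Each pair gives both the input and the expected outcome, so a test can assert that 0 and 31 nights are rejected while 1 and 30 are accepted.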


And from a performance perspective, if you keep calling the API with the same hard-coded values, there's probably a database of some sort behind the API, and the records you're pulling are going to be cached in memory. That means every time you call the API, you get back blindingly fast results, and the problem is that that's not what you're going to see in production. In production, you'll be getting varied data from your users and varied data coming back from the back end, so you'll see a much more realistic set of performance numbers. These are some of the reasons why using hard-coded data, and small sets of it, is so dangerous.
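The caching effect can be illustrated with an in-process analogy. This is a rough sketch, not a model of any real database: Python's `functools.lru_cache` stands in for the database's memory cache, and `lookup_guest` is a hypothetical stand-in for a record read. Calling it 10,000 times with one hard-coded ID misses only once, while randomized IDs almost never hit.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=1_000)
def lookup_guest(guest_id: int) -> str:
    """Hypothetical stand-in for a database read behind the API."""
    return f"guest-{guest_id}"


def hit_ratio(guest_ids) -> float:
    """Fraction of lookups served from the cache, starting from a cold cache."""
    lookup_guest.cache_clear()
    for guest_id in guest_ids:
        lookup_guest(guest_id)
    info = lookup_guest.cache_info()
    return info.hits / (info.hits + info.misses)


if __name__ == "__main__":
    hard_coded = [42] * 10_000  # the same record every call
    rng = random.Random(0)
    randomized = [rng.randrange(1_000_000) for _ in range(10_000)]
    print(hit_ratio(hard_coded))  # only the first call misses
    print(hit_ratio(randomized))  # close to zero
```

Real database caches (buffer pools, query caches) are more complex, but the direction of the effect is the same: repeating identical inputs measures the cache, not the system.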



For me, this again comes back to the importance of preparing test data more carefully, for example by using data generators.



That's right, data generators are great. ReadyAPI and SoapUI have fantastic data generators, although you may want to go beyond them, because there are some limits to what they can do. The one I always talk about in classes is this: these data generators, from SmartBear for example, are superb, but what you don't have, and really couldn't have, is cross-field validation logic. To use the example Tanya and I were discussing before the call started: if you're generating a city and a country, you could end up with Saint Petersburg, England. You don't get that cross-field logic from the data generator. But data generators such as the ones in ReadyAPI are great for generating random email addresses, IP addresses, first names, last names, and street names, as long as you don't need cross-field validation.
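Here is a minimal Python sketch of the Saint Petersburg, England problem. It does not use or model ReadyAPI's generators; the reference list and function names are hypothetical. Drawing each field on its own produces invalid city/country pairs, while drawing the pair as a unit from a consistent list keeps the fields in agreement.

```python
import random

# Hypothetical reference data: each city is stored with its country,
# so sampling a whole pair always yields a consistent combination.
CITY_COUNTRY_PAIRS = [
    ("Saint Petersburg", "Russia"),
    ("Saint Petersburg", "United States"),  # Saint Petersburg, Florida
    ("London", "United Kingdom"),
    ("Manchester", "United Kingdom"),
    ("Kuala Lumpur", "Malaysia"),
    ("Brasília", "Brazil"),
]


def independent_city_and_country(rng: random.Random) -> tuple[str, str]:
    """Per-field generation: city and country are drawn separately."""
    cities = sorted({city for city, _ in CITY_COUNTRY_PAIRS})
    countries = sorted({country for _, country in CITY_COUNTRY_PAIRS})
    return rng.choice(cities), rng.choice(countries)


def consistent_city_and_country(rng: random.Random) -> tuple[str, str]:
    """Cross-field aware generation: draw the pair as a single unit."""
    return rng.choice(CITY_COUNTRY_PAIRS)


def is_consistent(city: str, country: str) -> bool:
    return (city, country) in CITY_COUNTRY_PAIRS


if __name__ == "__main__":
    rng = random.Random(1)
    print(independent_city_and_country(rng))  # may be an impossible pair
    print(consistent_city_and_country(rng))   # always a real pair
```

Only 6 of the 25 independent combinations here are real places, which is why fields that must agree with each other are better drawn together from your own reference data.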


If you do need cross-field validation, you can pair the data generator with your own logically consistent data that's meaningful to your test cases. Say you've got 30 fields of data going to your API: maybe only 10 of them need to be meaningful, but the other 20 still have to be filled in with something. In that case, use a data generator for those plus data that's meaningful to you. Then send lots and lots of records to your API. You still have the same four test cases, but with dynamic data you could run 20,000 iterations of the messages you send to your API. You'll exercise your API much more vigorously, and you'll find errors that you would never see without sending that volume of data.
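The approach Robert describes, meaningful fields you control plus generated filler, can be sketched in a few lines of Python. This is an illustration under assumptions, not a ReadyAPI feature: the four scenarios, their field names, and the filler fields are all hypothetical, and actually sending each message to the API is left out. Four scenarios with 5,000 iterations each give 20,000 distinct messages.

```python
import random
import string

# Hypothetical meaningful data: the fields that drive the business logic.
MEANINGFUL_SCENARIOS = [
    {"scenario": "valid_reservation", "nights": 2, "card_status": "approved"},
    {"scenario": "invalid_reservation", "nights": 0, "card_status": "approved"},
    {"scenario": "payment_ok", "nights": 5, "card_status": "approved"},
    {"scenario": "payment_declined", "nights": 5, "card_status": "declined"},
]


def random_word(rng: random.Random, length: int = 8) -> str:
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def filler_fields(rng: random.Random) -> dict:
    """Fields that only need some plausible value (no cross-field rules)."""
    first, last = random_word(rng).title(), random_word(rng).title()
    return {
        "first_name": first,
        "last_name": last,
        "email": f"{first.lower()}.{last.lower()}@example.com",
        "street": f"{rng.randint(1, 9999)} {random_word(rng).title()} Street",
    }


def generate_messages(iterations_per_scenario: int, seed: int = 0):
    """Yield one message per iteration: fresh filler plus the scenario's fixed fields."""
    rng = random.Random(seed)
    for scenario in MEANINGFUL_SCENARIOS:
        for _ in range(iterations_per_scenario):
            # Meaningful fields come last so they win if a name ever collides.
            yield {**filler_fields(rng), **scenario}


if __name__ == "__main__":
    messages = list(generate_messages(5_000))
    print(len(messages))  # 4 scenarios x 5,000 iterations = 20,000 messages
```

Seeding the generator keeps a failing run reproducible, and `example.com` is a domain reserved for examples, so generated addresses won't reach real inboxes.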



But what if, for example, I'm just starting my career and I've already created a lot of tests with hard-coded data? What is the best way to move from those tests to automatically generated data?



Well, never throw away any good work. If you've put real effort into building your test cases, you're happy with them, and they seem to be working okay, don't throw them out. We never advise that. Instead, think about augmenting them, or adding new, more dynamic test cases that use data generators and other approaches for feeding information to your API, and then run the old and new tests alongside each other. Gradually add more and more data to the new tests, and eventually, if you want, you can decommission your old API tests and move forward completely with the new data-generated test cases.



Thanks a lot. Community, stay tuned! Next time, we will continue reviewing other important API testing mistakes.



Thanks everybody, see you next time.


