[Day 1/12] ReadyAPI Data Driven Tests - Treasure for API Testing





Functional API testing, in a nutshell, means calling the methods of your API with some input parameters in a specific order and comparing the actual output with the output you expect for the given parameters and call sequence.


To ensure API quality, you typically want to evaluate it with many combinations of parameters – both valid and invalid – which can hardly be done manually. This is where the data-driven testing (DDT) strategy naturally comes into play. This strategy separates the data (sets of parameters and predicted responses) from the test logic, and it is a perfect fit for automation.


An automated data-driven test always includes these two components:

  • A loop through a data source — data can be prepared beforehand and stored in Excel spreadsheets, DB tables, CSV files, etc., or generated on-the-fly.
  • A parameterized functional test within the loop, where data from the source is fed to requests and automated assertions.
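Outside any tool, the two components can be sketched in plain Groovy. This is a minimal, self-contained sketch: the CSV content and the `callApi` stub are hypothetical stand-ins for a real data source and request step.

```groovy
// Hypothetical API stub so the sketch is self-contained; in a real test this
// would be an HTTP request step.
def callApi = { String input -> input.toUpperCase() }

// Component 1: a loop through a data source (here, inline CSV rows of
// input and expected output).
def csv = 'hello,HELLO\nworld,WORLD'
csv.eachLine { line ->
    def (input, expected) = line.split(',') as List
    // Component 2: a parameterized call inside the loop, plus an automated
    // assertion against the expected value from the same row.
    def actual = callApi(input)
    assert actual == expected
}
```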


The first question you will probably ask is 'how do I prepare data for my data-driven test?'. There are several approaches:

  • Prepare data manually. The resulting set can contain varied data covering typical and edge usage scenarios. But this approach requires a lot of time and effort, as well as a good understanding of the API's business logic.
  • Generate synthetic data based on patterns. This can be done relatively easily with specialized tools (and you can generate a data set as large as you need), though the integrity of the data is not guaranteed.
  • Get production data. Such data is truly realistic, diversified, and consistent, but in some cases it cannot be provided to the QA team due to privacy concerns; if it can, someone needs to take care of obfuscating sensitive data before using it as test data.


Whatever approach or combination of approaches you choose, here are the types of data sets your DDT should typically include to ensure good coverage:

  • Valid values.
  • Invalid values:
    • Values of invalid types (e.g., specify a string in a field, where an integer is expected).
    • Empty values for required fields.
    • Values exceeding the limits, including extremely large values.
    • Invalid combinations of values (e.g., an “end date” is before a “start date”).
  • Boundary values.
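As an illustration, the categories above could be laid out as rows of a Groovy data table. The "booking" fields, values, and status codes here are all hypothetical; the point is that every category gets at least one row.

```groovy
// Hypothetical input rows for a booking API that expects an integer "seats"
// and a start/end date; "expect" is the anticipated HTTP status.
def rows = [
    [seats: '2',            start: '2024-01-01', end: '2024-01-05', expect: 200], // valid value
    [seats: 'two',          start: '2024-01-01', end: '2024-01-05', expect: 400], // invalid type
    [seats: '',             start: '2024-01-01', end: '2024-01-05', expect: 400], // empty required field
    [seats: '999999999999', start: '2024-01-01', end: '2024-01-05', expect: 400], // exceeds limits
    [seats: '2',            start: '2024-01-05', end: '2024-01-01', expect: 400], // end date before start date
    [seats: '1',            start: '2024-01-01', end: '2024-01-01', expect: 200], // boundary value
]
rows.each { row -> println row }
```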

In conclusion, the decision whether or not to use data-driven testing depends on your specific situation and business requirements. Generally, though, an automated DDT suite with thoroughly and competently prepared test data lets you quickly test your API against a huge number of input combinations, which helps ensure that your API is ready for our real "data-driven" world.




It's your turn to participate

We'd love to hear about your experience with data-driven API testing:

  • Do you use data-driven API testing in your organization? If you haven't used it yet, have you evaluated this approach?
  • How do you generate data for your tests? Which type of data do you use?
  • What issues did you find in your tested API thanks to data-driven testing?


As a reminder, each day of the event we will choose a winner for the best answer. You can find all the details here.

Community Hero

I use data-driven testing to cover a variety of scenarios, from what data is or is not there, to ensuring that my data is random and fresh every time to satisfy uniqueness-based business rules. Depending on the size of the service I am testing (and because I often have to test a corresponding read service), if it is a smaller service I use a custom Groovy script I built that gives great flexibility over the data I need. It covers every data type I need: strings with patterns, strings without patterns, dates, date-times, etc. I store this script and a Properties step in a test case, and reference them there with Property Expansion.
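A minimal sketch of such a "fresh data" Groovy step in ReadyAPI might look like this. The Properties step name and property names are hypothetical; `testRunner` is the object ReadyAPI provides to Groovy script steps.

```groovy
// Sketch of a ReadyAPI Groovy script step that refreshes a Properties step
// with random, unique data before each run.
def props = testRunner.testCase.getTestStepByName('Properties')
props.setPropertyValue('userName', 'user_' + UUID.randomUUID().toString().take(8))
props.setPropertyValue('createdOn', new Date().format('yyyy-MM-dd HH:mm:ss'))
props.setPropertyValue('amount', String.valueOf(new Random().nextInt(10000)))
```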


I also wrote a Groovy script that generates what I call NULL test steps for me. This is for testing the various parts of my requests that are allowed to be either empty strings or optional elements. The script also checks for schema validity and discards any schema-invalid tests, since those would never reach our tests anyway due to a schema validator we have in place. Depending on how many tests are built from this, it can greatly increase the number of tests, which means the data-generating Groovy script above can grow the Properties step quite large. If that happens, I tailor the NULL tests to use a data source and can easily modify the script above for use in a Groovy data source.
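The combinatorial idea behind such NULL variants can be sketched in a few lines of Groovy. The field names here are hypothetical; each bit of the mask decides whether one optional field is populated or empty.

```groovy
// Sketch: generate every present/empty combination of optional fields,
// the kind of variants a "NULL test step" generator produces.
def optionalFields = ['middleName', 'phone', 'notes']
def variants = (0..<(1 << optionalFields.size())).collect { mask ->
    def variant = [:]
    optionalFields.eachWithIndex { field, i ->
        variant[field] = (mask & (1 << i)) ? 'some value' : ''
    }
    variant
}
assert variants.size() == 8   // 2^3 request variants to feed the test
```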


If you haven't guessed by now, I Groovy a lot. I have another script that can change my NULL tests to the appropriate property expansion as needed when I swap to the Groovy data source.

The biggest issue I've run into here is that as the services I test grow larger and larger, the data I need to generate grows quickly. One of my current services has 100,000+ properties being generated to test all aspects of the service, and that's not including the NULL tests! I often find myself pushing ReadyAPI / SoapUI NG Pro to its limits with memory. I optimize everything where I can, and have even learned a few tips and tricks for reclaiming memory. (Mind you, I'm still stuck on version 1.9, so some of this may be out of date.) I include a Groovy script to trigger my own garbage collection and run it as often as necessary.
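A sketch of that kind of "reclaim memory" helper as a ReadyAPI Groovy step; `log` is the logger ReadyAPI injects into script steps, and `System.gc()` is only a hint to the JVM, not a guarantee.

```groovy
// Log free heap, hint the JVM to collect, and log again to see the effect.
log.info "Before GC: ${Runtime.runtime.freeMemory() >> 20} MB free"
System.gc()
log.info "After GC:  ${Runtime.runtime.freeMemory() >> 20} MB free"
```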

Even with that, I am often pushing 30GB of memory usage due to how much data I may have to process at any given time. 

Super Contributor

How can I compete with you, @msiadak?

Well, I had done data-driven tests even before using ReadyAPI.

We would use Python to parse out the CSV or Excel files.

Absolutely impressive, but what's more impressive is that ReadyAPI makes this easy out of the box.

This is the reason I switched from SoapUI to ReadyAPI.

Key points to remember for data-driven tests:

A data-driven test covers one flow or one scenario, which makes it easy to carry out scenario-based tests, where each scenario covers one unique flow.

It can cover lots of combinations of data values – hence, data-driven tests!


We just recently finished a project where different scenarios needed to be covered, e.g. rules for flight filters.

With the free version of SoapUI you are looking at hundreds of tests; in ReadyAPI the same coverage can easily be accomplished in three steps.

A rule of thumb: always use CSVs where possible for your data source. They are much smaller in file size than Excel files, and if a diff is required under version control, it's easier with CSVs than with Excel files.

However, if a test involves multiple scenarios, I use Excel, where each worksheet corresponds to a test scenario. That way I have all the scenarios for a test in one place!

One thing I forgot to mention is using multiple data sources and data loops to carry out testing over multiple data sets.

It's like having two while or for loops going over various combinations of data!

I tried it and it worked!
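That nested-loop analogy can be made concrete in a few lines of Groovy. The flight-filter values are hypothetical; in ReadyAPI the two lists would be two data sources, each wrapped in its own data loop.

```groovy
// Sketch: two data sources driven by nested data loops behave like nested
// for loops over every combination of values.
def origins      = ['LHR', 'JFK']
def cabinClasses = ['economy', 'business', 'first']
origins.each { origin ->
    cabinClasses.each { cabin ->
        println "testing filter: $origin / $cabin"   // one iteration per combination
    }
}
```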


I always wanted to point out these facts about DDT.

Occasional Contributor

I am also using DDT as our approach with ReadyAPI (against our test DB, which is "refreshed" on an as-needed basis). This approach helps us find issues with the data in the response and, of course, allows us to update the tests quickly when new parameters / flags are added for a specific request.


The use of Groovy scripting, along with the built-in assertions, helps us ensure solid pass / fail scenarios.

Occasional Contributor

Our data is very specific in how it's created and what our systems will accept, so we not only use SQL connections to inject data through queries, but we also use multiple methods in one test case to further inject values we wouldn't have until other methods were run. This allows us to automate a lot of our API regression suites.


One of the major issues we found through data testing is white space. At the beginning, a lot of our API testing was failing because our data would have white space either before or after the value. Our data comes from many sources, and our ETL processes kept the data exactly as retrieved, so no trimming was applied in our script logic. Since our method fields do not trim white space from their values, we were constantly breaking when trying to implement data-driven testing. After this was figured out, we incorporated trimming into our database values to remove this issue and to save storage space in our database.
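The fix they describe can also be applied on the test side. A minimal sketch, with hypothetical row data, of normalizing data-source values before they reach a request or assertion:

```groovy
// Sketch: trim every value coming from the data source before it is injected
// or compared, so stray ETL whitespace doesn't fail the test.
def row = [name: '  ACME Corp \t', city: ' Boston ']
def cleaned = row.collectEntries { k, v -> [k, v.trim()] }
assert cleaned == [name: 'ACME Corp', city: 'Boston']
```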

Frequent Contributor

Apart from actually testing the API, I find the most value in data-driven test suites when I generate data for testing. I use the API to load data for countries, cities, ZIP codes, etc. to build a standard performance-testing setup. It also gives me a quick and easy way to dump performance stats to a delimited text file, which I can easily feed into graphs to analyze the data. DDT with plain text files and Groovy makes a great set of tools that work very well together without much hassle.

Community Hero

I would like to share my views on Data-Driven Tests.

Applications of Data-Driven testing

As pointed out by the original poster of this thread, there are different possible ways to adopt the data-driven testing approach. Apart from those, here are some more notable use cases.

Use case #1: Login Validation Tests

  • Different users with credentials.
  • To test the validation rules for login id / username.
  • To test the validation rules for password.


Use case #2: Role to Function Tests
There are many applications – such as banking, insurance, and even our Community forum – with different privilege levels: admin users, non-admin users, moderators, editors, customers, etc.


It is required to validate whether the application behaves correctly for each type of user.

This testing is applicable not only to web applications, but also to API tests.

Use Case #3: Security Tests

This is another very good use case for data-driven tests. If you take a look at the security test data (it can be viewed in the tool itself), you will easily see why: basically, the test tries different data until it either breaks into the system or exposes certain details about the application, even though in some cases one cannot break in.

How does it work?
As most of you know, the data-driven testing approach works very well when you want to cover a large number of data sets for particular API or UI tests.

It is easy to create a single test case that covers many cases with varied data.

You may add as many data records with different positive and negative patterns as you need in order to validate the customer requirements.

How is the data source generated?
Though it is possible to automatically create data for data-driven tests with certain patterns for different types (security tests are a good example of this), data is mostly created manually, I believe.


There can be different reasons for that, such as application-specific data, custom data types, specific enumerations for certain fields, etc.


Often, if organized well, data can be created easily. But it may quickly turn into a large number of records, and people may end up with duplicate tests while losing or missing some of the actually critical tests.

What type of data source?
This is probably subjective, based on usage, simplicity, and user understanding.
As we know, many folks in the community use Excel as the data source because of how simple it is to add data; some make it complex by adding multiple sheets.

Personally, I prefer the CSV format as the data source for data-driven tests, whether for APIs or web apps.

That said, ReadyAPI supports a very rich set of data-source formats, and I know many of you are exploring other types. This reduces the burden on users who receive source data from their customers / clients, since it avoids a transformation step before the data-driven tests.


Things to avoid.

Here I would like to mention that, based on community interactions, some users want to show each record as a test case in their report.

I strongly feel this reflects a misconception: each record in the data should not turn into a separate test case in the report. Showing bigger numbers is no substitute for focusing on more coverage.

What I want to emphasize is that only a single test is designed in the project to execute multiple records of data / patterns, so it is completely fine to show it as a single test case in the test report.

It must also be noted that when the test fails for a specific pattern, the user needs to run the test over the entire data set again after the fix is applied, to make sure all the data still passes.


Multiple Datasources
There is one more thing I must mention in this regard: people add multiple data sources to a single test case and never get it working.


Maybe they haven't handled such scenarios / use cases before, or couldn't spend the time to understand them.

But I feel it is too complicated to design such cases unless there is a good reason for it.


Saving Responses to Datasource
Another one: some people want to store results back into their data source, or as part of a DataSink. This is OK as long as it makes sense, but sometimes people put the whole response payload back into the data source, and I believe that makes the document huge. I often suggest that users store the response payload in a separate file and put a reference to that file in the data file rather than the payload itself, which improves the readability of the data file.

Some General Problems & Remedies

I am not sure if I can tag the following as issues.

#1 : Different assertion value for each record
OK, we have designed the patterns for positive and negative tests to be used in DDT. Now, how do you verify whether the test passed?

Let us take an example data file with the following records, where col1 is the input data and col2 represents the expected value in the response.

col1, col2
record1, 200
record2, 302
record3, 404

The question now is: how does the user apply different assertions for the above?
Obviously, you can't add 3 fixed assertions with the expected codes above. That was just a sample; there can be any number of cases.

A couple of approaches here:
1. If it is content-related, you can use a Contains assertion, but with a property expansion instead of a fixed value. However, this is difficult if the expected value is in the response headers.

2. Use a Script Assertion. Of course, this requires scripting skills, which some of the community already have and which will improve over time.

Another example, in SOAP testing with the above data, is expecting a SOAP Fault for one record and Not a SOAP Fault for another.
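A sketch of approach 2 for the sample data above. The data source name `DataSource` and column `col2` match the sample but are otherwise hypothetical; `context` and `messageExchange` are the objects ReadyAPI provides to Script Assertions.

```groovy
// Sketch of a ReadyAPI Script Assertion: read the expected status code from
// the current data source row and compare it with the actual response.
def expected = context.expand('${DataSource#col2}') as Integer
def actual   = messageExchange.responseStatusCode
assert actual == expected : "Expected HTTP $expected but got $actual"
```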

#2 : Comparing payloads
Here is another case: what if the user would like to compare the entire response?
Often the user keeps the expected response in the data file, as already mentioned above, but that is not the main point here.
The point is that some payloads contain lists of elements (sometimes complex ones) with a different sort order.


This can be either SOAP, REST, or JDBC step.

You need to use a Script Assertion here; I have a couple of examples if you are interested in taking a look.
Of course, this is not really specific to DDT; it applies in general as well.
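One way such a Script Assertion can work is to normalize both payloads by sorting every list before comparing. A minimal, self-contained sketch with hypothetical JSON payloads:

```groovy
import groovy.json.JsonSlurper

// Sketch: compare two JSON payloads whose list elements may arrive in a
// different order, by recursively sorting lists before comparing.
def normalize
normalize = { node ->
    if (node instanceof Map)  return node.collectEntries { k, v -> [k, normalize(v)] }
    if (node instanceof List) return node.collect { normalize(it) }.sort { it.toString() }
    return node
}
def expected = new JsonSlurper().parseText('{"items":[{"id":1},{"id":2}]}')
def actual   = new JsonSlurper().parseText('{"items":[{"id":2},{"id":1}]}')
assert normalize(actual) == normalize(expected)
```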

#3 : Comparing numbers by rounding
Sometimes the expected data is a floating-point number with 2 decimals, whereas the actual data has more decimals, so you need to round the numbers to 2 decimals to make the comparison possible.

A Script Assertion comes to the rescue here.
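A minimal sketch of that rounding comparison; the numbers are hypothetical:

```groovy
// Sketch: round the actual value to two decimals before comparing it with
// the two-decimal expected value from the data file.
def expected = new BigDecimal('12.35')
def actual   = new BigDecimal('12.3456')
assert actual.setScale(2, java.math.RoundingMode.HALF_UP) == expected
```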


I think we could go on with this subject at length, but I will conclude here. Please feel free to express your views.


Thank you for your time reading this. I hope it is useful.

Community Hero

How do I deal with unique data values in data-driven tests?

I thought it was important to talk about this case, so I am adding it.

Usually, a data source has many columns with different data types.

It is quite common that some fields / elements in the request payload need to have unique values, such as an id, a date-time, or a UUID.

In general, the data source is edited (perhaps manually) before running the test so that those specific columns have fresh data for the new execution.

I suggest not keeping such fields / columns in the data-source file at all. Instead, create the required data automatically, probably with the help of Groovy's power, and use that data in the actual tests.

This way, you do not have to modify the test data before each run, and you avoid unwanted SCM churn and check-ins.
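A minimal ReadyAPI-style sketch of this idea; the Properties step name and property names are hypothetical, and `testRunner` is the object ReadyAPI provides to Groovy script steps.

```groovy
// Sketch: generate unique per-run values in a Groovy step instead of
// storing them in the data-source file.
def props = testRunner.testCase.getTestStepByName('Properties')
props.setPropertyValue('requestId', UUID.randomUUID().toString())
props.setPropertyValue('timestamp', new Date().format("yyyy-MM-dd'T'HH:mm:ss"))
```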

New Contributor

I use DDT for all my tests, managing data in text and Excel files. We run the projects through testrunner.bat, which is fast and easy, but the trouble is with the error logging.


I use a Groovy script in the TearDown section to capture all the messages; this comes in handy when dealing with a huge data file and makes it easy to identify errors / issues.
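A sketch of that kind of TearDown script; the log file name is hypothetical, and `testRunner`, `context`, and the `${projectDir}` expansion are provided by ReadyAPI.

```groovy
// Sketch of a TearDown script that collects failed-step messages into one
// file for later triage.
def errorLog = new File(context.expand('${projectDir}') + '/ddt-errors.log')
testRunner.results.each { result ->
    if (result.status.toString() == 'FAILED') {
        errorLog << "${result.testStep.name}: ${result.messages.join('; ')}\n"
    }
}
```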


Hope this link will help you as well: https://community.smartbear.com/t5/SoapUI-Pro/logs-are-not-getting-generated-correctly-when-use-tear...


However there are limitations with this as well and the SmartBear Team is actively providing support in that area.

Community Manager

Hi everyone!


Thanks for your valuable feedback. The Product Team was very interested in reading it.

And I'm happy to announce the winner of Day 1. The gift goes to .... @nmrao! My congrats! You gave Rao's amazing comment 6 Kudos. Join me in congratulating him!

Rao, I'll contact you soon.


And the article for Day 2 has already been published. Today, we suggest talking about Testing Frameworks:



Occasional Contributor
  • Do you use data-driven API testing in your organization? If you haven't used it yet, have you evaluated this approach?

Not yet, but we are planning to. I would like to keep the Excel sheet inside the project, so that whenever we give the project to anyone, we do not need to pass the Excel sheet separately or keep it at a shared location.

For the above reason, people tend to avoid data-driven testing via Excel, since you have to pass the sheet to all testers.


  • How do you generate data for your tests? Which type of data do you use?

As of now, we plan to enter data manually. It is sample data, not production-like data; gradually, everyone should move to production-like data. Sometimes your project is going live for the first time and you don't yet have access to such data.


  • What issues did you find in your tested API thanks to data-driven testing?

I saw earlier, in one API testing effort, that for some of the inputs, instead of throwing an exception, the system was returning code to the user, which was not intended. This kind of thing can only be found if you test with extensive data.


I test almost exclusively using data sources in my tests. I develop each and every row one by one. Some test cases have had as many as 150 properties, covering both inputs and expected output results.


In some cases I acquire the data from an internal source, and the origin and ownership of the data is specifically stated in the test plan: whether it is considered gold data and not to be modified in any way, or whether there are properties that must be reformatted in order to be submitted through the API, such as converting numeric data to match the required precision, or date formatting. In these instances, Groovy scripting is an absolute necessity.


It is vital that I understand the required inputs and the values and ranges associated with them. I tend to develop a binary state table, which is nothing more than a decision tree containing all of the inputs. This gives me an exhaustive set of testing combinations, from which I can eliminate any unnecessary data sets, based either on requirements or on conversations traceable to a developer or project owner.


The same practice is used in all forms of testing where state or data transitions occur. Here is a great reference that all QA engineers should have on their desk, and a great resource for identifying your test data.


How to Break Software: A Practical Guide to Testing
Book by James A. Whittaker



