Test automation remains a critical pain point for software development teams, because test scripts are costly to maintain and gradually lose their business relevance.
This article presents an approach, and a tool-based process, that uses operational logs to measure automated regression test coverage against actual usage and to complete that coverage. The goal is to address two significant challenges of regression test automation:
● Ensuring that automated regression test suites remain relevant to actual product usage;
● Reducing the effort required to produce these test suites.
Test relevance, i.e., the ability of tests to mitigate risks in production, is a crucial value of QA activities. This is even more true for automated regression tests, whose creation and maintenance require significant effort and investment over time.
The process, as implemented in the Gravity tool, runs in three steps (see Figure 1): (1) extraction, clustering, and visualization of usage traces from production logs; (2) comparison of usage traces with test traces to identify the tests required to complete coverage; and (3) generation of automated test scripts.
Step 1 — Extraction and clustering of traces from logs
A few definitions will help clarify the subject:
● A log is a set of execution events, stored in a file or a database (such as Elasticsearch) or managed with a monitoring tool (such as Splunk, Dynatrace, Datadog, or Application Insights).
● A log event is the basic item in a log: a recorded event with a timestamp and a set of attributes. In our Spree API example, Figure 2a shows an extract of these log events, which correspond to API calls in REST format.
● A trace is a sequence of log events corresponding to a usage session. In our example, a trace is a sequence of Spree API calls resulting from a logged execution, such as a customer journey on a Spree-based online store.
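To make these definitions concrete, here is a minimal Python sketch of a log event. It assumes a hypothetical JSON log line with `timestamp`, `session_id`, `method`, and `path` fields; the real format depends on the monitoring tool and on how the Spree API is instrumented, and `LogEvent` and `parse_event` are illustrative names, not Gravity's API.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    """One logged REST call (hypothetical fields, for illustration only)."""
    timestamp: datetime
    session_id: str
    method: str
    path: str

    @property
    def action(self) -> str:
        # Key used to compare events across traces: HTTP verb plus path.
        return f"{self.method} {self.path}"


def parse_event(line: str) -> LogEvent:
    """Parse one JSON log line, in an assumed format, into a LogEvent."""
    raw = json.loads(line)
    return LogEvent(
        timestamp=datetime.fromisoformat(raw["timestamp"]),
        session_id=raw["session_id"],
        method=raw["method"].upper(),
        path=raw["path"],
    )


event = parse_event(
    '{"timestamp": "2023-05-04T10:15:00", "session_id": "s1", '
    '"method": "get", "path": "/api/v2/storefront/products"}'
)
print(event.action)  # GET /api/v2/storefront/products
```

A trace is then simply the time-ordered list of events that belong to the same session.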
In this first step, once access to the monitoring tool has been configured, three processes are performed:
(1) Acquisition of log events over a defined time range, optionally filtered by log event type;
(2) Interpretation of log events into traces (Figure 2b). In Figure 2b, each colored item represents one line of the log file; the log lines are sorted and grouped to re-create the traces (i.e., the sequence of events of each user session);
(3) Hierarchical clustering of traces to group them by business proximity, and visualization of the result as workflows (Figure 3).
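The acquisition filter in process (1) can be sketched as follows. In practice this filtering would usually be pushed down to the monitoring tool's own query (a time range in Elasticsearch or Splunk, for example); this in-memory version only illustrates the logic, and the `event_type` field and its values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    session_id: str
    event_type: str  # hypothetical field, e.g. "api_call" or "heartbeat"
    action: str      # e.g. "GET /api/v2/storefront/products"


def acquire(
    events: Iterable[LogEvent],
    start: datetime,
    end: datetime,
    event_types: Optional[Set[str]] = None,
) -> List[LogEvent]:
    """Keep events in the half-open range [start, end), optionally restricted to some types."""
    return [
        e
        for e in events
        if start <= e.timestamp < end
        and (event_types is None or e.event_type in event_types)
    ]
```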
Traces are computed from log events by recognizing how the events chain together. In most cases, this is done automatically using session identifiers; in more complex cases, the chaining of log events is configured manually.
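When each event carries a session identifier, trace re-creation reduces to grouping events by session and sorting each group by timestamp. A minimal sketch, assuming every event has a `session_id` and that an event is identified by its `action` (HTTP verb plus path); the manual chaining configuration needed for more complex cases is not covered.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    session_id: str
    action: str


def build_traces(events: Iterable[LogEvent]) -> Dict[str, List[str]]:
    """Group events by session identifier and order each group by timestamp."""
    by_session: Dict[str, List[LogEvent]] = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)
    # A trace is the time-ordered sequence of actions within one session.
    return {
        sid: [e.action for e in sorted(evs, key=lambda e: e.timestamp)]
        for sid, evs in by_session.items()
    }
```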
Once the traces are computed, hierarchical clustering organizes them by degree of proximity: traces with similar event sequences end up close to each other in the cluster hierarchy.
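The distance and linkage used by Gravity are not specified here. As one plausible choice, the sketch below measures proximity with the edit (Levenshtein) distance between event sequences and applies average-linkage agglomerative clustering, stopping when the two closest clusters are farther apart than a chosen threshold. The threshold and both function names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two traces, treating each event as one symbol."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_traces(traces: List[Sequence[str]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering of traces; returns lists of trace indices.

    Starts with one cluster per trace and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(traces))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise edit distance between members of the two clusters.
        total = sum(edit_distance(traces[i], traces[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), dist = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

This naive version recomputes distances on every merge and only suits small samples; for real log volumes, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` applied to a precomputed condensed distance matrix would be more practical.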
Finally, the workflow visualization is built automatically using a Model-Based Testing abstraction algorithm.
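Gravity's Model-Based Testing abstraction algorithm is not described in this article. A much simpler stand-in, shown for illustration only, builds a directed graph whose nodes are actions and whose edges count how often one action directly follows another across traces, with synthetic start and end nodes.

```python
from collections import Counter
from typing import Iterable, Sequence

START, END = "<start>", "<end>"


def build_workflow(traces: Iterable[Sequence[str]]) -> Counter:
    """Directed graph of observed transitions, as a Counter keyed by (source, target) edges.

    Each value is the number of times that transition was observed across all traces.
    """
    edges: Counter = Counter()
    for trace in traces:
        path = [START, *trace, END]
        edges.update(zip(path, path[1:]))
    return edges
```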
Step 2 — Analysis and visualization of usage coverage by tests
The trace re-creation described above for production logs can also be applied to test execution logs. The resulting test traces can then be compared with usage traces to measure how much of the actual usage the tests cover.
Figure 4 compares usage traces recorded in operation with test traces obtained during test execution. Purple paths mark usage traces covered by at least one test trace; pink paths mark usage traces not covered by any test trace.
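The exact coverage criterion is not spelled out above. Assuming a usage trace counts as covered when some test trace has exactly the same sequence of actions, the split into covered (purple) and uncovered (pink) traces can be sketched as follows; `usage_coverage` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def usage_coverage(
    usage: Sequence[Sequence[str]], tests: Sequence[Sequence[str]]
) -> Tuple[List[int], List[int]]:
    """Split usage-trace indices into (covered, uncovered).

    Assumption: a usage trace is covered when some test trace has exactly
    the same sequence of actions.
    """
    tested = {tuple(t) for t in tests}
    covered: List[int] = []
    uncovered: List[int] = []
    for i, trace in enumerate(usage):
        (covered if tuple(trace) in tested else uncovered).append(i)
    return covered, uncovered
```

Exact matching is deliberately strict; a real tool might tolerate extra steps or compare traces at the level of the abstracted workflow instead.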
Test traces can come from automated test execution (typically regression tests) or from manual execution (e.g., exploratory testing sessions).
In Figure 4, coverage is shown on the workflow, and the coverage percentage of each group of traces is computed and displayed in the hierarchical clustering panel.
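Given the clusters produced by the hierarchical clustering and the indices of covered usage traces, the per-group coverage percentage is a simple ratio. This sketch weights every usage trace equally, which is an assumption, and expects non-empty clusters.

```python
from typing import Iterable, List, Sequence


def cluster_coverage(clusters: Sequence[Sequence[int]], covered: Iterable[int]) -> List[float]:
    """Percentage of usage traces covered by at least one test, per cluster."""
    covered_set = set(covered)
    return [
        100.0 * sum(1 for i in cluster if i in covered_set) / len(cluster)
        for cluster in clusters
    ]
```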
Step 3 — Generation of automated test scripts
In our Spree API example, the tests are automated with Postman. Figure 5 illustrates the test script generation flow:
● Usage traces are created from the operational environment, and test traces from the test environment;
● These traces are compared to identify the usage traces not covered by any test;
● The tool user selects which uncovered traces should be covered;
● The tool then generates test scripts in the target automation environment by associating each trace step with the corresponding test action.
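The selection itself is left to the tool user. One reasonable aid, assumed here rather than taken from Gravity, is to rank the distinct uncovered traces by how often they occur in production, so that the most frequent uncovered journeys are considered first.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def rank_uncovered(
    usage: Sequence[Sequence[str]], tests: Sequence[Sequence[str]], top: int = 10
) -> List[Tuple[Tuple[str, ...], int]]:
    """Distinct usage traces matched by no test trace, most frequent first."""
    tested = {tuple(t) for t in tests}
    counts = Counter(tuple(u) for u in usage if tuple(u) not in tested)
    return counts.most_common(top)
```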
In our example, the log events are REST API calls and the automated tests are sequences of API calls in the same format, so going from log events to test scripts amounts to forging the corresponding REST requests and publishing them in Postman. When a Swagger (OpenAPI) specification is available, as it is for the Spree API, it is used to make building these requests easier.
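The sketch below suggests what forging the requests could look like: each step of a selected trace becomes a request in a Postman collection (v2.1 JSON format), and logged paths are matched against the path templates of the Swagger/OpenAPI specification so that path parameters become Postman variables holding the logged values. The `{{baseUrl}}` variable, the step-prefixed variable names, the trace dictionary format, and the Spree paths and template names in the example are assumptions for illustration.

```python
import json
from typing import Dict, List, Optional, Sequence

POSTMAN_SCHEMA = "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"


def is_param(segment: str) -> bool:
    # OpenAPI path parameters are written as {name}.
    return segment.startswith("{") and segment.endswith("}")


def match_template(path: str, templates: Sequence[str]) -> Optional[str]:
    """Return the spec path template (e.g. /products/{id}) that a concrete path instantiates."""
    parts = path.strip("/").split("/")
    # Try literal templates before parameterized ones, e.g. /cart/add_item before /cart/{id}.
    for template in sorted(templates, key=lambda t: t.count("{")):
        t_parts = template.strip("/").split("/")
        if len(t_parts) == len(parts) and all(
            is_param(t) or t == p for t, p in zip(t_parts, parts)
        ):
            return template
    return None


def trace_to_collection(name: str, trace: Sequence[Dict[str, str]], templates: Sequence[str]) -> dict:
    """Turn one trace (a list of logged calls) into a Postman collection, one request per step."""
    items: List[dict] = []
    variables: List[dict] = []
    for step, call in enumerate(trace, start=1):
        path = call["path"]
        template = match_template(path, templates)
        if template is not None:
            # Replace each {param} segment by a Postman variable holding the logged value,
            # so test data can later be reworked in one place.
            segments = []
            for t, p in zip(template.strip("/").split("/"), path.strip("/").split("/")):
                if is_param(t):
                    var = f"step{step}_{t[1:-1]}"
                    variables.append({"key": var, "value": p})
                    segments.append("{{" + var + "}}")
                else:
                    segments.append(p)
            path = "/" + "/".join(segments)
        request: dict = {"method": call["method"], "header": [], "url": "{{baseUrl}}" + path}
        if call.get("body"):
            request["header"].append({"key": "Content-Type", "value": "application/json"})
            request["body"] = {"mode": "raw", "raw": call["body"]}
        items.append({"name": f"Step {step}: {call['method']} {template or call['path']}",
                      "request": request})
    return {"info": {"name": name, "schema": POSTMAN_SCHEMA}, "item": items, "variable": variables}


# Hypothetical example: an uncovered journey on a Spree store.
templates = ["/api/v2/storefront/products/{id}", "/api/v2/storefront/cart/add_item"]
trace = [
    {"method": "GET", "path": "/api/v2/storefront/products/42"},
    {"method": "POST", "path": "/api/v2/storefront/cart/add_item",
     "body": '{"variant_id": "7", "quantity": 1}'},
]
collection = trace_to_collection("Uncovered journey 1", trace, templates)
print(json.dumps(collection, indent=2))
```

Serializing the collection to a JSON file yields something Postman can import. Turning path parameters into variables is one way to keep the remaining test data rework confined to the variable values.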
At the end of step 3, the test scripts that complete coverage have been generated automatically. In our Spree example, more than 80% of the automation artifacts are generated automatically; only some of the test data needs to be reworked.
Lessons learned from experience
The tooling process described above is implemented in Gravity, developed by Smartesting. Gravity is being trialed in DevOps contexts for API and GUI testing: in the digital factories of large enterprises, by R&D teams at SaaS vendors, and in eCommerce. Several observations can be drawn at this stage:
● In several experimental contexts, the logs were available in a format that allowed them to be used directly, without any editing. In all cases, the logs were managed in a dedicated tool: Dynatrace, Datadog, Splunk, Application Insights, or Elasticsearch.
● Computing usage traces, clustering them, and visualizing them as workflows gives the Product team a better understanding of how users and customers actually use the product. In some cases, our clients discovered user journeys in the application that the Product team was completely unaware of.
● Measuring and visualizing coverage on the usage-trace workflow proves very useful in helping teams complete the test coverage of key user journeys.
Functional regression testing of APIs is a good use case for Gravity: it leads to better test coverage of actual usage, combined with a significant reduction in test automation effort.
Experimenting with the overall “from logs to tests” approach allowed us to identify several concrete methodological aspects of implementation:
● The process relies on collaboration between Ops engineers (who manage the logs) and testers (who create the tests), in line with the evolution of software development practices towards DevOps.
● At the beginning of the process, logs are acquired by targeting representative extracts of the usage (for example, part of the working day or a specific type of user), according to the usage coverage objectives of the tests.
This logs-to-tests process addresses the needs of software teams to optimize their automated regression testing on key user journeys.
The analysis of test coverage against usage traces is presented graphically and is updated as the product evolves.
AI comes into play during trace clustering, to facilitate the analysis of usage patterns. We are experimenting with additional directions that will gradually extend the tool's functionality, for example computing test data by automatically learning data equivalence classes, and extracting representative workflows to test for a given usage cluster.
If, after reading this article, you are interested in regression test automation, contact us for a free trial.