“Not all those who wander are lost”
– J.R.R. Tolkien, The Fellowship of the Ring
A few months ago, if you visited our homepage, you would have read the following tagline:
“From logs to tests – An innovative web app that helps you create functional tests based on your user’s application flows, assisted by AI”
The (former) Gravity home page
Very descriptive, and accurate about what our platform was back then. Gravity expected logs from a production environment and a testing environment so that it could figure out by itself the main user activity patterns in the application under test, evaluate how well a test suite covered these patterns, and generate automated test scripts.
This is basically the concept we developed and then showcased to a bunch of software development and QA teams. Some of them even tried using it and got some encouraging results.
Except it didn’t work out that well.
Framing the problem
Let’s mitigate the dramatic effect of the last sentence. A few things actually did work, starting with the validation of the problem we were trying to solve with our product: scoping high-level test suites to only what’s necessary without compromising on quality.
In the era of Agile and DevOps practices, software development teams are able to continuously deliver value to their customers, up to multiple times a day. And they are doing it while keeping the quality checking process as targeted and short as possible.
New test automation tools and processes (frameworks, continuous integration systems, reporting…) are released on a daily basis to help these teams improve their process and perform even faster. Even so, the checking process is still considered a bottleneck, particularly when it comes to functional, high-level tests, which are known to be extremely expensive to maintain.
We talked with a lot of people involved in designing quality assurance processes, and we discovered common challenges and requirements.
When you design tests during the development of a new feature, you rely on assumptions about how your users will interact with it (the path they take, the data they enter, the default choices they make). But users are human beings, and human beings are prone to chaotic behavior. So they will never take the nice nominal path you just designed and wrote an automated script for, and they may encounter issues that you did not anticipate.
Over-testing (the product manager’s pet feature)
When considering your test suite, you might have a hunch that a big part of your high-level (end-to-end) tests are over-checking features that are not important to your users, and that you are losing a lot of time executing and maintaining them.
The imitation game
When an issue occurs in production, it is always really complicated to reproduce the path that led to it. It can become a challenging investigation that involves:
- Ops who can access technical metrics and errors monitored in observability tools
- Product people who have the knowledge of how the features are supposed to work
- Developers who will be able to understand cryptic errors and target the piece of code responsible for the issue
(and of course, “It does work well on my machine” is not an acceptable answer)
“From logs to tests” – a first step to “Usage-Driven Testing”
These problems are what motivated us to start building a “Usage-Driven Testing” platform. We think that learning from how an application is actually used is key to making our test suite as focused as possible. We think behavioral data can be leveraged to increase the quality of our software and accelerate delivery:
- By discovering the actual paths our users take, we can update existing functional tests to fit the reality and reduce the risk of regressions on key features
- By comparing the sequences from tests with user paths, we can evaluate which tests are not “relevant” to users’ behavior and redesign them (move them to a lower level of implementation, for instance) or even remove them. We then save time when executing the test suite and shorten the feedback loop
- By knowing the paths that led users to regressions, we can reproduce those regressions more easily
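To make the comparison between test sequences and user paths more concrete, here is a minimal sketch in Python. It is not Gravity's implementation: the function names, the page names, and the choice of "the test's steps appear as a contiguous run in a session" as the relevance criterion are all assumptions made for illustration.

```python
from typing import Sequence


def contains_path(session: Sequence[str], path: Sequence[str]) -> bool:
    """Return True if `path` appears as a contiguous run of steps in `session`."""
    n = len(path)
    if n == 0:
        return True
    return any(
        list(session[i:i + n]) == list(path)
        for i in range(len(session) - n + 1)
    )


def score_tests(tests: dict[str, list[str]],
                sessions: list[list[str]]) -> dict[str, float]:
    """For each test, compute the fraction of sessions that followed its steps."""
    if not sessions:
        return {name: 0.0 for name in tests}
    return {
        name: sum(contains_path(s, steps) for s in sessions) / len(sessions)
        for name, steps in tests.items()
    }


# Hypothetical user sessions, each one an ordered list of visited pages.
sessions = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
    ["home", "account"],
]

# Hypothetical end-to-end tests, described by the pages they walk through.
tests = {
    "buy_from_search": ["search", "product", "cart", "checkout"],
    "view_product_from_search": ["search", "product"],
    "export_invoice": ["account", "invoices", "export"],
}

relevance = score_tests(tests, sessions)
for name, score in relevance.items():
    print(f"{name}: followed in {score:.0%} of sessions")
```

In this made-up data set, a test that no user session ever follows (here `export_invoice`) would be a candidate for redesign or for a move to a lower test level, while tests with high scores protect paths that users really take.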
We started building Gravity by using technical logs as input data. At first, this solution had many advantages:
- No need to add tracking code (unlike product analytics tools)
- Most teams rely on logs to measure performance and track issues, so there is already plenty of available data
- Good fit for API testing (same technical level)
We built the first Minimum Viable Product of the platform. It allowed users to import logs from many providers (Datadog, ElasticSearch, Dynatrace…), digest them into business-readable user sessions (if the logs were holding “business-readable” data), and create functional tests from them (Postman and Cucumber).
We demoed it and saw real interest in the concept. We also talked with a lot of QA people and found out that some of them were already trying to implement “Usage-Driven testing” their own way (by building homemade “testing-fitted” analytics frameworks, for example). We even had a couple of early adopters who managed to leverage their logs to generate tests.
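As an illustration of what “digesting” logs into user sessions can mean, here is a minimal Python sketch. It assumes each log entry already carries a session identifier, a timestamp, an HTTP method, and a path; these field names are hypothetical, and real logs from the providers above are rarely this tidy.

```python
from collections import defaultdict


def group_into_sessions(log_entries: list[dict]) -> dict[str, list[str]]:
    """Group log entries by session and order each session's steps by time."""
    sessions: dict[str, list[str]] = defaultdict(list)
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
    for entry in sorted(log_entries, key=lambda e: e["timestamp"]):
        step = f'{entry["method"]} {entry["path"]}'
        sessions[entry["session_id"]].append(step)
    return dict(sessions)


# Hypothetical log entries, deliberately out of order.
logs = [
    {"timestamp": "2022-05-01T10:00:05Z", "session_id": "a", "method": "POST", "path": "/cart"},
    {"timestamp": "2022-05-01T10:00:01Z", "session_id": "a", "method": "GET", "path": "/products/42"},
    {"timestamp": "2022-05-01T10:00:03Z", "session_id": "b", "method": "GET", "path": "/login"},
]

user_sessions = group_into_sessions(logs)
print(user_sessions)
```

The sketch only works because a `session_id` is present in every entry. When logs lack that kind of information, the grouping step has nothing to rely on.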
But, just at the end of last spring, after a year of work, we decided to pause and take a step back. Why?
Because we learned we wouldn’t be able to release a commercial product with this solution.
Limitation of technical logs
Logs are technical. They:
- are made for engineers, to help them monitor their applications, measure performance, and track issues.
- usually don’t contain any business information that could help easily reproduce the course of a user session.
- can’t easily be modified to add that information, and their non-standard nature makes them hard for an external tool like ours to use.
- rarely leave the company’s servers: for obvious security reasons, companies are generally reluctant to let them out.
When an organization wanted to try Gravity, we had to spend significant time auditing the logs of the application they wanted to test, and then wait for the technical teams to integrate our feedback. Most of the time, we didn’t get there and just gave up. It was a major usability issue (people could not start using the product despite being interested in the promise) and a viability issue (we could not afford to dedicate a team just to bootstrap new projects).
The problem we were trying to tackle seemed to be the right one. The “Usage-Driven Testing” approach resonated with the people we talked to. But the solution was not the right one.
“Usage-Driven testing” – Season 2
“Obsess about the problem, not the solution”
Richard Banfield, Product Leadership
We gathered the whole team and brainstormed. In what ways might we feed Gravity with usage data from which we could create functional tests?
We showcased this prototype and, in a week, we hit the “5 teams want to install Gravity to test their application” goal we set in order to decide if we should go further in this direction.
Which we did. With the whole team, and armed with the early feedback we got from our discussions with the QA engineers, developers, and project managers who saw our prototype, we defined a new MVP and timeboxed its execution to a quarter. We wanted to have our first users actually play with a production-ready application in mid-September.
And here we are.
Gravity’s first release scope
With this first release, we wanted to build a minimal set of features that would enable Agile/DevOps teams to design the most targeted high-level test suite possible, with the help of usage data.
We decided to make it open-source, so it can be easily reviewed by engineering teams.
The repository is on GitHub: https://github.com/Smartesting/gravity-data-collector
Feel free to contribute!
Explore user sessions
Once a few sessions have been recorded, you will need to browse them and find the ones that could be interesting for you.
To do that, we added some basic filtering capabilities, so you can search for sessions that contain a sequence of pages:
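As a rough illustration of this kind of filter (not the code Gravity runs), the Python sketch below keeps the sessions in which the wanted pages were visited in the given order. It assumes other pages may be visited in between, and the session identifiers and page names are made up.

```python
from typing import Iterable, Sequence


def visits_in_order(session: Iterable[str], wanted: Sequence[str]) -> bool:
    """Return True if every page in `wanted` appears in `session`, in that order."""
    remaining = iter(session)
    # `page in remaining` consumes the iterator up to the first match,
    # so each wanted page must be found after the previous one.
    return all(page in remaining for page in wanted)


def filter_sessions(sessions: dict[str, list[str]],
                    wanted: Sequence[str]) -> list[str]:
    """Return the ids of the sessions that contain the wanted page sequence."""
    return [sid for sid, pages in sessions.items() if visits_in_order(pages, wanted)]


# Hypothetical recorded sessions: id -> ordered list of visited pages.
recorded = {
    "s1": ["home", "search", "product", "cart"],
    "s2": ["home", "product", "home", "cart"],
    "s3": ["cart", "home"],
}

matching = filter_sessions(recorded, ["home", "cart"])
print(matching)
```

Session `s3` is excluded because it visits the cart before the home page. A stricter variant could require the pages to be visited back to back.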