Hacker News

by dmitry_dygalo 887 days ago

This reminds me of several solutions, albeit without the explicit "AI" part:

- Up9 observes traffic and then generates test cases (as Python code) & mocks

- Dredd is built with JavaScript, runs explicit examples from the OpenAPI spec as tests and generates some parts with faker-js

- EvoMaster generates test cases as Java code based on the spec. However, it is a greybox fuzzer, so it uses code coverage and dynamic feedback to reach deeper into the source code

There are many more examples, such as Microsoft's RESTler.

Additionally, many tools exist that can analyze real traffic and use this data in testing (e.g. Levo.ai, API Clarity, optic). Some even use eBPF for this purpose.

Given all these tools, I am skeptical. Generating data for API requests does not seem that difficult to me, and many of these tools already combine traffic analysis and test case generation into a single workflow.

For me, the key factors are the effectiveness of the tests in achieving their intended goals and the effort required for setup and ongoing maintenance.

Many of the mentioned tools can be run as a single CLI command (RESTler is an exception), and it is not immediately clear how much easier your solution would be to use than, e.g., a command like Schemathesis's `st run <schema url/file>`. Of course, effectiveness will differ once both tools are fine-tuned, but I am interested in the baseline: what do I get if I use the defaults?

My primary area of interest is fuzzing; at first glance, I'm also skeptical about the efficacy of test generation without feedback. This method has been used in fuzzing since the early 2000s, and the gap between greybox and blackbox fuzzers is immense, as many research papers in this domain show, specifically in the time a fuzzer needs to discover a problem.
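To make the greybox/blackbox distinction concrete, here is a toy sketch, assuming a hypothetical `target` function whose nested checks stand in for instrumented branches. The blackbox loop draws random inputs with no feedback; the greybox loop keeps any input that reaches a branch it has not seen before and mutates it further, which is roughly the idea behind coverage-guided fuzzers. It is an illustration only, not how any of the tools named above is implemented.

```python
import random
from typing import Optional


def target(data: bytes, cov: set[int]) -> bool:
    """Hypothetical system under test: records branch IDs in `cov`, returns True on the 'bug'."""
    if len(data) >= 1 and data[0] == ord("F"):
        cov.add(1)
        if len(data) >= 2 and data[1] == ord("U"):
            cov.add(2)
            if len(data) >= 3 and data[2] == ord("Z"):
                cov.add(3)
                if len(data) >= 4 and data[3] == ord("Z"):
                    return True  # the "bug" is only reachable through all four checks
    return False


def blackbox_fuzz(budget: int, seed: int = 0) -> Optional[int]:
    """No feedback: fresh random 4-byte inputs. Returns iterations needed, or None."""
    rng = random.Random(seed)
    for i in range(1, budget + 1):
        data = bytes(rng.randrange(256) for _ in range(4))
        if target(data, set()):
            return i
    return None


def greybox_fuzz(budget: int, seed: int = 0) -> Optional[int]:
    """Coverage feedback: keep inputs that hit new branches and mutate them further."""
    rng = random.Random(seed)
    corpus = [b"AAAA"]
    seen: set[int] = set()
    for i in range(1, budget + 1):
        buf = bytearray(rng.choice(corpus))
        buf[rng.randrange(len(buf))] = rng.randrange(256)  # single-byte mutation
        cov: set[int] = set()
        if target(bytes(buf), cov):
            return i
        if not cov <= seen:  # new coverage: promote this input to the corpus
            seen |= cov
            corpus.append(bytes(buf))
    return None


print("blackbox:", blackbox_fuzz(20_000))
print("greybox:", greybox_fuzz(200_000))
```

With four nested byte checks, the blackbox loop needs on the order of 2^32 attempts on average, while the greybox loop only has to guess one byte at a time.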

Sure, your solution targets load testing; however, I believe it could benefit a lot from common techniques used by fuzzers and property-based testing tools. What is your view on that?

What strategies do you employ to minimize early rejections? That is, how do you ensure that the generated test cases are not just dropped by the app's validation layer?
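One common way to reduce early rejections is to generate data from the schema's constraints rather than from types alone. The sketch below assumes a hypothetical `SCHEMA` (a tiny subset of JSON Schema) and a `validate` function standing in for the app's validation layer, and compares how many generated bodies each approach gets past it.

```python
import random
import string

# Hypothetical "create user" request schema (a tiny JSON Schema subset).
SCHEMA = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string", "minLength": 1, "maxLength": 20},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
}


def validate(body, schema=SCHEMA) -> bool:
    """Stand-in for the application's validation layer."""
    if not isinstance(body, dict) or any(k not in body for k in schema["required"]):
        return False
    for key, rules in schema["properties"].items():
        value = body.get(key)
        if rules["type"] == "string":
            if not isinstance(value, str) or not rules["minLength"] <= len(value) <= rules["maxLength"]:
                return False
        elif rules["type"] == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not rules["minimum"] <= value <= rules["maximum"]:
                return False
    return True


def naive_body(rng: random.Random) -> dict:
    """Schema-unaware: random types and sizes, so most bodies get rejected early."""
    def any_value():
        return rng.choice([rng.randint(-1000, 1000), "x" * rng.randint(0, 50), None])
    return {"name": any_value(), "age": any_value()}


def schema_aware_body(rng: random.Random, schema=SCHEMA) -> dict:
    """Respect each field's type and bounds, so bodies reach the business logic."""
    body = {}
    for key, rules in schema["properties"].items():
        if rules["type"] == "string":
            length = rng.randint(rules["minLength"], rules["maxLength"])
            body[key] = "".join(rng.choice(string.ascii_letters) for _ in range(length))
        elif rules["type"] == "integer":
            body[key] = rng.randint(rules["minimum"], rules["maximum"])
    return body


def acceptance_rate(gen, n: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(validate(gen(rng)) for _ in range(n)) / n


print("naive:", acceptance_rate(naive_body))
print("schema-aware:", acceptance_rate(schema_aware_body))
```

Constraint-aware generation only gets requests past validation; a deliberate share of invalid bodies is still useful for testing the validation layer itself.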


yevyevyev 887 days ago

Hi Dmitry, thanks for replying here. You raise some good points; I'll do my best to address them below. I'll also mention that we have a fully featured free tier that you can sign up for on our website (www.multiple.dev). Any hands-on feedback on our product or the TestGen feature would be extremely helpful.

Test feedback - during our TestGen flow, the user provides feedback on the sequence and contents of the API requests. And at the end of the flow, our users can manually edit the resulting JS code for additional customization.

Effort to create a load test - You can go from a Swagger or HAR file to a function load test, written in JS, in a few minutes. There is no learning curve, assuming you have basic knowledge of JavaScript. Maintenance is typically minimal.

CLI - we are launching our CLI shortly, where users can start tests from the command line as you describe. It'll work similarly to Jest or other unit test frameworks, where the test scripts will live in our user's codebase.

The use of AI - we use AI to generate realistic-looking synthetic data, which can be challenging with strings. The AI matches each field to the most relevant faker-js function. We need the content of the string to look like something the target application would receive in production. And with HAR files, we use AI to help filter out irrelevant requests such as analytics.
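The field-to-faker matching described above can be approximated without AI by a keyword table, which also shows where such a table runs out. The sketch below is a rule-based stand-in, not Multiple's implementation: `RULES` and `match_faker` are hypothetical, and the returned strings name @faker-js/faker functions (v8-style module paths) rather than calling them.

```python
import re

# Hypothetical keyword table: field-name tokens -> faker-js function paths.
RULES = [
    ({"email"}, "faker.internet.email"),
    ({"first", "given"}, "faker.person.firstName"),
    ({"last", "surname", "family"}, "faker.person.lastName"),
    ({"phone", "mobile"}, "faker.phone.number"),
    ({"city"}, "faker.location.city"),
    ({"id", "uuid"}, "faker.string.uuid"),
]
FALLBACK = "faker.lorem.word"


def tokens(field: str) -> list[str]:
    """Split camelCase, snake_case and kebab-case names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", field)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def match_faker(field: str) -> str:
    """Return the first rule whose keywords overlap the field's tokens."""
    toks = set(tokens(field))
    for keywords, fn in RULES:
        if toks & keywords:
            return fn
    return FALLBACK


for name in ["userEmail", "first_name", "phoneNumber", "customer_id", "handle"]:
    print(name, "->", match_faker(name))
```

A table like this falls back to generic words on ambiguous names such as `handle`, which is presumably where a language model earns its keep.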

I hope that was helpful, and I'm happy to go into more detail.

dmitry_dygalo 881 days ago

Hi!

> Test feedback - during our TestGen flow, the user provides feedback on the sequence and contents of the API requests.

So it is not fully automated? Does the user need to provide the feedback, or is it optional?

By feedback, I originally meant a feedback loop between the system under test and the test harness, so the harness can learn from the system's behavior and produce better data or spend less time on ineffective cases. This is also essential for things like test case reduction when a failure happens.
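Test case reduction is one place where that loop matters: the reducer has to re-run the system to check whether a smaller input still fails. Below is a minimal greedy chunk-removal reducer, a simplified cousin of Zeller's ddmin, assuming a hypothetical `fails` predicate that replays a batch against the system and reports whether the failure reproduces.

```python
from typing import Callable


def shrink(items: list, fails: Callable[[list], bool]) -> list:
    """Remove chunks of decreasing size while the failure still reproduces."""
    if not fails(items):
        raise ValueError("shrink() needs an input that already triggers the failure")
    chunk = max(1, len(items) // 2)
    while chunk >= 1:
        reduced = False
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]  # drop one chunk
            if fails(candidate):
                items = candidate  # keep the smaller failing input, retry same position
                reduced = True
            else:
                i += chunk
        if not reduced:
            chunk //= 2  # nothing removable at this size: try finer chunks
    return items


def fails(xs: list) -> bool:
    # Hypothetical bug: the API errors out when a batch holds both 0 and a negative number.
    return 0 in xs and any(x < 0 for x in xs)


print(shrink([5, 3, 0, 8, -2, 7, 1, -9], fails))  # -> [0, -9]
```

Every call to `fails` is a round trip to the system, so a harness that gets this feedback cheaply can reduce failures far more aggressively.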

> There is no learning curve, assuming you have basic knowledge of JavaScript. Maintenance is typically minimal.

I'd be cautious about saying that there is no learning curve. Based on the docs at https://docs.multiple.dev/how-it-works/ai-test-gen, anyone using the feature also needs to know your environment's API, e.g. `ctx`, `axios`, etc. That does not match what I expected after reading "no learning curve" and "basic JS knowledge", though it is not far off.

> CLI - we are launching our CLI shortly, where users can start tests from the command line as you describe. It'll work similarly to Jest or other unit test frameworks, where the test scripts will live in our user's codebase.

Cool! So, the user needs to commit the test code to their codebase, right?

> The use of AI - we use AI to generate realistic-looking synthetic data, which can be challenging with strings. The AI matches each field to the most relevant faker-js function. We need the content of the string to look like something the target application would receive in production. And with HAR files, we use AI to help filter out irrelevant requests such as analytics.

Yep, thanks for the clarification. I am wondering how effective such realistic-looking synthetic data is at uncovering defects: if it covers the happy path, what about uncommon scenarios? Specifically, can it still cover uncommon characters (from various Unicode categories)?
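One way to reach uncommon characters is to sample by Unicode general category rather than by code point, so that rare categories (combining marks, format characters, separators, private use) show up about as often as letters. A sketch using only Python's `unicodedata`; `category_pools` and `edge_case_string` are hypothetical helpers, and the `0x30000` limit is an arbitrary cut-off to keep the scan fast.

```python
import random
import unicodedata
from collections import defaultdict


def category_pools(limit: int = 0x30000) -> dict[str, list[str]]:
    """Group code points below `limit` by general category (Lu, Nd, Zs, Cf, ...)."""
    pools: dict[str, list[str]] = defaultdict(list)
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8 in a real request
        pools[unicodedata.category(chr(cp))].append(chr(cp))
    return dict(pools)


def edge_case_string(rng: random.Random, pools: dict[str, list[str]], length: int = 12) -> str:
    """Pick a category first, then a character from it, so rare categories appear often."""
    cats = sorted(pools)
    return "".join(rng.choice(pools[rng.choice(cats)]) for _ in range(length))


pools = category_pools()
rng = random.Random(42)
sample = edge_case_string(rng, pools)
print(repr(sample), sorted({unicodedata.category(c) for c in sample}))
# A "realistic" faker-style name touches only a handful of categories:
print(sorted({unicodedata.category(c) for c in "Jane Doe"}))
```

Mixing a share of such strings into otherwise realistic data keeps the happy path covered while still probing encoding and normalization edge cases.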

Overall, I'd say that I like the idea and what I've read in the docs :) Good luck!