> 1. What type of service instrumentation is needed to capture the data? I wonder why this is needed when the data is typically already captured in an APM log. The instrumentation might add performance and security concerns.

The implementation is very similar to an APM log, so the same performance and security concerns apply. We are working on providing both at the same time (automated tests and APM) to reduce overhead.

> 2. What is the sampling logic to capture the traffic? It might compromise the fidelity of the test data and give a false sense of test accuracy.

It is random sampling. I feel 1M or 10M randomly sampled requests should cover all cases.

> 3. What is the duration of data capture? Is it a week's, a month's, or a quarter's data? Meeting 90% coverage on a week's production sample data will provide a false metric.

I was thinking 1 week should be enough. Maybe we will have to add some custom sampling logic for lower-frequency calls (like monthly crons).

> 4. Can it faithfully handle data privacy and customer anonymization? This is critical for APIs dealing with PCI and other sensitive data.

Yes. Additionally, for compliance, we offer a self-hosted solution: our code runs on your servers and no data ever leaves your cloud / on-prem.

> It is random sampling. I feel, 1M or 10M randomly sampled requests should cover all cases.
1. I suggest providing alternative approaches to sampling: the input itself may be biased towards a single use case. If 70% of the input exercises the same code path, a uniform sample will mostly re-test that one path. Ideally the sample would be stratified amongst customers, or perhaps on other dimensions, to cover the most surface area.
2. Requests don't happen in a vacuum. They likely have data dependencies on prior requests. I recommend some way of sampling sessions rather than individual requests. Replaying the 3rd request in a series of 6 without the two before it will likely just exercise failure paths.
3. Behavior may depend on the timing between requests. If requests were sampled over a number of days but replayed within a short time period, the system may behave differently from how it actually does in production.
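
To make points 1 and 2 concrete, here is a rough sketch of what I mean by stratified, session-level sampling. It is not how the product works; it assumes each captured request is a dict with hypothetical `session_id`, `customer`, and `ts` fields, and the function name and parameters are made up for illustration.

```python
import random
from collections import defaultdict


def sample_sessions(requests, per_stratum, stratum_key=None, seed=0):
    """Pick up to `per_stratum` whole sessions from each stratum.

    `requests` is an iterable of dicts with (assumed, hypothetical) keys
    `session_id`, `customer`, and `ts`. By default sessions are stratified
    by the customer of their first request.
    """
    if stratum_key is None:
        stratum_key = lambda first_request: first_request["customer"]
    rng = random.Random(seed)

    # Group individual requests into sessions so dependencies stay intact.
    sessions = defaultdict(list)
    for req in requests:
        sessions[req["session_id"]].append(req)

    # Order each session by timestamp and bucket it by stratum.
    strata = defaultdict(list)
    for session_id, reqs in sessions.items():
        reqs.sort(key=lambda r: r["ts"])
        strata[stratum_key(reqs[0])].append(session_id)

    # Draw the same number of sessions from every stratum, so a customer
    # that produces 70% of the traffic does not dominate the sample.
    picked = []
    for key in sorted(strata):
        candidates = sorted(strata[key])
        chosen = rng.sample(candidates, min(per_stratum, len(candidates)))
        picked.extend(sessions[session_id] for session_id in chosen)
    return picked
```

Each element of the result is a complete session in timestamp order, so a replay never starts at request 3 of 6. Because the `ts` values are kept, a replayer could also preserve the original gaps between requests, which matters for point 3.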
I didn't see any explanation of how results are determined. I think it's important to surface those kinds of details on the website. I'm not going to watch the video in the hope of finding out.