That's one of the dirty little secrets of end-to-end tests that almost no one talks about. You will probably spend more time chasing ghosts in the machine than finding actual bugs.
I've had that experience too. But also the opposite.
I've worked both on code where writing the tests was more effort than writing the code itself, and on code where writing the tests was easy, quick and helpful. The latter makes sense: after all, a good test is straight-line code, zero ifs, zero loops. But the former?
I think the key is that mocking should be used sparingly, but without hesitation.
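To make that concrete, here's a rough sketch of the kind of test I have in mind: straight-line, with one mock at the external boundary and nothing else faked. The names (`convert`, `fetch_rate`, `rate_source`) are made up for illustration and don't come from any real codebase.

```python
from unittest.mock import Mock


def convert(amount, currency, rate_source):
    """Convert amount using a rate looked up from rate_source (hypothetical API)."""
    rate = rate_source.fetch_rate(currency)
    return round(amount * rate, 2)


def test_convert_uses_looked_up_rate():
    # Mock only the external boundary (the rate lookup), nothing else.
    rate_source = Mock()
    rate_source.fetch_rate.return_value = 1.25

    result = convert(10, "EUR", rate_source)

    # No ifs, no loops: arrange, act, assert.
    assert result == 12.5
    rate_source.fetch_rate.assert_called_once_with("EUR")


test_convert_uses_looked_up_rate()
```

If a test needs more than one or two mocks like this, that's usually where the "more effort than the code" feeling starts.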
>System level integration tests also tend to be more flaky.
That's usually a sign that they've been engineered poorly or that you have bugs in your code.
System-level integration tests need appropriate environmental isolation and solid asynchronous and multithreaded code. Nobody can be bothered to write these properly for tests, hence the flakiness ("ooh let's just insert a sleep here" / "eh, does it really matter which version of Postgres we run?").
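For the sleep case, one common alternative is to poll for the condition you actually care about, with a deadline. This is only a sketch: `wait_until` and its parameters are my own names, not from any library, and the timer thread is a stand-in for a real asynchronous system.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for an asynchronous system under test.
results = []
worker = threading.Timer(0.05, lambda: results.append("done"))
worker.start()

# Instead of time.sleep(1) and hoping, wait for the observable condition.
finished = wait_until(lambda: results == ["done"], timeout=2.0)
worker.join()
```

The test then returns as soon as the work is done and only pays the full timeout when something is actually broken, which is the opposite of what a fixed sleep gives you.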