* Multi-hop reasoning rarely works with real data in my case.
* Impossible to define advanced metrics over the whole dataset.
* No async support.