Hacker News
by skp1995 582 days ago
Ohh, interesting. I tried it on a couple of big repos and it was a bit of a miss for me. How large are the codebases you work on? I want a sense check on where the behavior deteriorates with embedding + GPT-3.5-based reranker search (not sure if they are doing more now!)

The largest repo I've used with Cursor was about 600,000 lines long.
That's a good target to aim for. Creating a full local index for 600k lines is pretty expensive, but there are a bunch of heuristics that can take us pretty far:

- looking at git commits
- making use of recently accessed files
- keyword search
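The three heuristics above could be folded into one ranking without building an embedding index. Below is a minimal Python sketch, assuming the candidate files' text is already in memory and that "recently accessed files" arrives as a newest-first list of paths. The function names (`git_churn`, `tokens`, `rank_files`) and the weights are hypothetical, for illustration only, and not how Cursor or anyone in this thread actually does it. The only real external interface used is `git log --name-only`.

```python
# Hypothetical sketch: function names and weights are illustrative only.
import re
import subprocess
from collections import Counter


def git_churn(repo: str, n_commits: int = 100) -> Counter:
    """Count how many of the last `n_commits` commits touched each file."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-n", str(n_commits),
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    # With an empty format string git prints only the touched paths,
    # with blank lines between commits; skip the blank lines.
    return Counter(line for line in out.splitlines() if line.strip())


def tokens(text: str) -> set[str]:
    """Lower-cased identifier-like tokens, used for plain keyword matching."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def rank_files(query: str, contents: dict[str, str], churn: Counter,
               recent: list[str], top_k: int = 10,
               w_kw: float = 3.0, w_recent: float = 2.0,
               w_churn: float = 1.0) -> list[str]:
    """Rank files by a weighted sum of three cheap signals (no embeddings).

    contents: path -> file text; recent: recently accessed paths, newest first.
    """
    q = tokens(query)
    max_churn = max(churn.values(), default=0) or 1
    recency: dict[str, int] = {}
    for i, path in enumerate(recent):
        recency.setdefault(path, i)  # keep the most recent position only
    scores = {}
    for path, text in contents.items():
        # Keyword signal: fraction of query terms found in the path or body.
        kw = len(q & tokens(path + " " + text)) / max(len(q), 1)
        # Recency signal: 1.0 for the newest file, decaying as 1/(1+i).
        rec = 1.0 / (1 + recency[path]) if path in recency else 0.0
        # Git signal: commit count normalized by the most-churned file.
        ch = churn.get(path, 0) / max_churn
        scores[path] = w_kw * kw + w_recent * rec + w_churn * ch
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The weights would almost certainly need tuning per repo; the point is only that each signal is cheap to compute even at 600k lines, and `git_churn` would only need to run once per session.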

If I set these constraints and allow for around two LLM round trips, we can get pretty far in terms of performance.
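One way a budget of about two LLM round trips might be spent, sketched in Python. Assumptions: `llm` is a hypothetical callable wrapping whatever model API you use (prompt in, text out), and `search` is any cheap heuristic search like the ones listed above. The split (keywords first, then selection from a short candidate list) is an illustrative guess, not something described in the thread.

```python
# Hypothetical sketch: `llm` and `search` are stand-ins, not a real API.
from typing import Callable

LLM = Callable[[str], str]              # prompt -> completion text
Search = Callable[[str], list[str]]     # query -> candidate file paths


def gather_context(question: str, llm: LLM, search: Search,
                   max_files: int = 5) -> list[str]:
    """Pick relevant files using at most two LLM round trips."""
    # Round trip 1: turn the question into search keywords.
    reply = llm(f"List code-search keywords, one per line, for: {question}")
    query = " ".join(k.strip() for k in reply.splitlines() if k.strip())

    # No LLM call here: heuristic search (keywords, git history,
    # recently accessed files) narrows the repo to a short candidate list.
    candidates = search(query)
    if not candidates:
        return []

    # Round trip 2: let the model choose among the candidates only.
    listing = "\n".join(candidates)
    reply = llm(
        f"Question: {question}\nCandidate files:\n{listing}\n"
        "Reply with the relevant paths, one per line."
    )
    allowed = set(candidates)
    # Drop any path the model returns that is not a real candidate.
    chosen = [p.strip() for p in reply.splitlines() if p.strip() in allowed]
    return chosen[:max_files]
```

Restricting the second call to the candidate list keeps the model from naming files that don't exist, and the bulk of the repo is only ever touched by the cheap heuristics.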