by lumax15
1189 days ago
We don't have any rigorous benchmarks against Copilot yet, but we're working on building an evaluation framework to do so. We've played a bunch with traditional academic metrics for codegen (e.g. pass@k) and found they don't correlate super well with real-world performance.

Also, I want to mention that we are not competing directly with Copilot. A benchmark against Copilot is useful as we further improve our product, but our main value add is not that we perform better than Copilot. It's that we serve a customer segment that can't use Copilot. Would love to hear any thoughts you have.

For training, we start with a capable open-source base model, augment it with a bunch of permissively licensed repos, and then fine-tune on the customer codebase. We currently support C/C++, Go, Gosu, Java, JavaScript, Python, Ruby, and TypeScript, and we're continuously adding new languages.
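
For anyone unfamiliar with the pass@k metric mentioned above, here is a minimal sketch of the commonly used unbiased estimator: generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k randomly drawn samples passes. The function name and the example numbers are illustrative assumptions, not anything from our evaluation framework.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that all k drawn samples fail is C(n - c, k) / C(n, k).
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 samples, 3 of which passed.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

The benchmark-level score is usually the mean of this value over all problems. Note that it only measures whether generated code passes fixed tests, which may be one reason it tracks real-world usefulness poorly.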
Does it make useful code? Does it make the same code?
Or is it more strictly about something like latency and cost?