by hande-k
332 days ago
Really appreciate you sharing this. I'm trying to use GPT o3, so I'd be curious to see it in the benchmarks. Still, seeing the raw traces tells me the tooling is starting to cross the “actually usable” line, and it makes me want to try it on my own examples this weekend. Looking forward to the MCP benchmark as well.