by janalsncm 736 days ago
Agreed, I looked through their “paper” and while it goes through the motions of a scientific paper, there’s barely any reproducible methodology. The method gets a single page in their paper, including the diagram.

They do reference some papers I’m not familiar with and say their method is “similar”.

If you check the Hugging Face page mentioned in a footnote, they have two directories: one for a model, and another that contains a FAISS index. But in the paper they say they use cross attention, so I have no idea how the two could be combined.

1 comment

That’s fair - I’ll try to go through it over the weekend and write out some of the equations for the kernel that loads the weights out of the index and does the adaptor ops. It’s inspired by cross attention in RETRO, but there are some differences for training stability and so that it can be used as an adaptor rather than trained from scratch.
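For anyone wondering how an index and cross attention could fit together at all, here is a minimal sketch. It is not the paper's kernel, which has not been written up yet. It assumes the index stores key and value vectors, uses a brute-force inner-product search as a stand-in for a FAISS lookup, and adds the attention readout as a residual so a frozen base model could use it as an adaptor. All names (`retrieve_top_k`, `cross_attention_adaptor`, `w_q`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieve_top_k(index_keys, query, k):
    # Brute-force inner-product search; a stand-in for a FAISS index lookup.
    scores = index_keys @ query
    return np.argsort(-scores)[:k]

def cross_attention_adaptor(hidden, index_keys, index_values, w_q, k=4):
    # hidden:       (seq, d_model) activations from a frozen base model
    # index_keys:   (n, d_k)       keys stored in the index (assumed layout)
    # index_values: (n, d_model)   values stored alongside the keys (assumed)
    # w_q:          (d_model, d_k) learned query projection (assumed)
    d_k = w_q.shape[1]
    out = hidden.copy()
    for t in range(hidden.shape[0]):
        q = hidden[t] @ w_q
        ids = retrieve_top_k(index_keys, q, k)
        # Attend only over the k retrieved rows, not the whole index.
        attn = softmax(index_keys[ids] @ q / np.sqrt(d_k))
        # Residual connection: the adaptor adds to the base activations.
        out[t] = hidden[t] + attn @ index_values[ids]
    return out
```

The residual form means that zeroed values leave the base model's activations unchanged, which is one common way to keep adaptor training stable. Whether the paper does this is not stated.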

I consider that paper an early draft - hot off the press, so to speak - it needs review and editing before we would submit it to a conference. I tend to prefer a few rounds of open review before a final submission these days anyway, so I appreciate the feedback.

I think the main idea should be reproducible - you can repeat the randomization and generalization tests with any LLM and get similar training curves and eval results - it just wouldn’t be efficient.
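As a rough illustration of what repeating a randomization test could look like, here is a toy harness. It assumes the test means training on facts made of random strings, which cannot be guessed, and then measuring exact-match recall on the trained facts versus unseen ones. `LookupMemorizer` is a hypothetical stand-in for a fine-tuned LLM, so the numbers say nothing about any real model.

```python
import random
import string

def random_facts(n, seed, length=12):
    # Question/answer pairs made of random lowercase strings.
    rng = random.Random(seed)
    def rand_str():
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return {f"key-{i}-{rand_str()}": rand_str() for i in range(n)}

class LookupMemorizer:
    # Hypothetical stand-in for a fine-tuned model: memorizes exact pairs.
    def fit(self, facts):
        self.table = dict(facts)
        return self

    def predict(self, prompt):
        return self.table.get(prompt, "")

def exact_match(model, facts):
    # Fraction of prompts whose prediction equals the stored answer exactly.
    hits = sum(model.predict(q) == a for q, a in facts.items())
    return hits / len(facts)

train = random_facts(100, seed=1)
held_out = random_facts(100, seed=2)
model = LookupMemorizer().fit(train)
train_acc = exact_match(model, train)
held_out_acc = exact_match(model, held_out)
```

With a real LLM, `fit` would be a fine-tuning run and `predict` a generation call. The comparison of interest is how quickly the trained-set accuracy rises while performance on ordinary evals holds up.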

We have tried it on about 5 real customer use cases with different facts, with good success. Obviously we can’t publish customer data for others to reproduce, which is why we focused on the randomization tests in the paper.

There are also some hyperparameters missing from the appendix that we will add eventually.