by throwaway4aday 644 days ago
While I agree that we shouldn't extrapolate too much, I disagree with the idea that this type of exploration might not benefit from more compute. We won't know until we try. Since we have what seems to be a very generalizable architecture, it makes sense to take the brute-force approach: model that data by scaling both the amount of data and the amount of compute we dedicate to it. If it turns out not to work, then we've learned something. As it happens, Logic and Algorithms has already seen some early success with Transformers (Searchformer): https://arxiv.org/abs/2402.14083