|
|
|
|
|
by nooumenon
588 days ago
|
|
Hi! Very cool results. Are you able to share some numbers about the slope of the scaling curve you found, i.e. how performance responds to a growing nr of demonstrations? Academically I'd also be very interested how much of a data efficiency improvement you achieved with the pretrained model + task specific post-training versus from-scratch task specific training - like, if post training requires say 50 additional demos, and from-scratch on smaller model requires say 250 demos (or whatever) to match performance, that would be an interesting quntification of the efficiency benefit of using the big foundation model |
|