| HN Mirror

During training, it's more efficient than full finetuning because you only update a fraction of the parameters via backprop. During inference, it can ...

1) ... be theoretically a tad slower if you add the LoRA values dynamically during the forward pass (however, this is also an advantage if you want to keep a separate small weight set per customer, for example; you run only one large base model and can apply the different LoRA weights per customer on the fly)

2) ... have the exact same performance as the base model if you merge the LoRA weights back with the base model.