by ftxbro
1132 days ago
> I was specifically referring to the delta column. I might be misunderstanding it - can you explain the huge flux in the confidence intervals there?

It's essentially the same reason. Some delta confidence intervals are wide, like the one for 'PubMedQA', where only 6 test questions had overlap (as they define it) with the training data. That small sample of 6 is what made the interval wide. Other delta confidence intervals are much narrower, like the one for 'MedMCQA', which had 893 questions with overlap out of 4183 total questions. The large sample sizes for both classes (with overlap and without overlap) made that interval much narrower.
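To make the sample-size effect concrete, here is a rough sketch using a plain normal-approximation (Wald) interval for the difference of two proportions. This is not necessarily how the paper computed its intervals. The 0.75 accuracy is a made-up placeholder, and so is the 494 non-overlap count for PubMedQA; only the 6, 893 and 4183 come from the numbers above.

```python
import math

def delta_ci_half_width(p_overlap, n_overlap, p_clean, n_clean, z=1.96):
    """Half-width of an approximate 95% Wald interval for the difference
    between accuracy on overlapping and non-overlapping questions."""
    variance = (p_overlap * (1 - p_overlap) / n_overlap
                + p_clean * (1 - p_clean) / n_clean)
    return z * math.sqrt(variance)

# ASSUMPTION: 0.75 accuracy in both groups is a placeholder, not a reported result.
ACC = 0.75

# PubMedQA: 6 overlapping questions (from the comment).
# ASSUMPTION: 494 non-overlapping questions is hypothetical.
wide = delta_ci_half_width(ACC, 6, ACC, 494)

# MedMCQA: 893 overlapping out of 4183 total, so 4183 - 893 = 3290 without overlap.
narrow = delta_ci_half_width(ACC, 893, ACC, 4183 - 893)

print(f"PubMedQA-like delta: +/- {wide:.3f}")
print(f"MedMCQA-like delta:  +/- {narrow:.3f}")
```

With the same assumed accuracy in both cases, the only thing that changes is the sample sizes, and the 6-question group dominates the variance: the first interval comes out roughly ten times wider than the second. A Wald interval is also a poor approximation at n = 6, so treat the first number as an illustration of scale, not a usable interval.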