|
|
|
|
|
by refibrillator
656 days ago
|
|
Hmm there may be a bug in the authors’ python script that searches google scholar for the phrases "as of my last knowledge update" or "I don't have access to real-time data". You can see the code in appendix B. The bug happens if the ‘bib’ key doesn’t exist in the api response. That leads to the urls array having more rows than the paper_data array. So the columns could become mismatched in the final data frame. It seems they made a third array called flag which could be used to detect and remove the bad results, but it’s not used any where in the posted code. Not clear to me how this would affect their analysis, it does seem like something they would catch when manually reviewing the papers. But perhaps the bibliographic data wasn’t reviewed and only used to calculate the summary stats etc. |
|