Hacker News
by localhost 1430 days ago
I’m curious: what would need to be added to Excel to do this?

My guess (some of this we already have, some we don't):
- automation: integration of heuristics (multiple columns that you can program via formulas and such)
- exploration: finding outliers or the most similar records given some reference (e.g. "I want to label more rows that are, to some extent, about business news")
- monitoring
- label management (which we don't yet offer to the extent we'd like): merging and splitting labels, etc.
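To make the "automation" item concrete: each heuristic can be a small function that plays the role of a formula column, and the columns are then combined into one suggested label per row. This is a minimal sketch in Python; the two rules, their keyword lists, and the majority-vote combiner are hypothetical illustrations, not how any particular tool does it.

```python
from collections import Counter
from typing import Callable, List, Optional

# A heuristic reads one text and returns a label, or None to abstain.
Heuristic = Callable[[str], Optional[str]]

def mentions_wine_grape(text: str) -> Optional[str]:
    """Hypothetical rule: a grape variety in the title suggests 'wine'."""
    grapes = ("sauvignon", "merlot", "riesling", "chardonnay")
    return "wine" if any(g in text.lower() for g in grapes) else None

def mentions_beer_style(text: str) -> Optional[str]:
    """Hypothetical rule: a beer style appearing as a whole word suggests 'beer'."""
    styles = {"lager", "ipa", "stout", "pilsner"}
    return "beer" if styles & set(text.lower().split()) else None

def apply_heuristics(texts: List[str], heuristics: List[Heuristic]) -> List[List[Optional[str]]]:
    """One output 'column' per heuristic, like formula columns next to the data."""
    return [[h(t) for h in heuristics] for t in texts]

def majority_label(votes: List[Optional[str]]) -> Optional[str]:
    """Combine one row's votes; None if no heuristic fired or the top labels tie."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the row for a human to label
    return ranked[0][0]

texts = ["Sauvignon blanc 2019", "Hazy IPA 6-pack", "Olive oil 500ml"]
rows = apply_heuristics(texts, [mentions_wine_grape, mentions_beer_style])
suggested = [majority_label(r) for r in rows]
print(suggested)  # → ['wine', 'beer', None]
```

Rows where no rule fires (or rules disagree evenly) come back as None, which is exactly the set a human would still need to label by hand.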

Generally: anything that scales and at least "somewhat" guarantees that users enter valid labels.
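The "valid labels" part can be sketched as a check against a fixed label set, assuming the project has one (the set below is hypothetical). Every (text, label) entry is compared exactly, and offending rows are reported by index so they can be fixed where they were entered.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical label set; in practice it would come from the project's label schema.
ALLOWED_LABELS: FrozenSet[str] = frozenset({"wine", "beer", "spirits"})

def find_invalid_labels(
    entries: Sequence[Tuple[str, str]], allowed: FrozenSet[str] = ALLOWED_LABELS
) -> List[Tuple[int, str]]:
    """Return (row_index, label) for every (text, label) entry whose label is not allowed.

    The comparison is exact, so near-misses such as 'Beer' or 'spirits ' are
    reported instead of being silently accepted as new labels.
    """
    return [(i, label) for i, (_text, label) in enumerate(entries) if label not in allowed]

entries = [("Sauvignon blanc", "wine"), ("Pale ale", "Beer"), ("London dry gin", "spirits ")]
print(find_invalid_labels(entries))  # → [(1, 'Beer'), (2, 'spirits ')]
```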

But Excel definitely offers something that new tools don't: users are super familiar with it.

Do NLP users already use Excel naturally?
Annotation platforms use Excel. I once received 25 separate Excel spreadsheets from a labeling service for 10k texts (short product titles, e.g. "Sauvignon blanc" -> "wine"). I had to merge them, which wasn't as easy as you might expect.
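One thing that can make a merge like this harder than expected is the same text coming back with different labels from different files. A minimal sketch, assuming each spreadsheet was exported to CSV with columns named `text` and `label` (both the export step and the column names are assumptions): exact duplicates collapse, and conflicting labels are set aside for review instead of one silently overwriting the other.

```python
import csv
import io
from typing import Dict, Iterable, Set, TextIO, Tuple

def merge_label_files(files: Iterable[TextIO]) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Merge CSV exports that each have 'text' and 'label' columns (assumed names).

    Returns (merged, conflicts): merged maps each text to its single label;
    conflicts holds texts that were given different labels in different files.
    """
    labels_by_text: Dict[str, Set[str]] = {}
    for f in files:
        for row in csv.DictReader(f):
            text = row["text"].strip()
            labels_by_text.setdefault(text, set()).add(row["label"].strip())
    merged = {t: next(iter(ls)) for t, ls in labels_by_text.items() if len(ls) == 1}
    conflicts = {t: ls for t, ls in labels_by_text.items() if len(ls) > 1}
    return merged, conflicts

# Two in-memory "files" standing in for exported spreadsheets.
part1 = io.StringIO("text,label\nSauvignon blanc,wine\nHazy IPA,beer\n")
part2 = io.StringIO("text,label\nSauvignon blanc,wine\nHazy IPA,spirits\n")
merged, conflicts = merge_label_files([part1, part2])
print(merged)     # → {'Sauvignon blanc': 'wine'}
print(conflicts)  # 'Hazy IPA' maps to {'beer', 'spirits'}; set order may vary
```

For real files, open each one with `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec strips the byte-order mark that Excel's "CSV UTF-8" export writes, which would otherwise end up glued to the first column name.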

Also, I once labeled 5,000 texts in Excel during my master's degree. It was painful as hell.

Thanks for the insight about annotation platforms!

How might Excel have been better for you in these tasks? Or put another way, in the first case of merging 25 files: did you wind up merging them with a different tool and then reopening the result in Excel? Was Excel limiting because you needed to do some kind of fuzzy matching against the labels, e.g., wine, Wine, white wine, etc., to do the merge?
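For the wine / Wine / white wine case, a sketch of one way to do it: normalize case and whitespace, apply a hand-written alias table for deliberate merges, and fall back to `difflib.get_close_matches` for near-miss spellings. The canonical labels, the aliases, and the 0.8 cutoff are assumptions for illustration (difflib's own default cutoff is 0.6).

```python
import difflib
from typing import Optional

CANONICAL = ["wine", "beer", "spirits"]  # hypothetical target label set
# Hand-written merges for variants that string similarity won't catch
# ("white wine" vs "wine" scores about 0.57, below the cutoff used here).
ALIASES = {"white wine": "wine", "red wine": "wine", "ale": "beer"}

def normalize_label(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw label to a canonical one, or None if there is no confident match."""
    key = " ".join(raw.casefold().split())  # lowercase and collapse whitespace
    if key in CANONICAL:
        return key
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to character-level similarity for typos and plurals.
    close = difflib.get_close_matches(key, CANONICAL, n=1, cutoff=cutoff)
    return close[0] if close else None

for raw in ["Wine", "  White   Wine ", "Wines", "olive oil"]:
    print(repr(raw), "->", normalize_label(raw))
```

Anything that comes back as None is a candidate for manual review rather than an automatic guess; string similarity alone can't tell a typo from a genuinely different category.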

On the labeling task that you had - what might have made that easier for you? Some kind of custom scripting that's above and beyond what you can do in VBA?