My approach to guessing a gender from a first name.
13 points
by Stromgren
4697 days ago
Hi!
A short time ago, I decided to try to build an API that guesses the gender associated with a first name. I thought this might be useful for segmenting user lists for campaigning, analytics or similar.
My first approach was to use a dataset of approved names from a few European countries. I believed that most countries had lists like this (which they don't), and I planned to add them as I went along. I got wiser, and the first feedback I got also told me that the API should be able to make probabilistic guesses and, if possible, offer some sort of localization filter to achieve more accurate guesses.
I decided instead to use large, growing datasets of user profiles from social networks. Each entry contains a first name, a gender, a country_id and a language_id. Finally, I exposed this data model through http://genderize.io
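As a rough illustration of how probabilistic guesses with a localization filter could be derived from entries like these, here is a minimal Python sketch. The sample rows, the country and language codes, and the `guess_gender` function are made up for illustration; this is not the actual genderize.io implementation.

```python
from collections import Counter

# Hypothetical sample rows: (first_name, gender, country_id, language_id).
PROFILES = [
    ("robin", "male", "se", "sv"),
    ("robin", "male", "se", "sv"),
    ("robin", "female", "us", "en"),
    ("robin", "female", "us", "en"),
    ("robin", "female", "us", "en"),
    ("robin", "female", "us", "en"),
    ("robin", "male", "us", "en"),
]


def guess_gender(name, profiles, country_id=None):
    """Return (gender, probability, sample_count) for a first name.

    The probability is the share of matching profiles that carry the most
    common gender. Passing country_id restricts the count to one country.
    Returns (None, 0.0, 0) when no profile matches.
    """
    counts = Counter(
        gender
        for first_name, gender, country, _language in profiles
        if first_name == name.lower()
        and (country_id is None or country == country_id)
    )
    total = sum(counts.values())
    if total == 0:
        return None, 0.0, 0
    gender, hits = counts.most_common(1)[0]
    return gender, hits / total, total


print(guess_gender("robin", PROFILES))
print(guess_gender("robin", PROFILES, country_id="se"))
```

With this toy data the same name leans one way overall and the other way once filtered to a single country, which is the kind of case a localization filter is meant to handle.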
It responds in JSON. Simple example: http://api.genderize.io?name=robin
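For anyone who wants to try it from a script, a minimal Python sketch using only the standard library might look like this. The function names and the timeout are my own choices, and the code makes no assumption about which fields the JSON body contains; it simply returns whatever the API sends back.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://api.genderize.io"  # endpoint from the example above


def build_url(name):
    """Build the query URL for a single first name."""
    return API_URL + "?" + urlencode({"name": name})


def fetch_guess(name, timeout=10):
    """Call the API and return the decoded JSON body (needs network access)."""
    with urlopen(build_url(name), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_guess("robin")` requests the same URL as the example above and returns the parsed response as a Python object.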
I am now looking for feedback on my new approach. What do you think of this way of making guesses? What do you think of the API? Any feedback is welcome.
The API is completely free, by the way.
Obviously you need to run a test that uses a list of real people's names and genders to measure the method's accuracy. But remember the following points:
* People might resent any effort to pin down their gender in a commercial or advertising context.
* The negative outcome for a gender misidentification may be much greater than the positive outcome for a correct one.
* Gender-neutral names are becoming increasingly fashionable among well-educated parents, i.e. people who have money.
On that basis and in my opinion, unless you can get above 90% accuracy, it's not worth doing.
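A minimal Python sketch of such a test, assuming you have a list of real (name, gender) pairs to check against. The lookup table, the labeled list, and the function names here are hypothetical placeholders; names the guesser cannot classify are counted as misses.

```python
def accuracy(guess, labeled_names):
    """Fraction of (name, true_gender) pairs for which guess(name) is correct."""
    if not labeled_names:
        raise ValueError("need at least one labeled name")
    correct = sum(1 for name, truth in labeled_names if guess(name) == truth)
    return correct / len(labeled_names)


# Hypothetical guesser: a plain lookup table standing in for the real method.
LOOKUP = {"anna": "female", "peter": "male", "robin": "male"}


def lookup_guess(name):
    """Return the stored gender for a name, or None if the name is unknown."""
    return LOOKUP.get(name.lower())


# Hypothetical ground truth; a real test needs a much larger list.
LABELED = [
    ("Anna", "female"),
    ("Peter", "male"),
    ("Robin", "female"),
    ("Kim", "female"),
]

score = accuracy(lookup_guess, LABELED)
verdict = "above the 90% bar" if score > 0.9 else "not above the 90% bar"
print(f"accuracy: {score:.0%} ({verdict})")
```

Swapping `lookup_guess` for a function that calls the real method gives the number to compare against the 90% threshold.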
Some popular gender-neutral names:
http://www.babynames1000.com/gender-neutral/
http://thestir.cafemom.com/pregnancy/157282/25_best_genderne...
http://en.wikipedia.org/wiki/Unisex_name#English
A quote: "Unisex names have been enjoying a decent amount of popularity in English speaking countries in the past several decades."