> We use the public GREYC keystroke benchmark database

Yes. That's their own database, which they're talking up: the one they built to do this research. That's what I was talking about.

> In order to reduce the bias due to this high quantity of male information, we only kept the first n male samples (where n is the number of female samples).

I missed this part on my first read. On reflection, what I understand now is far worse than what I originally understood:

- They have 35 female and 98 male participants, and they take many typing samples from each.
- Since each participant provided many samples, those samples appear in both the training set and the test set.
- I use the training set to learn to recognise the typing of the 35 female participants.
- Then I look through the test set to see whether I can identify those same participants again.

Basically, what you've shown is that you can identify the typing of 35 people 88% of the time, provided you've already seen it. Splitting them into 'female' and 'male' is a red herring: this method would presumably work even if I split them into two random groups.

If I'm right, this isn't even state of the art. In 2006 they could have been scoring 96%: http://abcnews.go.com/Technology/story?id=97978&page=2
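To make the leakage point concrete, here's a minimal sketch in Python (standard library only). It assumes a hypothetical list of samples, each a dict tagged with a participant ID and a label. The toy numbers (35 participants per group, 10 samples each) are made up for illustration and are not the GREYC data, and all function names are hypothetical. A sample-level shuffle puts the same person on both sides of the split, while a participant-level split keeps every person entirely in train or entirely in test. The `random_group_labels` helper sketches the "two random groups" control: assign participants to arbitrary groups instead of female/male.

```python
import random
from collections import defaultdict

# Hypothetical toy data (NOT the GREYC database): 35 participants per group,
# 10 samples each. "features" is a placeholder for real keystroke timings.
SAMPLES = [
    {"participant": f"{prefix}{i}", "label": label, "features": None}
    for prefix, label in (("F", "female"), ("M", "male"))
    for i in range(35)
    for _ in range(10)
]


def sample_level_split(samples, test_fraction=0.25, seed=0):
    """Shuffle individual samples; one participant can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def participant_level_split(samples, test_fraction=0.25, seed=0):
    """Put each participant's samples entirely in train or entirely in test."""
    by_participant = defaultdict(list)
    for s in samples:
        by_participant[s["participant"]].append(s)
    ids = sorted(by_participant)
    random.Random(seed).shuffle(ids)
    test_ids = set(ids[: max(1, int(len(ids) * test_fraction))])
    train = [s for pid in ids if pid not in test_ids for s in by_participant[pid]]
    test = [s for pid in ids if pid in test_ids for s in by_participant[pid]]
    return train, test


def shared_participants(train, test):
    """Participants whose samples appear on both sides of the split."""
    return {s["participant"] for s in train} & {s["participant"] for s in test}


def random_group_labels(samples, seed=0):
    """Control: replace the label with an arbitrary per-participant group."""
    ids = sorted({s["participant"] for s in samples})
    random.Random(seed).shuffle(ids)
    group_a = set(ids[: len(ids) // 2])
    return [
        dict(s, label="group_A" if s["participant"] in group_a else "group_B")
        for s in samples
    ]


leaky_train, leaky_test = sample_level_split(SAMPLES)
clean_train, clean_test = participant_level_split(SAMPLES)
# Leaky split: participants whose samples sit in both train and test.
print(len(shared_participants(leaky_train, leaky_test)))
# Participant-level split: no overlap by construction.
print(len(shared_participants(clean_train, clean_test)))  # 0
```

Only the participant-level split tests whether a model generalises to people it has never seen; the sample-level split presumably lets it score well just by recognising individuals. Training the same classifier on `random_group_labels(SAMPLES)` with the leaky split is the control: if accuracy stays high there too, the "gender" signal is really identity.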