by kzakka 1175 days ago
Hi, one of the authors here :)

The demo you are watching is an agent trained from scratch with reinforcement learning. It has roughly 6 days of experience (10M steps at 20 Hz). The JavaScript demo is replaying the policy open loop, which is why it's not super robust to disturbances.
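
As a quick sanity check on the "roughly 6 days" figure, here is the arithmetic as a small sketch. It assumes each training step corresponds to exactly one 20 Hz control tick; the helper name `experience_days` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def experience_days(num_steps: int, control_hz: float) -> float:
    """Convert a number of control steps into days of simulated experience."""
    seconds = num_steps / control_hz
    return seconds / SECONDS_PER_DAY


# 10M steps at 20 Hz is 500,000 simulated seconds.
days = experience_days(10_000_000, 20.0)
print(f"{days:.2f} days")  # prints "5.79 days"
```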

Re: fingering: we actually use fingering information to create a dense reward for the agent (otherwise it makes exploration super hard). It would be an exciting future direction to have the agent discover and optimize for fingering that best suits its kinematics :) And beyond that, having RL inform pianists about the difficulty of a piece or even more optimal fingering would be amazing.
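
One way to picture a fingering-based dense reward: at each step, every finger that the fingering annotation assigns to a key is rewarded for being close to that key, so the agent gets a learning signal before it ever presses the right note. The sketch below is an illustration only, not the paper's implementation; the function name `fingering_reward`, the exponential shaping, and the `scale` value are all assumptions.

```python
import math


def fingering_reward(fingertips, target_keys, scale=0.01):
    """Average closeness of each assigned fingertip to its target key.

    fingertips, target_keys: equal-length lists of (x, y, z) positions in
    meters, one pair per finger that the annotation says should be playing
    right now. Returns a value in [0, 1]; 1.0 means every fingertip is
    exactly on its key. `scale` (hypothetical) sets how fast the reward
    decays with distance.
    """
    if not fingertips:
        return 0.0  # no finger is assigned at this step
    total = 0.0
    for tip, key in zip(fingertips, target_keys):
        dist = math.dist(tip, key)  # Euclidean distance
        total += math.exp(-dist / scale)
    return total / len(fingertips)


# A fingertip 5 mm from its key scores higher than one 5 cm away.
near = fingering_reward([(0.0, 0.0, 0.005)], [(0.0, 0.0, 0.0)])
far = fingering_reward([(0.0, 0.0, 0.05)], [(0.0, 0.0, 0.0)])
print(near > far)
```

The point of the shaping is that the reward changes smoothly as a finger approaches its key, in contrast to a sparse reward that only fires when the correct key is actually pressed.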

We trained a bunch of these policies on roughly 150 songs (baroque, romantic, classical) and we did some analysis in the paper if you're interested: https://kzakka.com/robopianist/robopianist.pdf

6 comments

This is really cool!

There are two motions in particular that pianists use constantly that don't seem to be represented in the robot model, if you're looking to get closer to the way that human limbs and digits operate. (Naturally there are plenty of other goals, but if you can imitate human playing you can do things like suggest fingerings or assess difficulty, as you say.)

1) Turning at the elbow (so that your forearm can make an angle with the piano keyboard instead of always being perpendicular to it). It looks like you translate the forearm back and forth instead, which I assume is a lot easier to handle, since of course it's not how human arms work.

2) rotating the forearm/wrist (like turning a doorknob). Pianists do this on basically every note to a greater or lesser extent. To take an extreme example, if you alternate notes with your thumb and pinky you are almost completely using your wrist and not your fingers. Without this degree of freedom it is not really possible to emulate a competent pianist, if that is one of the eventual goals.

Thanks! We did indeed explore these additional degrees of freedom; you can find vestigial code for this here: https://github.com/google-research/robopianist/blob/main/rob...

We ended up picking a minimal subset of forearm DoFs that wouldn't impact training speed too much.

This is insanely impressive. For fingering, in the right hand I typically put my pinky on the highest note of a phrase; it feels more comfortable and you can accent it more than the middle fingers. In the left hand I typically put the bass note in the pinky as well. The middle fingers aren't as dextrous so I use them less, though a concert pianist could probably use your fingering. Overall, technique-wise, humans cup their hands more: the palm is arched where the robot's is flat. But who says it needs to model humans exactly? I can't believe this is working in three.js! Amazing work!
Here's some fingering for Turkish march https://musescore.com/user/73797/scores/142975
Concert pianists traditionally avoid using the pinkies because they're weak fingers.

Horowitz famously leaves his pinky curled most of the time: https://youtu.be/9LqdfjZYEVE

Watch closely how Gould will press a key with his ring finger and then switch to the pinky to hold it: https://youtu.be/p4yAB37wG5s

Nice project! Anyway, one of the unrealistic details is that the robot in the simulation curls the fingers when it is not using them. In particular the pinky finger. Can that be fixed in a future version? For comparison, I got this as the first result in Google https://www.youtube.com/watch?v=cGYyOY4XaFs

It's also strange that all fingers are always parallel, but I guess that adding that freedom makes the search space huge.

I don't think the intention of this simulation is to be realistic. This particular agent just learned to play the music it was reinforced to learn given the physics constraints programmed for the hand mechanics (as far as I understand it). I doubt the physics emulate our human hands very accurately so I wouldn't expect it to be "realistic" or something that needs to be "fixed" unless the specific intention was to optimize actual human hand movements.
Yup, we're not trying to mimic human movements exactly but rather optimizing for the reward given the robot hardware. Fun fact: we do things like add an energy penalty to try to reduce jitteriness / un-human-like movements, and it does help enormously.
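
For readers unfamiliar with the idea, an energy penalty can be sketched as a term subtracted from the task reward in proportion to the mechanical power the actuators spend. This is a generic illustration, not the authors' code; the names `energy_penalty` and `shaped_reward` and the coefficient value are assumptions.

```python
def energy_penalty(torques, joint_velocities, coef=0.005):
    """Negative reward proportional to total absolute mechanical power.

    torques and joint_velocities are equal-length sequences, one entry per
    actuated joint. `coef` (hypothetical value) trades off smoothness
    against task performance.
    """
    power = sum(abs(t * v) for t, v in zip(torques, joint_velocities))
    return -coef * power


def shaped_reward(task_reward, torques, joint_velocities, coef=0.005):
    """Task reward plus the (non-positive) energy penalty."""
    return task_reward + energy_penalty(torques, joint_velocities, coef)


# Same task reward, but the jittery motion (large torques at high joint
# speeds) ends up with a lower total reward than the smooth one.
smooth = shaped_reward(1.0, [0.1, 0.1], [0.5, 0.5])
jittery = shaped_reward(1.0, [2.0, 2.0], [5.0, 5.0])
print(smooth > jittery)
```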
I understand that the research objective is not human-like movement, but I think changing the rewards to keep the fingers straight would make for nicer videos to show, and I don't expect it to be too hard.

Another question: The pinky finger is not shorter than the other fingers. Can it be a problem for the robot to use the human fingering?

Fingering is harder than it seems, especially once you start to take speed into account: fingerings that work when playing slowly may not necessarily work when playing fast. And individuals have different hand spans, so a fingering that works for one person may not work for another.

If you crack this in a deterministic way it would be super useful as a library.

Well when you put it like that, it truly does sound like an absolutely delicious problem to tackle.
Yes! Thank you. This paper is exactly what I needed.
Wow.