by Waterluvian 1176 days ago
I'm so excited and curious about this that I can't even structure my thoughts well. They're just flooding my brain.

What part of this is pre-coded? What part is being generated? Is the goal to give a program some sheet music (maybe a MIDI file) and it figures out the fingerings[1] and then translates the fingerings into kinematics?

Because if that's the goal... Holy forking shirtballs that would be amazing. One of the trickiest things for me as a novice pianist is figuring out the fingerings to a piece. It's like a puzzle you work at until you've figured out what's comfortable. It's all about lookahead. "This section generally goes down so I probably want to begin with my pinky and not my thumb."

And if it got really good at that, not only are the fingerings useful, but maybe we could get feedback on how physically demanding a piece is. Another challenge I've discovered as a novice is that it can be surprisingly tricky to look at sheet music or hear a piece and determine if it's as easy as it sounds. Some pieces require some very complex fingering.

[1] what pianists call the determination of what fingers go where, not just to play certain notes together, but to ensure you can fluidly and comfortably play the next notes as well.


Hi, one of the authors here :)

The demo you are watching is an agent trained from scratch with reinforcement learning. It has roughly 6 days of experience (10M steps at 20 Hz). The JavaScript demo is replaying the policy open loop, which is why it's not super robust to disturbances.
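
For anyone checking the "roughly 6 days" figure, the arithmetic can be reproduced from the two numbers quoted above. The variable names are only illustrative.

```python
# Convert the quoted training budget into wall-clock "experience".
STEPS = 10_000_000   # environment steps quoted in the comment
CONTROL_HZ = 20      # control frequency quoted in the comment

seconds = STEPS / CONTROL_HZ   # one step covers 1/20 of a second
hours = seconds / 3600
days = hours / 24

print(f"{seconds:.0f} s = {hours:.1f} h = {days:.2f} days")
# prints "500000 s = 138.9 h = 5.79 days"
```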

Re:fingering: we actually use fingering information to create a dense reward for the agent (otherwise it makes exploration super hard). It would be an exciting future direction to have the agent discover and optimize for fingering that best suits its kinematics :) And beyond that, having RL inform pianists about the difficulty of a piece or even more optimal fingering would be amazing.
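
A minimal sketch of what a fingering-based dense reward could look like. This is not the reward from the paper: the function name `fingering_reward`, the exponential shaping, and the `scale` value are all assumptions made for illustration. The only idea it shows is that each finger assigned to a key by the fingering annotation earns more reward the closer its fingertip gets to that key, so the agent gets a learning signal before it ever presses the right note.

```python
import math

def fingering_reward(fingertips, targets, scale=0.01):
    """Hypothetical dense reward in [0, 1].

    fingertips: dict mapping finger id -> (x, y, z) fingertip position.
    targets:    dict mapping finger id -> (x, y, z) position of the key
                that the fingering annotation assigns to that finger
                at the current timestep.
    scale:      assumed length scale (same unit as the positions).
    """
    if not targets:
        return 0.0  # nothing to press right now, so no shaping term
    total = 0.0
    for finger, key_pos in targets.items():
        distance = math.dist(fingertips[finger], key_pos)
        total += math.exp(-distance / scale)  # 1.0 on the key, decays with distance
    return total / len(targets)

# Usage: thumb already on its key, index finger 2 cm away from its key.
tips = {1: (0.00, 0.0, 0.0), 2: (0.05, 0.0, 0.0)}
keys = {1: (0.00, 0.0, 0.0), 2: (0.03, 0.0, 0.0)}
print(fingering_reward(tips, keys))
```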

We trained a bunch of these policies on roughly 150 songs (baroque, romantic, classical) and we did some analysis in the paper if you're interested: https://kzakka.com/robopianist/robopianist.pdf

This is really cool!

There are two motions in particular that pianists use constantly that don't seem to be represented in the robot model, if you're looking to get closer to the way that human limbs and digits operate. (Naturally there are plenty of other goals, but if you can imitate human playing you can do things like suggest fingerings or assess difficulty, as you say.)

1) turning at the elbow (so that your forearm can make an angle with the piano keyboard instead of always being perpendicular to it). It looks like you translate the forearm back and forth instead, which I assume must be a lot easier to handle because of course it's not how human arms work.

2) rotating the forearm/wrist (like turning a doorknob). Pianists do this on basically every note to a greater or lesser extent. To take an extreme example, if you alternate notes with your thumb and pinky you are almost completely using your wrist and not your fingers. Without this degree of freedom it is not really possible to emulate a competent pianist, if that is one of the eventual goals.

Thanks! We did indeed explore these additional degrees of freedom, you can find vestigial code for this here: https://github.com/google-research/robopianist/blob/main/rob...

We ended up picking a minimal subset of forearm DoFs that wouldn't impact training speed too much.

This is insanely impressive. For fingering, in the right hand I typically put my pinky on the highest note of a phrase; it feels more comfortable and you can accent it more than the middle fingers. In the left hand I typically put the bass note in the pinky as well. The middle fingers aren't as dextrous so I use them less, though a concert pianist could probably use your fingering. Technique-wise, humans cup their hands more: the palm is arched where the robot's is flat. But who says it needs to model humans exactly? I can't believe this is working in three.js! Amazing work!
Here's some fingering for Turkish march https://musescore.com/user/73797/scores/142975
Concert pianists traditionally avoid using the pinkies because they're weak fingers.

Horowitz famously leaves his pinky curled most of the time: https://youtu.be/9LqdfjZYEVE

Watch closely how Gould will press a key with his ring finger and then switch to the pinky to hold it: https://youtu.be/p4yAB37wG5s

Nice project! Anyway, one of the unrealistic details is that the robot in the simulation curls the fingers when it is not using them. In particular the pinky finger. Can that be fixed in a future version? For comparison, I got this as the first result in Google https://www.youtube.com/watch?v=cGYyOY4XaFs

It's also strange that all fingers are always parallel, but I guess that adding that freedom makes the search space huge.

I don't think the intention of this simulation is to be realistic. This particular agent just learned to play the music it was reinforced to learn given the physics constraints programmed for the hand mechanics (as far as I understand it). I doubt the physics emulate our human hands very accurately so I wouldn't expect it to be "realistic" or something that needs to be "fixed" unless the specific intention was to optimize actual human hand movements.
Yup, we're not trying to mimic human movements exactly but rather optimizing for the reward given the robot hardware. Fun fact, we do things like add an energy penalty to try and reduce jitteriness / un-human-like movements and it does help enormously.
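
A rough sketch of how an energy penalty can be folded into a reward. The exact form used in the project is not given in this thread, so everything here is an assumption: the names `energy_penalty` and `shaped_reward`, the use of |torque × joint velocity| as the power term, and the coefficient value.

```python
def energy_penalty(torques, velocities, coef=5e-3):
    """Hypothetical penalty: negative, proportional to mechanical power.

    torques, velocities: per-joint sequences of equal length.
    coef: assumed weighting of the penalty against the task reward.
    """
    power = sum(abs(t * v) for t, v in zip(torques, velocities))
    return -coef * power

def shaped_reward(task_reward, torques, velocities, coef=5e-3):
    """Task reward plus the (negative) energy term."""
    return task_reward + energy_penalty(torques, velocities, coef)

# Usage: a jittery step (large torques and velocities) scores lower than
# a calm step that earns the same task reward.
calm = shaped_reward(1.0, [0.1, 0.1], [0.2, 0.2])
jittery = shaped_reward(1.0, [2.0, -3.0], [5.0, 4.0])
print(calm, jittery)
```

With a term like this, two action sequences that hit the same notes are no longer equally good, and the smoother one wins.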
I understand that the research objective is not human-like movement, but I think changing the rewards to keep the fingers straight would make for nicer videos, and I don't expect it to be too hard.

Another question: The pinky finger is not shorter than the other fingers. Can it be a problem for the robot to use the human fingering?

Fingering is harder than it seems, especially once you take speed into account: fingerings that work when playing slowly may not work when playing fast. And individuals have different hand spans, so a fingering that works for one person may not work for another.

If you crack this in a deterministic way it would be super useful as a library.
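
As a toy illustration of the deterministic, lookahead flavour of the problem, here is a sketch that picks a right-hand fingering for a single melodic line with dynamic programming. The cost model is invented for this example (about two semitones per adjacent finger, a flat penalty for hopping one finger to a new key) and ignores speed, hand span, chords, and thumb-under motion, which are exactly the hard parts mentioned above. `transition_cost` and `best_fingering` are hypothetical names, not part of any library.

```python
FINGERS = (1, 2, 3, 4, 5)  # right hand: 1 = thumb ... 5 = pinky

def transition_cost(note_a, finger_a, note_b, finger_b):
    """Assumed cost of moving from (note_a, finger_a) to (note_b, finger_b).

    Notes are MIDI numbers, so differences are in semitones.
    """
    interval = note_b - note_a
    if finger_a == finger_b:
        # Repeating a note is free; hopping one finger to a new key is costly.
        return 0.0 if interval == 0 else 4.0 + abs(interval)
    natural = 2 * (finger_b - finger_a)  # assume ~2 semitones per finger step
    return float(abs(interval - natural))

def best_fingering(notes):
    """Return the minimum-cost finger sequence for `notes` (Viterbi-style DP)."""
    if not notes:
        return []
    cost = {f: 0.0 for f in FINGERS}  # any finger may start the phrase
    back = []                         # back[i][g] = best previous finger
    for prev, cur in zip(notes, notes[1:]):
        new_cost, pointers = {}, {}
        for g in FINGERS:
            best_f = min(
                FINGERS,
                key=lambda f: cost[f] + transition_cost(prev, f, cur, g),
            )
            new_cost[g] = cost[best_f] + transition_cost(prev, best_f, cur, g)
            pointers[g] = best_f
        cost = new_cost
        back.append(pointers)
    finger = min(FINGERS, key=lambda f: cost[f])
    path = [finger]
    for pointers in reversed(back):   # walk the back-pointers to the start
        finger = pointers[finger]
        path.append(finger)
    path.reverse()
    return path

# A descending phrase (C5 B4 A4 G4 F4) starts on the pinky,
# an ascending one (C4 D4 E4 F4 G4) starts on the thumb.
print(best_fingering([72, 71, 69, 67, 65]))  # [5, 4, 3, 2, 1]
print(best_fingering([60, 62, 64, 65, 67]))  # [1, 2, 3, 4, 5]
```

The minimum total cost could also serve as a very crude difficulty score for a phrase, though a useful one would need a far richer cost model than this.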

Well when you put it like that, it truly does sound like an absolutely delicious problem to tackle.
Yes! Thank you. This paper is exactly what I needed.
Wow.
Piano fingering is a very subjective matter, because every hand, finger, arm and taste is different. I doubt that a robot fingering is of any use. You have to find it out for yourself under the guidance of a teacher, perhaps with inspiration from the fingerings of master pianists written in the score. The Henle app has a feature to show them separately. They differ vastly for the same piece.
What if you could upload a photo of your hand and the RL agent learns the optimal fingering for your hand?
My taste, my pianistic abilities, my arm, shoulder and body movements, and my musical idea for the interpretation and imagined sound of the piece cannot be seen in a photo of my hand. Not even a good teacher can give you a finished, perfect fingering; I doubt that a dumb machine can do any better, especially if it plays as badly as the examples sound in comparison to even a beginner human. You have to find the fingering out for yourself, and this is part of the great joy of playing / learning piano.
It's not quite that simple; everyone has different dexterity and innervation (how strong is the wiring between your pinky and ring finger?)
What if the AI gets to see you playing sample pieces, or, if you have say a weighted keyboard, gets to see the forces you can apply? I mean, I've seen the Australian Sports Academy do all sorts of video and biomechanical instrumentation of elite athletes with the aim of improving their technique and providing a customised training regime. I can't see why it can't be used in music performance, which in many ways is just as much athleticism as it is art.
Given sufficient information about your personal biomechanics, a sufficiently advanced AI may be able to suggest the right fingering for you. But the main problem with this (and I think this is something that many in this thread simply don't have the familiarity with) is that fingering is simply a 2D projection of the multiple-dimensional problem of how you need to move your entire upper body to play a piece.

I'll use a piece that I am practicing right now to illustrate: Chopin Etude Op 25 #1 ("Aeolian Harp").

Sheet music: https://imslp.org/wiki/Special:ReverseLookup/112921 (the Herrmann Scholtz one) Performance: https://www.youtube.com/watch?v=Ob0AQLp3a5s

For intermediate pianists (perhaps even beginners), the possible fingerings are actually really obvious just from looking at the notes, especially when using the suggested fingerings as a guide. This piece is structured around playing broken chords in circles, so there aren't really any fingering tricks here.

Notice the chords like the first right hand chord on the second bar on the second line, or the simpler left hand 4-note chords on the third line of the second page. Despite the obvious fingerings, somehow you need to figure out how to play a broken chord that spans 15 keys, a distance that no one can comfortably cover by just stretching thumb and pinky. And that is because this piece is a study in the circular motion of the wrist (and really, the entire arm). If you do not realize this and simply try to stretch your fingers to go from key to key, not only will that limit your ability to increase your speed, but it will build tension in your wrist as you go through this piece and eventually lead to injury. Not to mention that it really hurts to stretch your fingers with a static wrist.

(In my Jan Ekier edition of this piece, some of these ~15 key chords have two fingering suggestions that you can play with to decide which one you prefer.)

It may eventually be solvable, but this is a multiple dimensional problem, and a useful AI for this will need to give you a solution in multiple dimensions. If an AI can teach me all the motions of Chopin's etudes and allow me to just think about how to voice these pieces, maybe I won't need a teacher anymore.

Most of the information a pianist uses for fingering is the feeling of the hand and the sound of the piano while playing. This is not visible. Eyes are of little use for musicians. There are blind master pianists.
clicking the link at the top of the page may help explain :) https://github.com/google-research/robopianist/
That URL also explains why it only works on Chrome.
Hi author here! The app should work on mobile/desktop and was tested on both Safari and Chrome. I've heard it's buggy for some people (unclear if it's an older hardware problem) but you can try this embedded demo which works better: https://kzakka.com/robopianist/#demo
It works here on Firefox, although playing is sorta syncopated. Very likely a timing issue.
I had. It really doesn’t help much.
Any tips on starting out in learning piano?
Get a good teacher.