by kzakka
1175 days ago
Hi, one of the authors here :) The demo you are watching is an agent trained from scratch with reinforcement learning. It has roughly 6 days of experience (10M steps at 20 Hz). The JavaScript demo is replaying the policy open loop, which is why it's not super robust to disturbances. Re: fingering: we actually use fingering information to create a dense reward for the agent (otherwise it makes exploration super hard). It would be an exciting future direction to have the agent discover and optimize for fingering that best suits its kinematics :) And beyond that, having RL inform pianists about the difficulty of a piece or even more optimal fingering would be amazing. We trained a bunch of these policies on roughly 150 songs (baroque, romantic, classical) and we did some analysis in the paper if you're interested: https://kzakka.com/robopianist/robopianist.pdf
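For anyone checking the "roughly 6 days" figure, here is a quick sketch of the arithmetic. The only inputs are the two numbers quoted in the comment (10M steps, 20 Hz); the variable names are made up for illustration and are not from the RoboPianist code.

```python
# Values quoted in the comment above; names are illustrative only.
steps = 10_000_000   # environment steps of experience
control_hz = 20      # steps per second of simulated time

seconds = steps / control_hz
days = seconds / (60 * 60 * 24)

print(f"{seconds:.0f} s of simulated experience, about {days:.2f} days")
# prints "500000 s of simulated experience, about 5.79 days"
```

So 10M steps at 20 Hz comes out a little under 6 days of simulated playing time, which matches the rounded figure.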
There are two motions in particular that pianists use constantly that don't seem to be represented in the robot model, if you're looking to get closer to the way that human limbs and digits operate. (Naturally there are plenty of other goals, but if you can imitate human playing you can do things like suggest fingerings or assess difficulty, as you say.)
1) turning at the elbow (so that your forearm can make an angle with the piano keyboard instead of always being perpendicular to it). It looks like you translate the forearm back and forth instead, which I assume is a lot easier to handle, but of course it's not how human arms work.
2) rotating the forearm/wrist (like turning a doorknob). Pianists do this on basically every note to a greater or lesser extent. To take an extreme example, if you alternate notes with your thumb and pinky you are almost completely using your wrist and not your fingers. Without this degree of freedom it is not really possible to emulate a competent pianist, if that is one of the eventual goals.