The main bottleneck in GNN computation is that the aggregation ops can't easily be expressed as standard dense matrix operations, so they need custom kernels. In PyTorch this was solved by torch-scatter. The other bottleneck is subsampling (e.g. k-hop neighborhood sampling), which also doesn't benefit much from GPU support. Other than that, the embedding parts can just be written as ordinary nn ops.
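To make the aggregation point concrete, here's a rough sketch in plain Python of what a scatter-sum does: every edge carries a message, and messages are summed into whichever node the edge points at. The helper name `scatter_sum` and the toy graph are made up for illustration and are not torch-scatter's actual API. A real kernel would do the same thing in parallel on the GPU rather than in a Python loop.

```python
def scatter_sum(messages, index, num_nodes):
    """Sum each message into the output row named by the matching index entry.

    Hypothetical helper for illustration only; not a library function.
    """
    dim = len(messages[0]) if messages else 0
    out = [[0.0] * dim for _ in range(num_nodes)]
    for msg, dst in zip(messages, index):
        for d, value in enumerate(msg):
            out[dst][d] += value
    return out


# Toy directed graph as (source, destination) pairs (made-up data).
edges = [(0, 1), (2, 1), (1, 2), (0, 2), (2, 0)]
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]  # one feature row per node

# Each edge's message is simply its source node's features.
messages = [features[src] for src, _ in edges]
index = [dst for _, dst in edges]

aggregated = scatter_sum(messages, index, num_nodes=len(features))
print(aggregated)
```

The irregular part is that each node receives a different number of messages, which is why this doesn't map neatly onto a fixed-shape dense matmul.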
Deep Graph Library (DGL) is the big one; it can use PyTorch, MXNet, or TensorFlow as the backend and is developed by AWS. There's also PyTorch Geometric, and Jraph, which is built on top of JAX and, as far as I can tell, is used mostly by researchers at DeepMind.