Yes, but that locks you in to optimising for whatever you covered with your profiling. If the character of your data changes, it'll take a recompile to change how the application performs, while a JIT can potentially choose to deoptimise/reoptimise.
I'm all for "as static as possible" toolchains, but there are optimization opportunities you simply won't have with AOT, PGO or not. E.g. consider something trivial: a program doing certain image operations that depend on dimensions passed in on the command line. A JIT could optimise the inner loops for the actual dimensions it sees. AOT, even with PGO, would be unable to do the same without causing a massive explosion in code size.
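To make the image example concrete, here is a minimal sketch. The class name, the `demo.width`/`demo.height` property names and the summing kernel are all made up for illustration. It relies on the assumption that HotSpot usually treats `static final` fields as constants when it compiles a method, so the second loop can be compiled for the dimensions of this particular run, while the first has to stay general.

```java
public class DimensionSpecialization {
    // Hypothetical property names. The values are read once at class
    // initialisation, so they are unknown when the bytecode is produced
    // but fixed for the lifetime of this run.
    static final int WIDTH = Integer.getInteger("demo.width", 4);
    static final int HEIGHT = Integer.getInteger("demo.height", 3);

    // Generic kernel: the compiled code must work for any width and height.
    static long sumGeneric(int[] pixels, int width, int height) {
        long sum = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                sum += pixels[y * width + x];
            }
        }
        return sum;
    }

    // Same kernel over the static finals. A JIT that folds them into
    // constants (an assumption about the VM, not a guarantee) can unroll
    // or vectorise for exactly these loop bounds.
    static long sumFixed(int[] pixels) {
        long sum = 0;
        for (int y = 0; y < HEIGHT; y++) {
            for (int x = 0; x < WIDTH; x++) {
                sum += pixels[y * WIDTH + x];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] pixels = new int[WIDTH * HEIGHT];
        java.util.Arrays.fill(pixels, 1);
        System.out.println(sumGeneric(pixels, WIDTH, HEIGHT));
        System.out.println(sumFixed(pixels));
    }
}
```

Run it with e.g. `java -Ddemo.width=1920 -Ddemo.height=1080 DimensionSpecialization`. An AOT build would need either the generic loop or one precompiled variant per dimension pair, which is where the code size explosion comes from.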
In theory yes, but sadly I have never seen that promise realized in any consistent fashion. Same goes for Java's escape analysis. Although the principle is sound, I think the engineering required to make it robust is horrendously difficult. Within a very narrow window of variation it works, but step outside that zone and it fails pretty badly. I think it will take many more years; till then, PGO and metaprogramming it is.
Speculative optimisations yield about a 20% improvement in Java and more for higher level languages like Scala (and for Ruby it's off the charts). HotSpot isn't perfect by any means and C2's EA is not strong enough, but it's on track to be replaced with Graal (which is a part of what AOT is all about - you don't want your VM compiling itself at the same time as compiling your app).
PGO in C++ can yield quite significant speedups, however the difficulty of integrating it into a development workflow means that in practice it's hardly ever used. One of Java's accomplishments is that it brought PGO to the masses by making it entirely built in and automatic.
There is a set of optimisations that you can only do at runtime and can't do with AOT. Anything that depends on the data is something AOT can't reliably do, such as eliding null checks when you can prove, based on the data passed in, that a null cannot occur. There are also cases where you have multiple subclasses of a type, such as a normal and a debug subclass, or multiple drivers for different back ends such as MongoDB or Cassandra, only one of which is used at runtime, but you cannot know ahead of time which one is selected (for example, it's based on an environment variable or system property).
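A minimal sketch of the multiple-drivers case, with made-up names throughout: `Store`, the two implementations and the `demo.backend` property are hypothetical stand-ins, not real MongoDB or Cassandra driver APIs. The point is that the call in the loop only ever sees one receiver type in a given run, which a JIT can observe and exploit (typically by inlining behind a guard), whereas an AOT compiler has to assume either implementation may show up.

```java
public class BackendSelection {
    // Hypothetical interface standing in for a storage driver.
    interface Store {
        int write(int value);
    }

    // Stand-ins for two back ends; the arithmetic is arbitrary.
    static final class MongoLikeStore implements Store {
        public int write(int value) { return value + 1; }
    }

    static final class CassandraLikeStore implements Store {
        public int write(int value) { return value + 2; }
    }

    // The choice depends on a value that only exists at runtime.
    static Store select(String name) {
        return "cassandra".equals(name) ? new CassandraLikeStore() : new MongoLikeStore();
    }

    static long run(Store store, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            // Only one concrete type reaches this call site per run, so a
            // profiling JIT can treat it as monomorphic.
            sum += store.write(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // "demo.backend" is an assumed property name for this sketch.
        Store store = select(System.getProperty("demo.backend", "mongo"));
        System.out.println(run(store, 1_000_000));
    }
}
```

If a second implementation does turn up later, the JIT can deoptimise and recompile; the AOT build has to carry the virtual dispatch from the start.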
The point is that while AOT can do a set of optimisations, including whole module analysis, there is another set that is only available at runtime.