by trishume 1502 days ago
I really hope he can work with cloud vendors and Intel to make Processor Trace a more popular and easier to use capability.

It's unfortunate that https://github.com/janestreet/magic-trace, and PMUs in general, can't be used by the many people running on cloud VMs.


Yes, getting PMCs enabled in VMs was just the start. I think the next hardware capabilities to enable are:

  - PEBS (precise/processor event-based sampling, so that we can accurately get instruction pointers on PMC events)
  - uncore PMCs (in a safe manner)
  - LBR (last branch record, to aid stack walking)
  - BTS (branch trace store, also to aid stack walking)
  - Processor trace (for cycle traces)
Processor trace may be the final boss. We've got through level 1, PMCs, now onto PEBS and beyond.
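
For anyone wanting to check which of these a given guest actually exposes, here is a minimal sketch, assuming a Linux guest on Intel hardware where the kernel publishes its PMUs under `/sys/bus/event_source/devices`. The function name `probe_pmu` is hypothetical, and the entries it looks for (`cpu/caps/max_precise`, `cpu/caps/branches`, `uncore_*`, `intel_bts`, `intel_pt`) are what recent kernels commonly provide; they can differ by kernel version and CPU, so treat a "no" as a hint rather than proof.

```python
from pathlib import Path

# Assumed default location of the kernel's PMU devices on Linux.
SYSFS_PMU_ROOT = "/sys/bus/event_source/devices"


def probe_pmu(root=SYSFS_PMU_ROOT):
    """Return {capability: bool} based on which sysfs entries exist under root."""
    base = Path(root)

    def read_int(rel):
        # Missing or unreadable files count as 0 (capability not advertised).
        try:
            return int((base / rel).read_text().strip())
        except (OSError, ValueError):
            return 0

    return {
        # Core PMU device; present when basic PMCs are usable.
        "PMCs": (base / "cpu").exists(),
        # Assumption: a non-zero max_precise means precise sampling (PEBS on Intel).
        "PEBS": read_int("cpu/caps/max_precise") > 0,
        # Uncore PMUs usually appear as uncore_* devices.
        "uncore PMCs": any(base.glob("uncore_*")),
        # caps/branches reports the number of LBR entries when LBR is available.
        "LBR": read_int("cpu/caps/branches") > 0,
        "BTS": (base / "intel_bts").exists(),
        "Processor Trace": (base / "intel_pt").exists(),
    }


if __name__ == "__main__":
    for name, available in probe_pmu().items():
        print(f"{name:16s} {'yes' if available else 'no'}")
```

The root directory is a parameter only so the check can be pointed at a copy of sysfs or a test fixture.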
Can this be safely and efficiently virtualized? I love using these tools, but post-Spectre I could understand people being hesitant to expose more internal "state" (i.e., technically unique to a VM but only one processor bug away from kaboom?).

Congrats on the job.

Thanks! We have to work through each capability carefully. Some won't be safe, and will be available on bare-metal instances only. That may be ok, as it fits with the following evolution of an application (this is something I did for some recent talks):

  1. FaaS
  2. Containers
  3. Lightweight VMs (e.g., Firecracker)
  4. Bare-metal instances
As (and if) an application grows, it migrates to platforms with greater performance and observability.

The ship has sailed on neighbor detection BTW. There are so many ways to know you're a VM with neighbors that disabling PMCs for that reason alone doesn't make sense.

> The ship has sailed on neighbor detection BTW.

In the crudest sense of "do I have a neighbour", sure. Of course, that's hardly secret -- if you're in EC2 you can just count your CPUs to figure that out.

But there are more questions you can ask:

1. Is my neighbour busy right now?

2. Is my neighbour a busy web server, a busy database, or a busy application server?

3. Is my neighbour hosting Brendan's website?

4. Is my neighbour hosting Brendan's website and he's logged in writing a blog post in vi right now?

5. What's Brendan writing right now?

It's not immediately clear which of these questions can be answered using certain capabilities! Prior to 2005, few people would have guessed that you could read text off someone's screen using hyperthreading, for example. (It's pretty simple, although I don't know if anyone has published exploit code for it: just look at which cache lines are fetched when glyphs are loaded to render to the screen.)

Congrats man, it sounds like a dream job for you. It will be fun to follow your blog at your next job. Thanks again for sharing everything that you do, it is so incredibly humbling and such a great learning experience.
On AMD systems, many hardware performance counters are locked behind BIOS flags/configuration.

I admit that I don't know how Intel works, but disabling the use of these performance counters at startup should be sufficient to address any potential security problem.

I'd expect that only development boxes (maybe staging?) would be interested in performance counters anyway. Maybe the occasional development box could be set up for performance sampling and collecting these counters, but not all production boxes need to be run with performance counters on.

No, I want these performance counters everywhere. Obviously I know they can be disabled, but that doesn't really help.

I also really want them in CI but that might be a long way away.

Being able to collect performance data from production boxes is invaluable.
Yes, getting LBR data from production workloads is the whole ballgame for AutoFDO/SamplePGO and BOLT/Propeller. You cannot access the LBR on any EC2 machine short of a "metal" instance.
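
As a rough illustration of why LBR data matters here: each sample carries a short list of recently taken branches as (from, to) address pairs, and summing those pairs across samples gives per-edge frequencies that a profile-guided optimizer can consume. The sketch below assumes the samples have already been decoded into Python tuples; the `edge_counts` helper and the sample layout are hypothetical and are not how AutoFDO or BOLT are actually implemented.

```python
from collections import Counter


def edge_counts(samples):
    """Aggregate LBR samples into taken-branch edge counts.

    samples: iterable of samples, each a list of (from_addr, to_addr) pairs
    as they might be read out of one LBR snapshot.
    """
    counts = Counter()
    for branch_stack in samples:
        # Every (from, to) pair is one observed execution of that branch edge.
        counts.update(branch_stack)
    return counts


if __name__ == "__main__":
    # Two made-up samples; addresses are illustrative only.
    demo = [
        [(0x10, 0x40), (0x48, 0x10)],
        [(0x10, 0x40)],
    ]
    for (src, dst), n in edge_counts(demo).most_common():
        print(f"{src:#x} -> {dst:#x}: {n}")
```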
When it comes to PGO (vs. profiling the whole system), though, it's worth noting that a lot of the speedup comes from things which are too trivial for us humans to consider.

When I profiled the D compiler with and without PGO enabled, it became obvious that a lot of the speedup of PGO basically comes just from running the program; the choice of test cases made almost no difference.

> not all production boxes need to be run with performance-counters on.

Production is exactly the place where you want full performance counter support, all the time, everywhere, on every machine.

Right. That's all good, but the important question is: what will your desk look like at Intel?[1]

1. Meta: https://twitter.com/brendangregg/status/1515482126871044098

One question: are you hiring?