I would like to know which perf counters describe remote memory loads from other cores or nodes. The Arm Cortex-A Series Programmer's Guide for ARMv8-A (https://cs140e.sergio.bz/docs/ARMv8-A-Programmer-Guide.pdf; it would be nice to know the official link to that document too) says (at 11-7):
> For multi-core and multi-cluster systems, before performing a load from external memory, the caches of L2 or L1 caches of cores within the cluster or of other clusters might also be checked
What perf counters describe such loads?
Thank you so much for the reply and the explanation, vstehle.