With the latest Arm compilers (24.10), building GROMACS with -Ofast hangs when targeting SVE architectures (A64FX in my case, but this is not limited to that platform).
The hang can be observed with the 2024 and 2025 series.
Arm compilers 24.04 work fine.
Arm compiler 24.10 works fine with -O3 but hangs with -Ofast.
I was able to trim down the code to this simple reproducer (reproducer.cpp):

    #include <arm_sve.h>

    svfloat32_t foo(svbool_t mask, svfloat32_t x)
    {
        svbool_t pg = svptrue_b32();
        return svsub_f32_x(pg, x, svadd_f32_x(pg, x, svsel_f32(mask, x, svdup_f32(0.0f))));
    }

    $ armclang++ --version
    Arm C/C++/Fortran Compiler version 24.10 (build number 31) (based on LLVM 19.1.0)
    Target: aarch64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /opt/arm/arm-linux-compiler-24.10_RHEL-8/llvm-bin
    $ armclang++ -march=armv8.2-a+sve -c -O3 reproducer.cpp # OK
    $ armclang++ -march=armv8.2-a+sve -c -Ofast reproducer.cpp # HANG

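For anyone who wants to check other compiler versions or flag combinations without a stuck terminal, here is a minimal sketch that puts a time limit on each compile. The helper name `check_compile` and the 60-second limit are my own arbitrary choices, not part of any tool, and it assumes `timeout` from GNU coreutils is available (it exits with status 124 when the limit is reached).

```shell
# Run a command under a time limit and report how it ended.
# Assumes timeout(1) from GNU coreutils, which exits with 124 on timeout.
# check_compile is a hypothetical helper name used for illustration only.
check_compile() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    case "$status" in
        0)   echo "OK" ;;
        124) echo "HANG" ;;            # the time limit was reached
        *)   echo "FAIL ($status)" ;;  # the command failed on its own
    esac
}

# Intended usage on the reproducer (60 s is an arbitrary limit):
#   check_compile 60 armclang++ -march=armv8.2-a+sve -c -O3    reproducer.cpp
#   check_compile 60 armclang++ -march=armv8.2-a+sve -c -Ofast reproducer.cpp
```

A compile that is merely slow would also be reported as HANG, so the limit should be generous compared with the -O3 compile time.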
Thanks, I can confirm GROMACS can be built with these options (minus the typo: it should be `-O3 -ffp-contract=fast`). FWIW, performance is slightly lower than with Arm compilers 24.04 and -Ofast, so I look forward to this getting fixed in the upcoming release.