I am programming a Cortex-A53 bare-metal (without an OS) to run some arithmetic operations.
The task continuously generates 2K 32-bit numbers using the CRC32 polynomial, stores them, moves them to and from the L1 data cache, and compares different portions of the 32-bit numbers.
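To make the workload concrete, here is a minimal C sketch of one pass of it. It assumes the numbers come from stepping a Galois LFSR with the reflected CRC32 polynomial 0xEDB88320, that "comparing portions" means comparing the upper and lower 16-bit halves of each word, and that the buffers fit in a 32 KiB L1 data cache. The function names and these choices are hypothetical, not the actual task code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_WORDS    2048u        /* "2K 32b numbers" */
#define CRC32_POLY 0xEDB88320u  /* assumed: reflected IEEE CRC32 polynomial */

/* One LFSR step with the CRC32 polynomial: shift right by one bit and XOR in
 * the polynomial if the bit shifted out was 1. A nonzero state stays nonzero. */
uint32_t crc32_step(uint32_t x)
{
    return (x >> 1) ^ ((x & 1u) ? CRC32_POLY : 0u);
}

/* Fill buf with n words, advancing the state 32 steps per word.
 * Returns the final state so generation can continue across calls. */
uint32_t fill(uint32_t *buf, size_t n, uint32_t state)
{
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 32; b++)
            state = crc32_step(state);
        buf[i] = state;
    }
    return state;
}

/* Hypothetical "compare portions": count words whose upper and lower
 * 16-bit halves are equal. */
size_t compare_halves(const uint32_t *buf, size_t n)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++)
        matches += (uint16_t)(buf[i] >> 16) == (uint16_t)buf[i];
    return matches;
}

/* One pass of the task: generate into src, copy to dst, compare halves in dst.
 * With N_WORDS = 2048, src and dst take 8 KiB each, so both should stay
 * resident in a 32 KiB L1 data cache (cache size assumed). */
size_t task_iteration(uint32_t *src, uint32_t *dst, size_t n, uint32_t *state)
{
    *state = fill(src, n, *state);
    memcpy(dst, src, n * sizeof *src);
    return compare_halves(dst, n);
}
```

In a bare-metal loop, task_iteration would be called repeatedly with the same state pointer. Because src and dst should stay cache-resident, such a loop mostly exercises the core's integer and load/store pipelines rather than DRAM.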
Right now I measure an IPC (instructions per cycle) of 1.05 on Xilinx's Zynq UltraScale device. What would be a rough estimate of the IPC for this kind of workload?
I am wondering whether there is room for improvement beyond 1.05.
I understand the A53 has two instruction decoders; does that mean the peak IPC would be 2?
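IPC is retired instructions divided by elapsed cycles, so a measured value can be set against the ceiling implied by the issue width. A minimal C sketch of that arithmetic, assuming the counts come from the ARMv8 PMU events INST_RETIRED (0x08) and CPU_CYCLES (0x11) and taking 2 as the theoretical ceiling for a dual-issue core; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed theoretical ceiling: at most two instructions issued per cycle. */
#define A53_ISSUE_WIDTH 2.0

/* IPC from counter deltas taken before and after the loop under test, e.g.
 * ARMv8 PMU events INST_RETIRED (0x08) and CPU_CYCLES (0x11). */
double ipc(uint64_t instructions, uint64_t cycles)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* Fraction of the dual-issue ceiling that a measured IPC represents. */
double fraction_of_peak(double measured_ipc)
{
    return measured_ipc / A53_ISSUE_WIDTH;
}

#if defined(__aarch64__)
/* Read the cycle counter directly. Assumes the PMU was already enabled
 * (PMCR_EL0.E = 1, PMCNTENSET_EL0 bit 31 set) and that access to
 * PMCCNTR_EL0 is not trapped at the current exception level. */
static inline uint64_t read_cycle_counter(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(v));
    return v;
}
#endif
```

With the numbers from this post, fraction_of_peak(1.05) is 0.525, i.e. the loop uses a bit over half of the two issue slots per cycle. How much of the rest is reachable depends on which instruction pairs can actually be dual-issued and on pipeline stalls.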
Thank you.
Thank you, Pete.
I know the A55 is very similar to the A53, so the optimization guide should be applicable to the A53 too.