Hello everyone,
I'm analyzing two otherwise identical programs on an ARM Cortex-M4F (Armv7E-M). The only difference is that one uses double-precision floating point and the other uses single precision. I noticed that the double-precision version consumes more ROM than the single-precision one.
My guess is that, since the Floating-Point Unit (FPU) of the ARM Cortex-M4F natively supports only 32-bit floating point, the compiler generates extra code to work around this limitation when the program uses 64-bit values.
Does anyone know why? Can anyone point to a source or other material that supports the explanation?
Respectfully, Tiago
If all your CPU has is a 32-bit floating-point unit, any double-precision calculations will be done using software-only 64-bit library routines (at a cost of ROM space, and probably a bigger performance loss than you expected). Compare the code generated for a float add and a double add:
; double z, z2;
; float x, y;
x += y;
4300: 4b0d ldr r3, [pc, #52]
4302: 4a0e ldr r2, [pc, #56]
4304: edd3 7a00 vldr s15, [r3]
4308: ed92 7a00 vldr s14, [r2]
430c: ee77 7a87 vadd.f32 s15, s15, s14 ;; floating point add instruction
void loop() {
4310: b510 push {r4, lr}
z += z2;
4312: 4c0b ldr r4, [pc, #44]
4314: edc3 7a00 vstr s15, [r3]
4318: 4b0a ldr r3, [pc, #40]
431a: e9d4 0100 ldrd r0, r1, [r4]
431e: e9d3 2300 ldrd r2, r3, [r3]
4322: f002 f87b bl 641c <__adddf3> ;; call to floating point add function.
4326: e9c4 0100 strd r0, r1, [r4]
(This may depend on the compiler. I don't know of any compiler that uses a single-precision FPU to help with doubles; it looks hard, but I guess it's possible.)
(I believe the M4F has no option for double-precision hardware. That doesn't show up until the Cortex-M7.)
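If you want to reproduce the comparison yourself, here is a minimal sketch of a source file that should give a listing like the one above. The variable names and `loop()` come from the listing; the initial values, the file name `fp_demo.c`, and the exact compiler flags in the comments are my assumptions (GNU Arm toolchain), so adjust them for your setup.

```c
/*
 * Assumed build and inspection commands (GNU Arm Embedded toolchain):
 *   arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard \
 *       -mfpu=fpv4-sp-d16 -Os -c fp_demo.c
 *   arm-none-eabi-objdump -d fp_demo.o   (look for vadd.f32 and bl __adddf3)
 *   arm-none-eabi-size fp_demo.o         (compare section sizes)
 */

/* Globals with external linkage, so the compiler cannot fold the adds away. */
float  x = 1.5f, y = 2.25f;   /* initial values are arbitrary */
double z = 1.5,  z2 = 2.25;

void loop(void)
{
    x += y;    /* single precision: handled by the FPU (vadd.f32) */
    z += z2;   /* double precision: software routine (__adddf3 with GCC) */
}
```

The extra ROM shows up mostly in the final linked image, because that is where the double-precision helper routines get pulled in from the runtime library.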
See also: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor
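One related thing to check in the "single precision" build: in C, a floating constant without the `f` suffix has type double, so a float expression can be promoted to double without you noticing. A small sketch (both function names are made up for illustration):

```c
/* 0.1 has type double, so v is converted to double, multiplied in double,
 * and the result converted back to float. On an M4F this typically means
 * calls to software conversion and multiply routines. */
float scale_via_double(float v)
{
    return v * 0.1;
}

/* 0.1f has type float, so the multiply stays in single precision and can
 * use the FPU directly. */
float scale_in_float(float v)
{
    return v * 0.1f;
}
```

A compiler is allowed to narrow some of these operations back to float when the result is provably the same, so check the disassembly rather than assuming. With GCC, `-Wdouble-promotion` warns about implicit float-to-double promotions.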