When I execute "push {r1}", the execution time is 0.028 us.
Then, when I execute "push {r1,r2}", the execution time is 0.042 us.
...
When I execute "push {r1,...,rn}", the execution time is 0.014*(n-1)+0.028 us.
I wonder why the first register stored on the stack costs 0.028 us, twice the 0.014 us that each additional register adds.
Check the TRM (Technical Reference Manual) for your core; it explains in detail how many cycles each instruction takes.