L1 Cache Eviction Corrupting DDR on A9

Hi All!

I am working with a Xilinx Zynq-7000 SoC, which uses the Arm Cortex-A9 as its CPU.

I've observed a problem in which a region of memory marked strongly-ordered and non-cacheable (section descriptor 0xc02) in the MMU translation table gets corrupted by what appear to be L1 cache evictions back to DDR.

Setup: Linux runs as the master on CPU0, with FreeRTOS on CPU1. During boot, the region from 512 MB to 1 GB is marked 0xc02 in the translation table and aliased back to the lower region (0 to 512 MB). This allows the same physical memory to be accessed with different region attributes. The Linux CPU owns the L2 cache and its controller; the L2 cache is disabled on CPU1.
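For reference, 0xc02 decodes (ARMv7-A short-descriptor format, assuming TEX remap is disabled) as a 1 MB section entry with AP[1:0] = 0b11 (full access) and TEX = 000, C = 0, B = 0, which is the strongly-ordered memory type. Below is a minimal sketch of the alias setup described above, assuming a flat 4096-entry first-level table built entirely from 1 MB sections. The helpers fill_alias_sections, section_memtype and sections_conflict are hypothetical, not Xilinx BSP functions. The conflict check makes explicit that each alias entry maps the same physical megabyte as a lower-region entry, but with a different memory type (the test assumes the lower region uses a Normal write-back entry with TEX = 001, C = 1, B = 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A short-descriptor, first-level section entry (1 MB) fields. */
#define SECTION_TYPE      0x2u          /* bits[1:0] = 0b10 */
#define SECTION_B         (1u << 2)
#define SECTION_C         (1u << 3)
#define SECTION_AP_FULL   (3u << 10)    /* AP[1:0] = 0b11: full access */
#define SECTION_TEX(x)    ((uint32_t)(x) << 12)
#define SECTION_SUPER     (1u << 18)    /* set for 16 MB supersections */
#define SECTION_BASE_MASK 0xFFF00000u

/* 0xC02: section, full access, TEX=000 C=0 B=0 -> strongly-ordered. */
#define ATTR_STRONGLY_ORDERED (SECTION_AP_FULL | SECTION_TYPE)

#define SECTION_SIZE  0x00100000u  /* 1 MB */
#define ALIAS_VA_BASE 0x20000000u  /* 512 MB */
#define ALIAS_VA_END  0x40000000u  /* 1 GB, exclusive */

/* Hypothetical: map VA [512 MB, 1 GB) onto PA [0, 512 MB) as strongly-ordered. */
static void fill_alias_sections(uint32_t ttb[4096])
{
    for (uint32_t va = ALIAS_VA_BASE; va < ALIAS_VA_END; va += SECTION_SIZE) {
        uint32_t pa = va - ALIAS_VA_BASE;
        ttb[va >> 20] = (pa & SECTION_BASE_MASK) | ATTR_STRONGLY_ORDERED;
    }
}

static bool is_section(uint32_t desc)
{
    return (desc & 0x3u) == SECTION_TYPE && (desc & SECTION_SUPER) == 0;
}

/* Pack TEX[2:0], C and B into one value so two memory types compare easily. */
static uint32_t section_memtype(uint32_t desc)
{
    return (((desc >> 12) & 0x7u) << 2) | (((desc >> 3) & 1u) << 1) | ((desc >> 2) & 1u);
}

/* True if both entries map the same physical megabyte with different memory types. */
static bool sections_conflict(uint32_t a, uint32_t b)
{
    return is_section(a) && is_section(b) &&
           (a & SECTION_BASE_MASK) == (b & SECTION_BASE_MASK) &&
           section_memtype(a) != section_memtype(b);
}
```

Every alias section produced this way conflicts with the cacheable entry for the same physical megabyte, which is the situation the reply below describes as "conflicting page table attributes".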

Thus, adding 0x20000000 to a pointer returned by malloc should yield an uncacheable alias, and all memory accesses through it should go directly to memory. I am using a buffer of 1024 integers, allocated with malloc and then offset so that all accesses are uncacheable.
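The offsetting step can be sketched as a small helper that also refuses ranges the alias does not cover. This assumes DDR is flat-mapped (VA == PA) on the FreeRTOS side, so a malloc result can be treated as a physical address; uncached_alias and both constants are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for the scheme above; assumes DDR is flat-mapped
 * (VA == PA) on the FreeRTOS side, so a malloc result is also a PA. */
#define UNCACHED_ALIAS_OFFSET 0x20000000u  /* +512 MB */
#define ALIASED_REGION_END    0x20000000u  /* only [0, 512 MB) has an alias */

/* Return the strongly-ordered alias of [cached_addr, cached_addr + len),
 * or 0 if any part of the range lies outside the aliased region. */
static uintptr_t uncached_alias(uintptr_t cached_addr, size_t len)
{
    if (len > ALIASED_REGION_END || cached_addr > ALIASED_REGION_END - len)
        return 0;
    return cached_addr + UNCACHED_ALIAS_OFFSET;
}

/* Target-side usage (not runnable on a host); volatile stops the compiler
 * from caching or eliding accesses through the alias:
 *   uint32_t *buf = malloc(1024 * sizeof *buf);
 *   volatile uint32_t *u =
 *       (volatile uint32_t *)uncached_alias((uintptr_t)buf, 1024 * sizeof *buf);
 */
```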

Issue: Immediately after a memcpy into the uncached buffer, its contents match the source exactly. After a short time, however, the uncached buffer drifts from the source, which stays unchanged throughout. When the buffer is instead left cacheable, the corruption does not occur, which leads me to believe that stale data is being evicted from the L1 cache and overwriting the new data that was written directly to DDR.
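A small check like the following, sketched under the assumption of a 32-byte L1 line, makes the corruption pattern easy to log: it compares the uncached buffer against the source and records the start address of each cache line that contains at least one mismatch. On the target, src would be the source buffer and dst the uncached alias; find_corrupt_lines is a hypothetical name. It only characterizes the symptom and says nothing about which agent wrote the lines back.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_LINE_BYTES 32u  /* Cortex-A9 L1 D-cache line size */

/* Compare dst against src word by word and record the start address of
 * each distinct 32-byte line containing a mismatch. Returns the number of
 * corrupted lines; only the first max_out addresses are stored in out. */
static size_t find_corrupt_lines(const uint32_t *src,
                                 const volatile uint32_t *dst, size_t words,
                                 uintptr_t *out, size_t max_out)
{
    size_t count = 0;
    uintptr_t last_line = UINTPTR_MAX;  /* never line-aligned: "none yet" */

    for (size_t i = 0; i < words; i++) {
        if (dst[i] == src[i])
            continue;
        uintptr_t line = (uintptr_t)&dst[i] & ~(uintptr_t)(L1_LINE_BYTES - 1);
        if (line != last_line) {  /* words are scanned in address order */
            if (count < max_out)
                out[count] = line;
            count++;
            last_line = line;
        }
    }
    return count;
}
```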

I have tried disabling, cleaning (flushing), and invalidating the cache, both before and after the memcpy, but none of this helped. The buffer is not aligned to the L1 cache line size, which could corrupt the first and last entries of the buffer through accesses to cached data immediately before and after it; however, the corruption is spread randomly throughout the buffer, in chunks of 8 entries (8 * 4 = 32 bytes, the L1 line size). I have also tried clearing the prefetch-enable bits in the ACTLR. Looking at the disassembly of memcpy, though, it issues PLD instructions only for the source, never for the destination.
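To rule out the edge-line explanation entirely, the buffer can start on a line boundary and be padded to a whole number of lines, so no line is shared with neighbouring heap data. A minimal sketch, assuming a 32-byte line and a C library that provides C11 aligned_alloc (if the toolchain's library lacks it, posix_memalign is the usual alternative); the helper names are hypothetical. This only removes shared edge lines and does not address the aliasing itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_LINE_BYTES 32u

/* Number of 32-byte cache lines touched by [addr, addr + len); len > 0. */
static size_t lines_spanned(uintptr_t addr, size_t len)
{
    uintptr_t first = addr & ~(uintptr_t)(L1_LINE_BYTES - 1);
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(L1_LINE_BYTES - 1);
    return (size_t)((last - first) / L1_LINE_BYTES) + 1;
}

/* Line-aligned allocation, padded to whole lines. aligned_alloc requires
 * the size to be a multiple of the alignment, hence the rounding. */
static void *alloc_line_aligned(size_t bytes)
{
    if (bytes == 0)
        return NULL;
    size_t rounded = (bytes + L1_LINE_BYTES - 1) & ~(size_t)(L1_LINE_BYTES - 1);
    return aligned_alloc(L1_LINE_BYTES, rounded);
}
```

For the 1024-integer buffer (4096 bytes), an aligned start touches 128 lines, while a start 4 bytes past a boundary touches 129, the first and last of which are shared with neighbouring data.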

What else could cause this, and what else could I try so that writes to the uncached region are not overwritten?

Thanks!!!

  • One odd thing I noticed was that if I added a non-blocking delay of more than 100 ms right before allocating this buffer, the corruption never occurred once the buffer was allocated and populated.

    Yes, that sounds like a pretty typical manifestation of a coherency failure caused by conflicting attributes.

    If you end up with conflicting page table attributes, then you are really at the mercy of whatever is wedged in the TLBs and micro-TLBs around the system.

    Inserting a wait makes it more likely that the conflicting TLB entries and cache lines have been evicted, so you are more likely to end up with only one of the attribute sets active at any point in time. But really you are just playing with statistics at this point, and there is no guarantee it will work. You would be very surprised at how long things can lurk in the main TLB...

    Could you explain to me what the ARUSER and the AWUSER bits are doing in what you mentioned above?

    These signals basically propagate the shareability information out of the core and down to the components further down the cache hierarchy. The Cortex-A9 L2 cache (PL310) is a separate block outside the CPU core, and these signals effectively pass the essential TLB page information to the L2 so that it does the right thing.

    Do you think it would be possible to instead work at the physical and not virtual level when setting memory access attributes?

    No; it's a virtual memory architecture, so everything operates on virtual addresses. Any consistency management for aliased mappings has to be enforced by the various software pieces that are running; the hardware can't do this itself.

    Cheers,

    Pete
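Since consistency between aliased mappings has to be enforced in software, one piece of that is cleaning and invalidating the cacheable alias's lines by virtual address (ARMv7 DCCIMVAC, MCR p15, 0, Rt, c7, c14, 1) and then issuing a DSB before switching to the uncached alias. The sketch below assumes a 32-byte line and covers only the L1 data cache; clean_invalidate_range is a hypothetical name. As the reply above points out, this gives no guarantee while both mappings remain live, because the core can allocate lines through the cacheable alias again at any time (for example through speculative accesses), so avoiding the conflicting mapping altogether is the more robust option.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_LINE_BYTES 32u

/* Clean and invalidate every L1 D-cache line overlapping [start, start+len)
 * through the cacheable mapping, then DSB. Returns the number of lines
 * maintained. On non-ARM hosts no instructions are issued, which is only
 * useful for checking the address arithmetic. */
static size_t clean_invalidate_range(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t line = start & ~(uintptr_t)(L1_LINE_BYTES - 1);
    uintptr_t end  = start + len;
    size_t n = 0;

    for (; line < end; line += L1_LINE_BYTES, n++) {
#if defined(__arm__)
        /* DCCIMVAC: clean and invalidate data cache line by MVA to PoC */
        __asm__ volatile("mcr p15, 0, %0, c7, c14, 1" : : "r"(line) : "memory");
#endif
    }

#if defined(__arm__)
    __asm__ volatile("dsb" : : : "memory");  /* wait for maintenance to complete */
#endif
    return n;
}
```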
