Hi Dmitry,
I think you are right: if HeapMemMP free invalidated the cache, this IPC problem might be solved. There are a couple of concerns, though. First, I am not sure that HeapMemMP always aligns blocks on a cache line boundary; the alloc function has an align parameter. So if you allocate memory from the heap directly (not through MessageQ), you might get an unaligned block that can be "damaged" by the invalidate when a neighboring block is freed. Even if HeapMemMP aligns to 128 bytes now, there is no guarantee it will in the future or on another TI architecture. Second, it is possible that a core does some kind of cache prefetch. I did not look into whether TI does this, but again there is no guarantee it won't start doing so in the future, and without bugs. In any case, patching and testing IPC yourself is not fun; it would be nice if TI decided to implement and support this solution.
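To illustrate the alignment concern, here is a rough host-side sketch (not TI code) that checks whether invalidating one block would also hit cache lines holding bytes of a neighboring block. The 128-byte line size is taken from the figure above; the helper names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u /* assumed line size, matching the 128 bytes mentioned above */

/* Address of the first cache line touched by a block starting at addr. */
static uintptr_t line_span_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE - 1u);
}

/* Address one past the last cache line touched by [addr, addr + size). */
static uintptr_t line_span_end(uintptr_t addr, size_t size)
{
    return (addr + size + CACHE_LINE - 1u) & ~(uintptr_t)(CACHE_LINE - 1u);
}

/* Returns 1 if the cache lines covering block A also contain bytes of block B,
 * i.e. invalidating A could discard B's not-yet-written-back data. */
static int invalidate_touches(uintptr_t a, size_t a_size,
                              uintptr_t b, size_t b_size)
{
    return line_span_start(a) < b + b_size && b < line_span_end(a, a_size);
}
```

For example, two 64-byte blocks at 0x1000 and 0x1040 share one 128-byte line, so invalidating the first one touches the second. Two 128-byte blocks at 0x1000 and 0x1080 do not overlap at cache line granularity.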
So while your approach seems to be correct and more elegant, for the time being I am going to stick with invalidating on receive, which does not rely on cache state. That is:
1. Before calling put, write back the message header and message body on the sender side (just in case).
2. After receiving the pointer from MessageQ get, invalidate the first 128 bytes starting at that pointer.
3. Read the size field from the header.
4. Invalidate the whole message.
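The steps above can be sketched roughly as follows. This is a host-side illustration, not my actual code: `MsgHeader`, `cache_wb_stub` and `cache_inv_stub` are hypothetical stand-ins that only record what was requested. On the target, the stubs would presumably be replaced with the SYS/BIOS `Cache_wb` and `Cache_inv` calls, and the header would be the real MessageQ one.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u /* assumed cache line size */

/* Hypothetical simplified header; the real MessageQ header has more fields. */
typedef struct {
    uint32_t msgSize; /* total message size in bytes, header included */
} MsgHeader;

/* Stand-ins for the cache operations: they only log the requested sizes. */
static size_t wb_log[4];
static size_t inv_log[4];
static int wb_calls;
static int inv_calls;

static void cache_wb_stub(void *addr, size_t bytes)
{
    (void)addr;
    if (wb_calls < 4) wb_log[wb_calls] = bytes;
    wb_calls++;
}

static void cache_inv_stub(void *addr, size_t bytes)
{
    (void)addr;
    if (inv_calls < 4) inv_log[inv_calls] = bytes;
    inv_calls++;
}

/* Step 1: sender writes back header and body before calling put. */
static void sender_prepare(MsgHeader *msg)
{
    cache_wb_stub(msg, msg->msgSize);
}

/* Steps 2-4: receiver runs this on the pointer returned by get. */
static size_t receiver_prepare(MsgHeader *msg)
{
    cache_inv_stub(msg, CACHE_LINE); /* step 2: first 128 bytes, covers the header */
    size_t size = msg->msgSize;      /* step 3: header is now safe to read */
    cache_inv_stub(msg, size);       /* step 4: whole message */
    return size;
}
```

The first invalidate is needed only because the message size is stored inside the message itself; once the header is known to be fresh, the second invalidate covers the body.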
This seems to solve my crashes, and I have run my program for quite a while (millions of message sends and receives).
I agree that IPC has its problems. If I were to rewrite my code, I would probably switch to Open Event Machine or OpenMP (they are not so low level), though I am not sure how buggy those are.
Alexey