Hello Asheesh,
We have finally confirmed what was stated at the very beginning of the thread: when the DSP runs in little-endian (LE) mode, a given complex data packing suits either DSPLIB or the intrinsics, but not both. I know I could fix that with extra instructions, but then why would I use intrinsics at all? For reference, here is the assembly for the complex multiply followed by the swap with negation:
24      y = _complex_mpysp( a, b );
00808508:   033C43E6        LDDW.D2T2   *+B15[2],B7:B6
0080850c:   023C63E6        LDDW.D2T2   *+B15[3],B5:B4
00808510:   00006000        NOP         4
00808514:   1210CF02        CMPYSP.M2   B7:B6,B5:B4,B7:B6:B5:B4
00808518:   00004000        NOP         3
0080851c:   1210C79A        DADDSP.L2   B7:B6,B5:B4,B5:B4
00808520:   00002000        NOP         2
00808524:   023C83C6        STDW.D2T2   B5:B4,*+B15[4]
25      p = _ftof2(_lof2(y), -_hif2(y));
00808528:   033C83E6        LDDW.D2T2   *+B15[4],B7:B6
0080852c:   05A6            MVK.L1      0,A3
0080852e:   F9A2            SET.S1      A3,31,31,A3
00808530:   2C6E            NOP         2
00808532:   E347            MV.L2       B6,B7
00808534:   030CB2E2     || XOR.S2X     B5,A3,B6
00808538:   033CA3C6        STDW.D2T2   B7:B6,*+B15[5]
0080853c:   E3000200        .fphead     n, l, W, BU, nobr, nosat, 0011000b
I count 14 cycles for the complex multiply itself and at least 8 for the swap with negation. Don't you think that is too expensive?
More importantly for me: is there a plan to implement an ImRe version of DSPLIB or, at least, of the single-precision (SP) FFT, similar to DSP_fft16x16_imre?
Thanks.