You're right, Starterware can go much faster. I had forgotten to enable optimization in the compiler options. With optimization level 3, I measured roughly 7 kHz with Starterware instead of 413 Hz.
Besides activating VFPv3 and Neon in the compiler options, you also need to raise the optimization level. This document
http://www.ti.com/lit/ug/spnu151i/spnu151i.pdf
(page 32, at the bottom, next to --neon)
states that the compiler only uses the Neon (SIMD) unit to speed up your code at optimization level 2 or higher.
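To make this concrete, here is a hypothetical kernel (not taken from Starterware) of the kind a vectorizing compiler can map onto Neon instructions: a simple loop of independent float multiply-adds. It is only a sketch, assuming C99 (for `restrict` and loop-scoped variables). The command line in the comment is likewise an assumption: `--float_support=VFPv3`, `--neon` and `--opt_level` are the option names I believe the TI ARM compiler uses, but check the guide above for your compiler version. Whether the loop actually gets vectorized is up to the compiler.

```c
#include <stddef.h>

/*
 * Hypothetical example kernel: y[i] = a * x[i] + y[i] ("saxpy").
 * Each iteration is independent, so a vectorizing compiler may process
 * several floats per Neon instruction. According to the TI guide, --neon
 * only has an effect at optimization level 2 or higher.
 *
 * Assumed command line (verify against spnu151i for your version):
 *   armcl --float_support=VFPv3 --neon --opt_level=3 saxpy.c
 *
 * restrict tells the compiler that x and y do not overlap, which makes
 * vectorization easier for it to prove safe.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Building the same file once without optimization and once with `--opt_level=3` and then timing it is a simple way to see whether the Neon path makes a difference on your board.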