Realtime Kernel Newbie

I want Linux to respond to events within 2.5us, and the RT kernel seems to be able to do this. I know that the RT kernel is preemptible, so even syscalls can be interrupted. However, when I run cyclictest, the latency is typically around 53us (the Avg column shows 56us or more), while the minimum is 2us. My CPU has a minimum frequency of 800MHz (1.25ns per cycle).

Only one task needs to have priority over everything else. Once it is running, it should keep other tasks from taking control until it completes, which takes a short time (maybe 100us).

What can I do to decrease the latency? Is the 2us minimum my context-switching cost, or can that be even lower? Can I also guarantee that a process will be executed within a 10ms time window with Linux RT? Thanks.
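To put the target in perspective, here is a rough Python sketch of the cycle budget implied by these numbers. It assumes the CPU is sitting at its 800MHz minimum the whole time, which may not be the case; the helper name `cycles` is only for illustration.

```python
# Cycle budget implied by the figures in the question.
# Assumption: the CPU runs at its minimum frequency of 800MHz throughout.
CPU_MHZ = 800

# Duration of one clock cycle in nanoseconds (1000 / MHz).
CYCLE_NS = 1000 / CPU_MHZ


def cycles(duration_ns: int, cpu_mhz: int = CPU_MHZ) -> int:
    """Number of whole clock cycles that fit into duration_ns."""
    # MHz is cycles per microsecond, so divide by 1000 to get cycles per ns.
    return duration_ns * cpu_mhz // 1000


if __name__ == "__main__":
    print(f"cycle time: {CYCLE_NS} ns")
    print(f"2.5us target:      {cycles(2_500)} cycles")
    print(f"2us observed min:  {cycles(2_000)} cycles")
    print(f"53us typical:      {cycles(53_000)} cycles")
    print(f"100us task length: {cycles(100_000)} cycles")
```

So a 2.5us deadline leaves on the order of a couple of thousand cycles at the lowest clock speed, which has to cover the wakeup, the context switch, and the start of the task itself.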

sudo cyclictest --smp
# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 1.33 1.00 0.73 1/812 6256
T: 0 ( 6185) P: 0 I:1000 C: 151177 Min: 3 Act: 53 Avg: 56 Max: 9369
T: 1 ( 6186) P: 0 I:1500 C: 100878 Min: 2 Act: 53 Avg: 56 Max: 9277
T: 2 ( 6187) P: 0 I:2000 C: 75667 Min: 3 Act: 54 Avg: 61 Max: 8991
T: 3 ( 6188) P: 0 I:2500 C: 60539 Min: 4 Act: 53 Avg: 57 Max: 8689
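For anyone who wants to compare runs, the per-thread lines above can be summarized with a small Python sketch. It assumes the output keeps exactly the column layout shown here (T, P, I, C, Min, Act, Avg, Max, all latencies in microseconds); the regular expression and the `parse` helper are my own and not part of cyclictest.

```python
import re

# Per-thread lines copied from the cyclictest run above.
OUTPUT = """\
T: 0 ( 6185) P: 0 I:1000 C: 151177 Min: 3 Act: 53 Avg: 56 Max: 9369
T: 1 ( 6186) P: 0 I:1500 C: 100878 Min: 2 Act: 53 Avg: 56 Max: 9277
T: 2 ( 6187) P: 0 I:2000 C: 75667 Min: 3 Act: 54 Avg: 61 Max: 8991
T: 3 ( 6188) P: 0 I:2500 C: 60539 Min: 4 Act: 53 Avg: 57 Max: 8689
"""

# One named group per column; whitespace between fields is optional.
LINE = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<tid>\d+)\)\s*P:\s*(?P<prio>\d+)\s*"
    r"I:\s*(?P<interval>\d+)\s*C:\s*(?P<count>\d+)\s*Min:\s*(?P<min>\d+)\s*"
    r"Act:\s*(?P<act>\d+)\s*Avg:\s*(?P<avg>\d+)\s*Max:\s*(?P<max>\d+)"
)


def parse(text):
    """Return one dict of integer fields per thread line found in text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in LINE.finditer(text)
    ]


rows = parse(OUTPUT)
best_us = min(row["min"] for row in rows)      # smallest latency seen
worst_us = max(row["max"] for row in rows)     # largest latency seen
priorities = sorted({row["prio"] for row in rows})

if __name__ == "__main__":
    print(f"threads: {len(rows)}, priorities used: {priorities}")
    print(f"best: {best_us}us, worst: {worst_us}us")
    print(f"worst case under 10ms in this run: {worst_us < 10_000}")
    print(f"worst case under 2.5us in this run: {worst_us < 2.5}")
```

Note that this run used policy "other" with priority 0 on every thread, and that a maximum observed in one run is only a measurement, not a guarantee.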