According to results, FPGA implementations are faster than the DSP and GPP implementations for algorithms which can exploit a large amount of parallelism. Our image pre-processing architecture is nearly two times faster than the optimized software implementation on an Intel Core 2 Duo GPP. However, because of the higher clock frequency of DSPs/GPPs, the processing speed for sequential computations on on-chip processors in FPGAs is slower than on DSPs/GPPs. These on-chip processors are well suited for multi-processor systems for software level parallelism. Our quad-Microblaze architecture achieved 75-80% performance improvement compared to its single Microblaze counterpart. Moreover, the quad-Microblaze design is faster than the single-powerPC implementation on FPFA. Therefore, multi-processor architecture with customised coprocessors are effective for implementing custom parallel architecture to achieve real time image processing.