Modern FPGAs enable system designers to develop high-performance computing (HPC) applications with a large amount of parallelism. Real-time image processing is one such application, demanding far more processing power than a conventional processor can deliver. In this research, we implement software-based and hardware-based architectures on FPGA to achieve real-time image processing, and we benchmark and compare them with existing architectures. These systems are built from on-chip processors or custom vision coprocessors operating in parallel, supported by efficient memory and bus architectures. Performance properties such as accuracy, throughput and efficiency are measured and presented. According to the results, FPGA implementations are faster than DSP and GPP implementations for algorithms that can exploit a large amount of parallelism. Our image pre-processing architecture is nearly twice as fast as the optimized software implementation on an Intel Core 2 Duo GPP. However, because DSPs and GPPs run at higher clock frequencies, sequential computations execute more slowly on the on-chip processors in FPGAs than on DSPs/GPPs. These on-chip processors are nevertheless well suited to multi-processor systems that exploit software-level parallelism. Our quad-MicroBlaze architecture achieved a 75-80% performance improvement over its single-MicroBlaze counterpart. Moreover, the quad-MicroBlaze design is faster than the single-PowerPC implementation on FPGA. Therefore, multi-processor architectures with customised coprocessors are effective for implementing custom parallel architectures that achieve real-time image processing. This and much more can be found in the book Performance Evaluation of Vision Algorithms on FPGA (Mahendra Gunathilaka Samarawickrama).