Continuous improvements in video/image application research require increasingly sophisticated multimedia algorithms, which demand ever more processing performance. Current video coding standards, e.g. MPEG-4, VC-1, or H.264/AVC, push existing media processors to their limits at high-definition quality. This demand for performance also directly affects power consumption, which is an important issue in portable multimedia devices. Efficient media-processor implementations are therefore mandatory, and they must balance the trade-offs between processing performance, silicon area, and power consumption.
This work analyzes architectural alternatives for the design of application-specific VLIW processors for multimedia applications, using a comprehensive design space exploration environment. This environment provides a generic VLIW architecture template, which includes a configurable pipeline architecture simulator, an enhanced instruction scheduler, and a parameterized HDL description of the architecture template. With this environment, the architecture template can be optimized in terms of performance and hardware cost for a set of multimedia applications.
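As a rough illustration of what such an exploration involves, the following minimal sketch enumerates a small configuration space and keeps only the configurations that are Pareto-optimal in cycle count and area. Everything in it is an assumption made for illustration: the `Config` parameters, the value ranges, and in particular the toy cost model in `estimate`, which stands in for the cycle counts a simulator would report and the area a synthesized HDL description would yield. It is not the environment described in this work.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """One candidate architecture (hypothetical parameters)."""
    issue_slots: int  # number of parallel VLIW issue slots
    registers: int    # number of register file entries


def estimate(cfg):
    """Toy cost model (an assumption, not measured data).

    Returns (cycles, area). In a real flow, cycles would come from a
    simulator run and area from synthesis results.
    """
    cycles = 1000.0 / cfg.issue_slots + 2000.0 / cfg.registers
    area = 10.0 * cfg.issue_slots + 0.5 * cfg.registers
    return cycles, area


def dominates(a, b):
    """True if cost a is no worse than b in both metrics and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(configs):
    """Keep the configurations whose cost no other configuration dominates."""
    scored = [(cfg, estimate(cfg)) for cfg in configs]
    return [
        (cfg, cost)
        for cfg, cost in scored
        if not any(dominates(other, cost) for _, other in scored)
    ]


# Enumerate a small, hypothetical design space and print its Pareto front,
# ordered by increasing area.
space = [Config(s, r) for s, r in product((1, 2, 4), (16, 32, 64))]
front = pareto_front(space)
for cfg, (cycles, area) in sorted(front, key=lambda item: item[1][1]):
    print(cfg, f"cycles={cycles:.1f}", f"area={area:.1f}")
```

Each configuration that survives the filter represents a different performance/cost trade-off. Choosing among them depends on the constraints of the target application, for example a maximum cycle budget per video frame.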
The combination of enhanced hardware architecture mechanisms and the corresponding instruction scheduler improvements plays an important role in overcoming architecture bottlenecks. Based on this concept, different application-independent data/instruction-parallel mechanisms are described and analyzed in detail in this work, e.g. an area-efficient partitioned register file organization, a novel instruction merging mechanism, and a new forwarding-based instruction scheduling technique. In addition, application-dependent mechanisms are proposed, e.g. an enhanced DMA controller.
Finally, two case studies are analyzed in which the generic VLIW architecture is optimized for a set of video coding tasks. The results demonstrate that highly efficient VLIW architecture configurations can be obtained by implementing the proposed application-independent mechanisms, which reduce the silicon area of the VLIW processor core by up to 29% and its power consumption by up to 45% at equivalent processing performance. In addition, for specific applications, higher performance (up to 52%) can be obtained by specializing the DMA controller, which increases the silicon area by only 5%.