Design of video controller in H.264 video decoding chip

Introduction

H.264 is a video compression standard developed jointly by the ITU-T VCEG and ISO/IEC MPEG groups. Compared with other video compression algorithms, H.264 offers a high compression ratio at the cost of a complex algorithm. Because of this complexity, the system places strict requirements on decoding speed and power consumption, so the decoder is implemented as a dedicated H.264 decoding chip. For a large design project, a top-down design method is generally adopted, in which the design is divided into functional modules and sub-modules. The video controller module is the data interface between the chip and the display platform and plays an important role in verifying that the chip design works, so it is treated as a separate sub-module. To improve the chance of design success, FPGA-based prototyping was adopted from the start of the design. The FPGA prototype verification platform of the whole system is shown in Figure 1. The platform is divided into two parts: the hardware design and software decoding on a RISC CPU. The two parts work together to cross-check the software and hardware decoding results and to accelerate the entire decoding process.

Figure 1 FPGA prototype verification platform for H.264 decoder chip

Figure 2 Block diagram of the output video control module

Design and implementation of video control module

Block diagram and function analysis of video control module

The block diagram of the output video control module is shown in Figure 2. The module has two clock domains: the system clock domain and the display clock domain. The system clock frequency is fixed at 166 MHz, a value determined by the type of SDRAM selected. For an HDTV with a resolution of 1280×720, the display clock domain can use a frequency of around 70 MHz.

The system clock domain contains two external interfaces: a system interface, which mainly carries the commands issued by the upper-layer system and the feedback information of the output control module; and a DRAM interface, which carries the signals that the dedicated data bus provides to the output control module and is used to request the image data to be displayed from the DRAM.

The display input control sub-module (Disp In Ctrl) in the system clock domain has four tasks. First, it receives the StartDisp and EndDisp signals from the system to enable or disable the output display of video data, and issues a frame display completion signal (FrameDone) that notifies the system to supply the address of the next image (ImageAddress). Second, it requests the image data to be displayed from the DRAM through the dedicated data channel. Third, it controls the input multiplexer module (Input MUX) so that the data is written to the on-chip SRAM. Finally, it exchanges information with the display clock domain by sending a display enable signal (DispEn Sys) to the clock domain synchronization module (Clk Domain Sync), which switches the image display on and off. The other sub-module of the system clock domain, the input multiplexer module, selects one of the on-chip dual-port SRAMs according to a fixed rule, controls the memory address, and writes the display image data to the memory.

The display clock domain contains an external display device interface, which mainly carries the display control signals and the converted data. The display clock domain includes two sub-modules. The first is the output multiplexer sub-module (Output MUX), which selects the dual-port SRAM, controls its address, reads the image data to be displayed according to a fixed rule, and packs it. The second is the display output control module (Disp Out Ctrl), which converts the YUV signal to RGB, scales the digital image, and drives the TV encoder with the display clock, line synchronization, frame synchronization, RGB image data, and so on; it also controls the output multiplexer to read the display data; finally, it interacts with the system clock domain to coordinate the transfer of data between the two clock domains.

Special technology used by the video control module

The clock domain synchronization module is the focus of the output control module design; it is mainly responsible for passing control signals between the two clock domains. Designing signal transfers across clock domains is laborious, so the signals transferred in this design are divided into two categories, data signals and control signals, and only the control signals pass through the clock domain synchronization module. This reduces the number of signals that must cross the clock domain boundary. In the final scheme only two signals are needed. The WrDone signal is sent by the system clock domain to notify the display clock domain that the data in one dual-port SRAM has been updated and can be read and displayed. The RdDone signal is sent by the display clock domain to notify the system clock domain that the data in one dual-port SRAM has been displayed and its contents can be updated. Signals passing between different clock domains need a measure against metastability, so each signal is latched through a two-stage register before it is used, as shown in Figure 3.

Figure 3 Cross-clock domain signal metastable cancellation circuit
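
The two-stage register chain can be illustrated with a small behavioral model. This is only a sketch: the class and method names are hypothetical, and a software model cannot reproduce metastability itself. It only shows the delay that the two registers add before a level signal such as WrDone becomes visible in the destination clock domain.

```python
class TwoFlopSynchronizer:
    """Behavioral model of two registers clocked by the destination domain."""

    def __init__(self):
        self.stage1 = 0  # first register, samples the incoming signal
        self.stage2 = 0  # second register, drives the synchronized output

    def clock(self, async_in):
        """Advance one destination-clock edge and return the output."""
        # Both registers update on the same edge, so stage2 receives
        # the value stage1 held before this edge.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


# A level signal (e.g. WrDone) sampled on six destination-clock edges.
sync = TwoFlopSynchronizer()
samples = [0, 1, 1, 1, 0, 0]
outputs = [sync.clock(s) for s in samples]
```

In this model the output follows the input two clock edges later, which is one reason a level signal held for several destination clock cycles is easier to capture than a short pulse.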

Figure 4 Hardware implementation block diagram of the video output submodule

Two points in the design are worth noting. First, the clock domain synchronization circuit should be placed in a separate module, so that the synthesis tool can optimize it, the timing analysis is correct, and the circuit is easier to analyze and debug. Second, so that the destination clock domain can capture every signal change, the control signals transferred in the design are level signals.

The other type of signal transferred between the clock domains is the data signal. Because the data signals are numerous and change quickly, they are transferred through dual-port RAM (DPRAM). A DPRAM requires that read and write accesses to the same memory address be separated by a certain time interval; otherwise data transmission errors may occur and the hardware circuit may even be damaged. To avoid read/write conflicts in the DPRAM, the design uses "ping-pong" buffering. Two DPRAMs alternately hold the decoded luminance or color difference data used for display: while the display part reads the data in one DPRAM, the system writes the data to be displayed next into the other. When the read completes, the two DPRAMs exchange roles. This part uses four DPRAMs in total, two for the luminance signal and two for the color difference signals.
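
The ping-pong scheme can be sketched in software as two banks with one "full" flag each. This is a simplified illustration under assumptions: the class name, the block-level interface, and the use of a flag per bank to stand in for the WrDone/RdDone handshake are all hypothetical and are not taken from the design.

```python
class PingPongBuffer:
    """Two equal-sized banks; one is written while the other is read."""

    def __init__(self, depth):
        self.depth = depth
        self.banks = [[0] * depth, [0] * depth]
        self.full = [False, False]  # True: bank holds data not yet displayed
        self.wr_sel = 0             # bank the system side writes next
        self.rd_sel = 0             # bank the display side reads next

    def write_block(self, data):
        """System side: write one block, or return False if it must wait."""
        if len(data) != self.depth:
            raise ValueError("block size must equal the bank depth")
        if self.full[self.wr_sel]:
            return False                     # bank not displayed yet (no RdDone)
        self.banks[self.wr_sel][:] = data
        self.full[self.wr_sel] = True        # plays the role of WrDone
        self.wr_sel ^= 1                     # switch to the other bank
        return True

    def read_block(self):
        """Display side: read one block, or return None if none is ready."""
        if not self.full[self.rd_sel]:
            return None
        data = list(self.banks[self.rd_sel])
        self.full[self.rd_sel] = False       # plays the role of RdDone
        self.rd_sel ^= 1
        return data
```

Because the writer and the reader always work on different banks, the same address is never read and written at the same time. One luminance pair and one color difference pair of such buffers would correspond to the four DPRAMs mentioned above.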

The format conversion algorithm, image scaling processing algorithm and their hardware implementations used in the video controller display output sub-module are analyzed below.

Display data format conversion analysis

Based on the datasheet of the SiI 164 DVI signal encoding chip and the YUV → RGB conversion given in the H.264 video coding standard, the conversion algorithm used in the design is as follows:

formula

The above formula is converted to fixed point, and the conversion is implemented with shifts and additions, as shown in the following equation:

formula

The YUV and RGB signals in the hardware design are represented by 8-bit unsigned numbers, and the intermediate variables are kept at 12-bit precision. Finally, the calculated RGB result is clamped to the range 0 to 255. The powers of two and the divisions in the equation are all implemented by shifting.
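
A shift-and-add conversion of this kind might look like the following sketch. The coefficients are an assumption: they are the commonly used full-range BT.601-style values (1.402, 0.344, 0.714, 1.772) approximated by sums of powers of two, and they are not necessarily the ones used in the design. The function names are hypothetical.

```python
def clamp8(value):
    """Clamp an intermediate result to the 8-bit range 0..255."""
    return max(0, min(255, value))


def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV sample to RGB using only shifts and adds.

    Assumed approximations (not taken from the design):
      1.402 ~ 1 + 1/4 + 1/8 + 1/32
      0.344 ~ 1/4 + 1/16 + 1/32
      0.714 ~ 1/2 + 1/8 + 1/16 + 1/32
      1.772 ~ 1 + 1/2 + 1/4 + 1/64
    """
    d = u - 128  # centered blue-difference component
    e = v - 128  # centered red-difference component
    # Python's >> on negative integers is an arithmetic shift, which
    # matches a signed shift in hardware.
    r = y + e + (e >> 2) + (e >> 3) + (e >> 5)
    g = y - ((d >> 2) + (d >> 4) + (d >> 5)) \
          - ((e >> 1) + (e >> 3) + (e >> 4) + (e >> 5))
    b = y + d + (d >> 1) + (d >> 2) + (d >> 6)
    return clamp8(r), clamp8(g), clamp8(b)
```

With these assumed coefficients the unclamped intermediate values stay well inside a signed 12-bit range, and a neutral chroma input (U = V = 128) leaves the gray level unchanged.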

Algorithm Analysis of Digital Image Scaling

For an original image with a resolution of M×N, the YUV values of all the sample points can be expressed as an M×N matrix:

formula

Each pixel is represented by f(m, n), where 0 ≤ m < M and 0 ≤ n < N. Scaling a digital image is, in essence, resampling it. Assuming that the scaling factors for the height and width of the original digital image are S1 and S2 respectively, then according to the Nyquist sampling theorem the original digital image should be resampled with new horizontal and vertical sampling periods to obtain the scaled digital image f'(m', n'):

formula

As can be seen from the above equation, each reconstructed pixel f'(m', n') in the scaled digital image is a weighted sum of the pixels of the original digital image. Designing the hardware directly from this formula would require a very large amount of computation. To simplify the design and reduce chip cost, the formula can be simplified as long as the effect on image quality is small. The reconstructed pixel values depend primarily on the product of the two sampling functions. In practice, only the points at which this product equals 1 are taken, that is, the points that satisfy the corresponding sampling condition. As a further simplification, the coordinates can be rounded, which gives the simplified expression: f'(m', n') = f(m, n).
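
The simplified expression amounts to copying, for each output pixel, one rounded source pixel. A minimal sketch follows, assuming that the source coordinates are obtained as m = round(m'/S1) and n = round(n'/S2); this mapping and the function name are assumptions, since the exact rounding rule is not stated above.

```python
def scale_nearest(image, s1, s2):
    """Scale a 2-D list of pixels by s1 (height) and s2 (width).

    Each output pixel f'(m', n') copies the source pixel f(m, n) whose
    coordinates are the rounded values of m'/s1 and n'/s2 (assumed rule).
    """
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * s1))
    dst_w = max(1, round(src_w * s2))
    scaled = []
    for m_new in range(dst_h):
        # Round to the nearest source row and stay inside the image.
        m = min(src_h - 1, int(m_new / s1 + 0.5))
        row = []
        for n_new in range(dst_w):
            n = min(src_w - 1, int(n_new / s2 + 0.5))
            row.append(image[m][n])
        scaled.append(row)
    return scaled
```

No multiplications of pixel values or weighted sums are needed, which is what makes this form attractive for hardware, at the cost of blockier results than a full reconstruction.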

Hardware implementation of digital image format conversion and scaling

In this project the display device is a high-definition television with a resolution of 1280×720, and the image is center-aligned when output to it. If the decoded digital image is sent to the display without scaling, it is placed in the middle of the screen and the rest of the screen is filled with black. The same rule is followed when scaling is performed. First, the front end of the video controller output module performs data format conversion on the data as it arrives in progressive-scan order. The non-zero (i.e. non-black) RGB pixel data is then written, frame by frame and in progressive-scan order, alternately into two on-chip buffer RAMs of equal size, as shown in Figure 4.

The working mode is the same as that of the DPRAMs described earlier. After the address of a data item in RAM1 or RAM2 is read, the address decoder yields the row and column address of that pixel, that is, the values of m and n. The m and n values are sent to the image scaling processing unit, which produces the new image data and the new image data address. The write address decoder then converts this into an address in the output RAM 3 in progressive-scan format, which is where the format-converted data is stored. Finally, the RGB data required for display can be output directly from RAM 3.
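
The address path described above can be sketched as follows. The sketch assumes a row-major (progressive-scan) memory layout and a forward coordinate mapping by rounding; the function names, the layout, and the mapping are illustrative assumptions rather than details of the design.

```python
def decode_address(addr, width):
    """Read address decoder: linear address -> (row m, column n)."""
    return divmod(addr, width)


def encode_address(m, n, width):
    """Write address decoder: (row m, column n) -> linear address."""
    return m * width + n


def remap_address(addr, src_width, dst_width, s1, s2):
    """Map an address in RAM1/RAM2 to an address in the output RAM.

    s1 and s2 are the height and width scaling factors; the scaled
    coordinates are obtained by rounding (an assumed rule).
    """
    m, n = decode_address(addr, src_width)
    m_new = int(m * s1 + 0.5)
    n_new = int(n * s2 + 0.5)
    return encode_address(m_new, n_new, dst_width)
```

For example, with a source line of 4 pixels, an output line of 8 pixels, and both factors equal to 2, address 5 (row 1, column 1) maps to row 2, column 2 of the output, which is address 18.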

Conclusion

After the design was completed, the video controller module was synthesized with the synthesis tool Synplify 7.6 and reached an operating frequency of 80.3 MHz. When downloaded to the Xilinx Virtex-II 6000 FPGA together with the front-end decoding module and integrated into the H.264 video decoding verification platform, the operating frequency reached 34 MHz, and images played back well on a high-definition TV.
