In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product uni...In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product unit and add-subtract unit. In these arithmetic units, operations are performed over complex data values. A modified fused floating-point two-term dot product and an enhanced model for the Radix-4 FFT butterfly unit are proposed. The modified fused two-term dot product is designed using Radix-16 booth multiplier. Radix-16 booth multiplier will reduce the switching activities compared to Radix-8 booth multiplier in existing system and also will reduce the area required. The proposed architecture is implemented efficiently for Radix-4 decimation in time(DIT) FFT butterfly with the two floating-point fused arithmetic units. The proposed enhanced architecture is synthesized, implemented, placed and routed on a FPGA device using Xilinx ISE tool. It is observed that the Radix-4 DIT fused floating-point FFT butterfly requires 50.17% less space and 12.16% reduced power compared to the existing methods and the proposed enhanced model requires 49.82% less space on the FPGA device compared to the proposed design. Also, reduced power consumption is addressed by utilizing the reusability technique, which results in 11.42% of power reduction of the enhanced model compared to the proposed design.展开更多
The algorithm and its implementation of the leading zero anticipation (LZA) are very vital for the performance of a high-speed floating-point adder in today's state of art microprocessor design. Unfortunately, in p...The algorithm and its implementation of the leading zero anticipation (LZA) are very vital for the performance of a high-speed floating-point adder in today's state of art microprocessor design. Unfortunately, in predicting "shift amount" by a conventional LZA design, the result could be off by one position. This paper presents a novel parallel error detection algorithm for a general-case LZA. The proposed approach enables parallel execution of conventional LZA and its error detection, so that the error-indicatlon signal can be generated earlier in the stage of normalization, thus reducing the critical path and improving overall performance. The circuit implementation of this algorithm also shows its advantages of area and power compared with other previous work.展开更多
文摘In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product unit and add-subtract unit. In these arithmetic units, operations are performed over complex data values. A modified fused floating-point two-term dot product and an enhanced model for the Radix-4 FFT butterfly unit are proposed. The modified fused two-term dot product is designed using Radix-16 booth multiplier. Radix-16 booth multiplier will reduce the switching activities compared to Radix-8 booth multiplier in existing system and also will reduce the area required. The proposed architecture is implemented efficiently for Radix-4 decimation in time(DIT) FFT butterfly with the two floating-point fused arithmetic units. The proposed enhanced architecture is synthesized, implemented, placed and routed on a FPGA device using Xilinx ISE tool. It is observed that the Radix-4 DIT fused floating-point FFT butterfly requires 50.17% less space and 12.16% reduced power compared to the existing methods and the proposed enhanced model requires 49.82% less space on the FPGA device compared to the proposed design. Also, reduced power consumption is addressed by utilizing the reusability technique, which results in 11.42% of power reduction of the enhanced model compared to the proposed design.
文摘The algorithm and its implementation of the leading zero anticipation (LZA) are very vital for the performance of a high-speed floating-point adder in today's state of art microprocessor design. Unfortunately, in predicting "shift amount" by a conventional LZA design, the result could be off by one position. This paper presents a novel parallel error detection algorithm for a general-case LZA. The proposed approach enables parallel execution of conventional LZA and its error detection, so that the error-indicatlon signal can be generated earlier in the stage of normalization, thus reducing the critical path and improving overall performance. The circuit implementation of this algorithm also shows its advantages of area and power compared with other previous work.