Xilinx vs. Altera FPGA Comparison Series, Part 1: DSP Speed (3)
Category: Selected Web Articles

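As most readers know, high-end FPGAs contain a large number of DSP blocks. Each block is built mainly from 18x18 multipliers plus adders and similar units, and neighbouring DSP blocks can usually be chained through dedicated routing. This allows cascaded filter designs and raises the speed at which a filter can run.

The Xilinx and Altera DSP blocks differ somewhat. The Xilinx DSP block supports 18x18 multiply, 18x18 multiply-accumulate and 18x18 multiply-add operations, with an accumulator up to 48 bits wide; the vendor's rated maximum speed is 500 MHz. The Altera DSP block can be configured as 9x9, 18x18 or 36x36 multipliers and supports multiply, multiply-accumulate and multiply-add operations; the vendor's rated maximum speed is 450 MHz.

The tables below give some synthesis results.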
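Table 1. Synthesis results for the transposed-form FIR filter

| Altera platform | Altera device | Altera speed (MHz) | Xilinx speed (MHz) | Xilinx device     | Xilinx platform |
|-----------------|---------------|--------------------|--------------------|-------------------|-----------------|
| Stratix II      | EP2S90F1020C3 | 313                | 165                | xc4vsx35-ff668-12 | Virtex 4        |
| Stratix II      | EP2S90F1020C4 | 282                | 154                | xc4vsx35-ff668-11 | Virtex 4        |
| Stratix II      | EP2S90F1020C5 | 240                | 124                | xc4vsx35-ff668-10 | Virtex 4        |
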
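Table 2. Synthesis results for the direct-form FIR filter

| Altera platform | Altera device | Altera speed (MHz) | Xilinx speed (MHz) | Xilinx device     | Xilinx platform |
|-----------------|---------------|--------------------|--------------------|-------------------|-----------------|
| Stratix II      | EP2S90F1020C3 | 195                | 109                | xc4vsx35-ff668-12 | Virtex 4        |
| Stratix II      | EP2S90F1020C4 | 169                | 101                | xc4vsx35-ff668-11 | Virtex 4        |
| Stratix II      | EP2S90F1020C5 | 141                | 88                 | xc4vsx35-ff668-10 | Virtex 4        |
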
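
For reference, the two filter structures measured in Tables 1 and 2 compute the same convolution and differ only in where the registers sit. The short derivation below uses notation that does not appear in the original post and is assumed here for illustration: x[n] is the input sample, h_k are the coefficients, N = 57 is the number of taps, and s_i are the partial sums of the transposed form. Pipeline latency is ignored.

```latex
\begin{aligned}
\text{Direct form:}\quad & y[n] = \sum_{k=0}^{N-1} h_k\, x[n-k] \\[4pt]
\text{Transposed form:}\quad & s_0[n] = h_{N-1}\, x[n], \\
 & s_i[n] = s_{i-1}[n-1] + h_{N-1-i}\, x[n], \qquad i = 1,\dots,N-1, \\
 & y[n] = s_{N-1}[n] = \sum_{k=0}^{N-1} h_k\, x[n-k]
\end{aligned}
```

In the transposed form every register-to-register path contains a single multiplier or a single two-input adder, while the direct form needs a delay line followed by an adder tree. This is one plausible reason why the transposed results are faster on both devices.
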
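Some brief analysis:
1. One possible reason Xilinx comes out slower than Altera is that ISE synthesis may need additional constraints before it reaches its best results. I asked a Xilinx application engineer about this, and she sent me a result synthesized with Synplify that was noticeably faster than what I obtained with ISE.
2. Regarding the Xilinx DSP block, I also tried a number of other modules, including a plain multiplier, but none of them reached the rated 500 MHz. In addition, ISE does not accept arbitrary coding styles and places requirements on how the code is written. For example, the reset has to be written as a synchronous reset for the logic to be synthesized into a DSP block.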
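
The coding-style remark in point 2 can be illustrated with a minimal sketch of a multiply-accumulate written with a synchronous reset. The entity name `MacSyncReset` and its ports are hypothetical and are not part of the attached code; the 18x18 operand width and 48-bit accumulator simply follow the DSP block figures quoted earlier. Whether a particular tool version actually maps this into a DSP block depends on the tool and its settings, so treat it as a starting point rather than a guaranteed recipe.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example (not from the original post):
-- 18x18 multiply-accumulate with a 48-bit accumulator.
entity MacSyncReset is
  port (
    Clk    : in  std_logic;
    cReset : in  std_logic;               -- synchronous, active high
    cA     : in  signed(17 downto 0);
    cB     : in  signed(17 downto 0);
    cAcc   : out signed(47 downto 0));
end MacSyncReset;

architecture rtl of MacSyncReset is
  signal cAReg, cBReg : signed(17 downto 0);
  signal cProd        : signed(35 downto 0);
  signal cAccReg      : signed(47 downto 0);
begin

  cAcc <= cAccReg;

  -- Only Clk is in the sensitivity list. The reset is tested inside the
  -- rising_edge branch, so every register gets a synchronous reset.
  process(Clk)
  begin
    if rising_edge(Clk) then
      if cReset = '1' then
        cAReg   <= (others => '0');
        cBReg   <= (others => '0');
        cProd   <= (others => '0');
        cAccReg <= (others => '0');
      else
        cAReg   <= cA;                           -- input registers
        cBReg   <= cB;
        cProd   <= cAReg * cBReg;                -- 36-bit product register
        cAccReg <= cAccReg + resize(cProd, 48);  -- 48-bit accumulation
      end if;
    end if;
  end process;

end rtl;
```

By contrast, a process written as `process(aReset, Clk)` with the reset tested before `rising_edge(Clk)` describes an asynchronous reset, which is the style point 2 says ISE will not map into a DSP block.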
The corresponding VHDL code is attached below; discussion is welcome.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Direct-form FIR filter with a pipelined adder tree
entity NoiseFilterD is
  port (
    aReset : in  std_logic;
    Clk    : in  std_logic;
    cDin   : in  std_logic_vector(7 downto 0);
    cDout  : out std_logic_vector(7 downto 0));
end NoiseFilterD;

architecture rtl of NoiseFilterD is

  constant kNumCoes : positive := 57;

  type IntegerArray is array (natural range <>) of integer;
  constant kCoe : IntegerArray(kNumCoes-1 downto 0) := (
      -5,    6,   10,    0,  -16,  -11,   16,   31,
      -1,  -48,  -34,   45,   84,   -2, -120,  -84,
     105,  193,   -4, -272, -194,  241,  463,   -5,
    -742, -618,  952, 3092, 4095, 3092,  952, -618,
    -742,   -5,  463,  241, -194, -272,   -4,  193,
     105,  -84, -120,   -2,   84,   45,  -34,  -48,
      -1,   31,   16,  -11,  -16,    0,   10,    6,
      -5);

  type SignedArray is array (natural range <>) of signed(7 downto 0);
  signal cDelayData : SignedArray(kNumCoes-1 downto 0);

  type ProdArray is array (natural range <>) of signed(20 downto 0);
  signal cProd : ProdArray(kNumCoes-1 downto 0);

  type SumArray is array (natural range <>) of signed(22 downto 0);
  -- 14 groups of four products cover taps 0..55; element 14 carries tap 56,
  -- which the 14-element array in the posted code left out of the sum.
  signal cSumL1 : SumArray(14 downto 0);
  signal cSumL2 : SumArray(3 downto 0);
  signal cSumL3 : SumArray(0 downto 0);

begin

  cDout <= std_logic_vector(cSumL3(0)(20 downto 13));

  -- Input data delay chain
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to kNumCoes-1 loop
        cDelayData(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      cDelayData(0) <= signed(cDin);
      for i in 1 to kNumCoes-1 loop
        cDelayData(i) <= cDelayData(i-1);
      end loop;
    end if;
  end process;

  -- Calculate the product of each tap (8-bit data x 13-bit coefficient)
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to kNumCoes-1 loop
        cProd(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      for i in 0 to kNumCoes-1 loop
        cProd(i) <= cDelayData(i) * to_signed(kCoe(i), 13);
      end loop;
    end if;
  end process;

  -- Calculate the first-level sums
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to 14 loop
        cSumL1(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      for i in 0 to 13 loop
        cSumL1(i) <= resize(cProd(i*4), 23) + resize(cProd(i*4+1), 23)
                   + resize(cProd(i*4+2), 23) + resize(cProd(i*4+3), 23);
      end loop;
      -- Tap 56 passes through its own register to stay aligned in time
      cSumL1(14) <= resize(cProd(56), 23);
    end if;
  end process;

  -- Calculate the second-level sums
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to 3 loop
        cSumL2(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      for i in 0 to 2 loop
        cSumL2(i) <= cSumL1(i*4) + cSumL1(i*4+1)
                   + cSumL1(i*4+2) + cSumL1(i*4+3);
      end loop;
      cSumL2(3) <= cSumL1(12) + cSumL1(13) + cSumL1(14);
    end if;
  end process;

  -- Calculate the third-level sum
  process(aReset, Clk)
  begin
    if aReset = '1' then
      cSumL3(0) <= (others => '0');
    elsif rising_edge(Clk) then
      cSumL3(0) <= cSumL2(0) + cSumL2(1) + cSumL2(2) + cSumL2(3);
    end if;
  end process;

end rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Transposed-form FIR filter
entity NoiseFilterT is
  port (
    aReset : in  std_logic;
    Clk    : in  std_logic;
    cDin   : in  std_logic_vector(7 downto 0);
    cDout  : out std_logic_vector(7 downto 0));
end NoiseFilterT;

architecture rtl of NoiseFilterT is

  constant kNumCoes : positive := 57;

  type IntegerArray is array (natural range <>) of integer;
  constant kCoe : IntegerArray(kNumCoes-1 downto 0) := (
      -5,    6,   10,    0,  -16,  -11,   16,   31,
      -1,  -48,  -34,   45,   84,   -2, -120,  -84,
     105,  193,   -4, -272, -194,  241,  463,   -5,
    -742, -618,  952, 3092, 4095, 3092,  952, -618,
    -742,   -5,  463,  241, -194, -272,   -4,  193,
     105,  -84, -120,   -2,   84,   45,  -34,  -48,
      -1,   31,   16,  -11,  -16,    0,   10,    6,
      -5);

  type ProdArray is array (natural range <>) of signed(20 downto 0);
  signal cProd : ProdArray(kNumCoes-1 downto 0);

  type SumArray is array (natural range <>) of signed(22 downto 0);
  signal cSum : SumArray(kNumCoes-1 downto 0);

  signal cDin_ms, cDinDbSync : signed(7 downto 0);

begin

  cDout <= std_logic_vector(cSum(kNumCoes-1)(20 downto 13));

  -- Register the input data twice
  process(aReset, Clk)
  begin
    if aReset = '1' then
      cDin_ms    <= (others => '0');
      cDinDbSync <= (others => '0');
    elsif rising_edge(Clk) then
      cDin_ms    <= signed(cDin);
      cDinDbSync <= cDin_ms;
    end if;
  end process;

  -- Multiply the input data by every coefficient
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to kNumCoes-1 loop
        cProd(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      for i in 0 to kNumCoes-1 loop
        cProd(i) <= cDinDbSync * to_signed(kCoe(kNumCoes-1-i), 13);
      end loop;
    end if;
  end process;

  -- Add the products along the register chain
  process(aReset, Clk)
  begin
    if aReset = '1' then
      for i in 0 to kNumCoes-1 loop
        cSum(i) <= (others => '0');
      end loop;
    elsif rising_edge(Clk) then
      cSum(0) <= resize(cProd(0), 23);
      for i in 1 to kNumCoes-1 loop
        cSum(i) <= cSum(i-1) + resize(cProd(i), 23);
      end loop;
    end if;
  end process;

end rtl;
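
To try the two filters side by side in a simulator, a small testbench along the following lines may help. It is only a sketch: the names `NoiseFilterTb`, `kClkPeriod` and `StopSim`, the 10 ns clock period and the impulse amplitude of 64 are all assumptions made for illustration and do not come from the original post. It assumes both entities have been compiled into the `work` library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: drives one impulse into both filters
entity NoiseFilterTb is
end NoiseFilterTb;

architecture sim of NoiseFilterTb is
  constant kClkPeriod : time := 10 ns;   -- assumed simulation clock period
  signal aReset  : std_logic := '1';
  signal Clk     : std_logic := '0';
  signal cDin    : std_logic_vector(7 downto 0) := (others => '0');
  signal cDoutD  : std_logic_vector(7 downto 0);
  signal cDoutT  : std_logic_vector(7 downto 0);
  signal StopSim : boolean := false;
begin

  -- Free-running clock that stops toggling once StopSim is set
  Clk <= not Clk after kClkPeriod/2 when not StopSim else '0';

  DutD : entity work.NoiseFilterD
    port map (aReset => aReset, Clk => Clk, cDin => cDin, cDout => cDoutD);

  DutT : entity work.NoiseFilterT
    port map (aReset => aReset, Clk => Clk, cDin => cDin, cDout => cDoutT);

  Stimulus : process
  begin
    wait for 5*kClkPeriod;
    aReset <= '0';                                -- release the reset
    wait until rising_edge(Clk);
    cDin <= std_logic_vector(to_signed(64, 8));   -- one-sample impulse
    wait until rising_edge(Clk);
    cDin <= (others => '0');
    for i in 1 to 150 loop                        -- let both pipelines drain
      wait until rising_edge(Clk);
    end loop;
    StopSim <= true;
    wait;
  end process;

end sim;
```

With an impulse at the input, `cDoutD` and `cDoutT` should each trace out a scaled copy of the coefficient set. The two outputs appear at different times because the two architectures have different pipeline depths.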