AXI returns unexpected 0s #1576

zzy666666zzy · 2023-06-19T21:01:15Z

Hi,

Following the previous issue #1470, we have managed to run a 4x4 GEMM on a Standalone (bared-metal) platform.

When we tried bigger matrix sizes (8x8, 16x16), I printed the output matrix from the ARM sides, the IP core only gave us 16 correct elements (2 rows of correct elements for 8x8, 1 row of correct elements for 16x16). The rest elements are all zeros:

The correct output should be

When I delve into the generated Toplevel AXI verilog code, I found some clues.

In our case, 'm1' outputs the resultant matrix. We go to the module Memory_controller_axi_1, and look at this snippet

reg [6:0] send_addr_offset;
...
assign WDATA = {{15{32'b0}}, bram_read_data} << send_addr_offset * 32;

bram_read_data is the result from the core GEMM logic (ext_mem0__write_data) and always produces correct results.

BUT, when we try bigger matrix sizes and send_addr_offset equals is bigger than 16, WDATA is always zero because it left shifts (>16)x32 bit which is over 512-b. Previously, we could get the correct result from 4x4 because we luckily saturated the 512-b with 16 resultant elements.

I am sure the core logic without the AXI wrapper can output correct results, we have tested 4x4 to 64x64. We have changed discover-external:default=xxx according to our matrix sizes. From the generated AXI verilog code we can tell that discover-external:default=xxx affects send_addr_offset and thereby affects the number of shifts. I attached the MLIR and the AXI code here for convenient reference.
gemm8x8.mlir.txt
gemm8x8_wrapper.sv.txt

Also the doubled under-score issue in main kernel_inst still exists, signal inconsistency leads to synthesis failure.

Appreciate it if there is any solution for returning 0s and the doubled underscore issue.

Many thanks.

The text was updated successfully, but these errors were encountered:

zzy666666zzy · 2023-06-26T13:09:09Z

This issue can be addressed by adding the following Verilog to the destination master port in AXI top-level file

reg [3:0] shift_var;
always @(posedge ACLK) begin
if(memory_mode_state == 3) begin
	if(BVALID & BREADY) shift_var <= shift_var + 'd1;
	//else if (BVALID & BREADY & shift_var=='d15) shift_var<='d0;
	else shift_var <= shift_var;
end 
else shift_var <= 0;
end

And replace send_addr_offset with shift_var in

    assign WDATA = {{15{32'b0}}, bram_read_data} << send_addr_offset* 32;
    assign WSTRB = {{15{4'h0}}, 4'hF} << send_addr_offset* 4;

Might this bit modification can be realized in Calyx source code.

rachitnigam · 2023-06-26T13:26:18Z

Reopening this! @zzy666666zzy we were all traveling last week so can look into this a bit more now that we’re back.

zzy666666zzy · 2023-06-27T13:34:05Z

Hi @rachitnigam
circt_calyx4x4.sh.txt
gemm4x4.mlir.txt
gemm8x8.mlir.txt

Just realized another issue. In my matrix multiplications example (please see the MLIR and my script attached), arg0 is the output argument (output ext_mem0), arg1 and arg2 are input matrix (ext_mem1_* and 'ext_mem2'_* in Verilog file). But in the generated AXI file, the order of these three ports are randomly swapped when I increase the matrix size.
For example: inside Control_axi module, of 4x4 GEMM, the order of the three ports is:

                12'h18 : begin
                    rdata[31:0] <= addr_ext_mem0_[31:0];
                end
                12'h1c : begin
                    rdata[31:0] <= addr_ext_mem0_[63:32];
                end
                12'h20 : begin
                    rdata[31:0] <= addr_ext_mem2_[31:0];
                end
                12'h24 : begin
                    rdata[31:0] <= addr_ext_mem2_[63:32];
                end
                12'h28 : begin
                    rdata[31:0] <= addr_ext_mem1_[31:0];
                end
                12'h2c : begin
                    rdata[31:0] <= addr_ext_mem1_[63:32];
                end

But for 16x16 the order is:

                12'h18 : begin
                    rdata[31:0] <= addr_ext_mem1_[31:0];
                end
                12'h1c : begin
                    rdata[31:0] <= addr_ext_mem1_[63:32];
                end
                12'h20 : begin
                    rdata[31:0] <= addr_ext_mem0_[31:0];
                end
                12'h24 : begin
                    rdata[31:0] <= addr_ext_mem0_[63:32];
                end
                12'h28 : begin
                    rdata[31:0] <= addr_ext_mem2_[31:0];
                end
                12'h2c : begin
                    rdata[31:0] <= addr_ext_mem2_[63:32];
                end

The sequence of these ports is important because for standalone(baremetal) platform we need to configure the base address of inputs and output ports manually through s_axi_control.

Is there any approach to avoid the shuffled sequence of addr_ext_mem*_?

Thanks a lot.

zzy666666zzy · 2023-06-27T14:49:22Z

Hi @rachitnigam circt_calyx4x4.sh.txt gemm4x4.mlir.txt gemm8x8.mlir.txt

Just realized another issue. In my matrix multiplications example (please see the MLIR and my script attached), arg0 is the output argument (output ext_mem0), arg1 and arg2 are input matrix (ext_mem1_* and 'ext_mem2'_* in Verilog file). But in the generated AXI file, the order of these three ports are randomly swapped when I increase the matrix size. For example: inside Control_axi module, of 4x4 GEMM, the order of the three ports is:
                12'h18 : begin
                    rdata[31:0] <= addr_ext_mem0_[31:0];
                end
                12'h1c : begin
                    rdata[31:0] <= addr_ext_mem0_[63:32];
                end
                12'h20 : begin
                    rdata[31:0] <= addr_ext_mem2_[31:0];
                end
                12'h24 : begin
                    rdata[31:0] <= addr_ext_mem2_[63:32];
                end
                12'h28 : begin
                    rdata[31:0] <= addr_ext_mem1_[31:0];
                end
                12'h2c : begin
                    rdata[31:0] <= addr_ext_mem1_[63:32];
                end
But for 16x16 the order is:
                12'h18 : begin
                    rdata[31:0] <= addr_ext_mem1_[31:0];
                end
                12'h1c : begin
                    rdata[31:0] <= addr_ext_mem1_[63:32];
                end
                12'h20 : begin
                    rdata[31:0] <= addr_ext_mem0_[31:0];
                end
                12'h24 : begin
                    rdata[31:0] <= addr_ext_mem0_[63:32];
                end
                12'h28 : begin
                    rdata[31:0] <= addr_ext_mem2_[31:0];
                end
                12'h2c : begin
                    rdata[31:0] <= addr_ext_mem2_[63:32];
                end
The sequence of these ports is important because for standalone(baremetal) platform we need to configure the base address of inputs and output ports manually through s_axi_control.

Is there any approach to avoid the shuffled sequence of addr_ext_mem*_?

Thanks a lot.

I updated the calyx repo, the shuffled sequence of ext_mem* seems being solved.

zzy666666zzy closed this as completed Jun 26, 2023

rachitnigam reopened this Jun 26, 2023

rachitnigam added C: FPGA Changes for the FPGA backend Type: Bug Bug in the implementation labels Jun 27, 2023

rachitnigam mentioned this issue Jul 17, 2023

Other bug is that in the Toplevel module, the port of instantiated hardware is different: #1575

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AXI returns unexpected 0s #1576

AXI returns unexpected 0s #1576

zzy666666zzy commented Jun 19, 2023

zzy666666zzy commented Jun 26, 2023

rachitnigam commented Jun 26, 2023

zzy666666zzy commented Jun 27, 2023

zzy666666zzy commented Jun 27, 2023

AXI returns unexpected 0s #1576

AXI returns unexpected 0s #1576

Comments

zzy666666zzy commented Jun 19, 2023

zzy666666zzy commented Jun 26, 2023

rachitnigam commented Jun 26, 2023

zzy666666zzy commented Jun 27, 2023

zzy666666zzy commented Jun 27, 2023