AXI returns unexpected 0s #1576
This issue can be addressed by adding the following Verilog to the destination master port in the top-level AXI file:
And replace
Could this bit modification be realized in the Calyx source code?
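The Verilog referred to in this comment is not reproduced here. As an illustration only, the following Python model sketches one common AXI way to keep each 32-bit element inside a 512-bit write bus: take the element's byte address modulo the 64-byte beat width to pick its byte lane, send a beat-aligned address, and enable only that element's four bytes with WSTRB. The function `place_element` and its parameters are hypothetical, the model assumes a 4-byte-aligned base address, and it is not the patch proposed in this thread.

```python
BUS_BYTES = 64                      # 512-bit AXI write data bus
ELEM_BYTES = 4                      # one 32-bit result element


def place_element(value: int, elem_index: int, base_addr: int = 0):
    """Hypothetical helper: compute (awaddr, wdata, wstrb) for one element.

    The byte lane comes from the address modulo the bus width, so the data
    is shifted by at most 60 bytes and never falls off the 512-bit bus.
    Assumes base_addr is 4-byte aligned.
    """
    byte_addr = base_addr + elem_index * ELEM_BYTES
    lane = byte_addr % BUS_BYTES                # byte offset inside the 64-byte beat
    awaddr = byte_addr - lane                   # beat-aligned write address
    wdata = (value & 0xFFFFFFFF) << (8 * lane)  # move the element into its byte lanes
    wstrb = 0xF << lane                         # enable only this element's 4 bytes
    return awaddr, wdata, wstrb


# Element 17 lands in byte lanes 4..7 of the beat at address 0x40.
print([hex(x) for x in place_element(0xDEADBEEF, 17)])  # ['0x40', '0xdeadbeef00000000', '0xf0']
```

Because the lane index wraps every 16 elements, element 16 starts a new beat at lane 0 instead of being shifted past bit 511.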
Reopening this! @zzy666666zzy, we were all traveling last week, so we can look into this a bit more now that we're back.
Hi @rachitnigam, I just realized another issue. In my matrix multiplication example (please see the attached MLIR and script),
But for 16x16 the order is:
The sequence of these ports is important because on a standalone (bare-metal) platform we need to configure the base addresses of the input and output ports manually through Is there any approach to avoid the shuffled sequence of Thanks a lot.
I updated the Calyx repo; the shuffled sequence of
Hi,
Following the previous issue #1470, we managed to run a 4x4 GEMM on a standalone (bare-metal) platform.
When we tried bigger matrix sizes (8x8, 16x16) and printed the output matrix from the ARM side, the IP core gave us only 16 correct elements (2 correct rows for 8x8, 1 correct row for 16x16). The remaining elements are all zeros:


The correct output should be:
When I dug into the generated top-level AXI Verilog code, I found some clues.
In our case, 'm1' outputs the result matrix. In the module Memory_controller_axi_1, look at this snippet: bram_read_data is the result from the core GEMM logic (ext_mem0__write_data) and is always correct. BUT, when we try bigger matrix sizes and send_addr_offset becomes larger than 16, WDATA is always zero, because the data is left-shifted by more than 16x32 bits, which exceeds the 512-bit bus. Previously, we got the correct result for 4x4 only because the 16 result elements happened to fill exactly 512 bits.

I am sure the core logic without the AXI wrapper outputs correct results; we have tested sizes from 4x4 to 64x64. We have changed discover-external:default=xxx according to our matrix sizes. From the generated AXI Verilog code we can tell that discover-external:default=xxx affects send_addr_offset and thereby the number of shifts. I attached the MLIR and the AXI code here for convenient reference:
gemm8x8.mlir.txt
gemm8x8_wrapper.sv.txt
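The zero outputs described above can be reproduced with a small bit-level model. This is a hedged Python sketch rather than the generated SystemVerilog: it assumes, as the description suggests, that WDATA is the 32-bit element shifted left by send_addr_offset x 32 bits and then truncated to the 512-bit bus width. The helper names `wdata_as_generated` and `correct_rows` are hypothetical.

```python
BUS_BITS = 512                      # AXI data width reported in the issue
ELEM_BITS = 32                      # width of one GEMM result element
BUS_MASK = (1 << BUS_BITS) - 1      # truncation to the bus width


def wdata_as_generated(bram_read_data: int, send_addr_offset: int) -> int:
    """Model of the reported behaviour: shift by offset * 32, keep the low 512 bits."""
    shifted = (bram_read_data & 0xFFFFFFFF) << (send_addr_offset * ELEM_BITS)
    return shifted & BUS_MASK


def correct_rows(n: int) -> int:
    """Complete rows of an n x n result whose elements still reach WDATA."""
    surviving = sum(1 for k in range(n * n) if wdata_as_generated(0xFFFFFFFF, k) != 0)
    return surviving // n


print(wdata_as_generated(0xDEADBEEF, 15) != 0)   # True
print(wdata_as_generated(0xDEADBEEF, 16))        # 0
print([correct_rows(n) for n in (4, 8, 16)])     # [4, 2, 1]
```

Under this assumption every offset of 16 or more is shifted entirely above bit 511, so only the first 16 elements (one 512-bit beat) survive, which matches the 2 correct rows for 8x8 and 1 for 16x16.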
Also, the double-underscore issue in
main kernel_inst
still exists; the inconsistent signal names lead to synthesis failure. I would appreciate any solution for the returned 0s and the double-underscore issue.
Many thanks.