Logic Circuit Assembler (LCA) / Logic Gate Assembler (LGA)
Java OpenCL logic circuit simulator for simulating and debugging fully pipelined binary gate logic. Includes a visual designer that also converts OpenCL C code to binary micro-FPGA gate logic.
- Not designed for sequential operation execution like assembly code; it is a continuous-execution circuit definition language whose input work-item dimensions are core-width x pipeline-depth.
- The system architecture is based on 1-cycle-latency FPGA gates and a large SRAM block with three full-length SRAM-block MIMO OR-multiplexers, which read/write int32 argument/indirect ($) values and store pointer values directly for each gate.
- All external communication with the logic gate system goes through direct SRAM reads/writes, for example from PCIe, USB, SD-card, HBM or DDR5 memory bridge controllers.
- Each separately programmable/assignable micro-FPGA gate runs internally at a multiple of the main circuit clock speed, so that each gate operation takes one clock cycle.
- The programmer/IDE is responsible for assigning correct output pointer values for each gate, taking into account collisions between multiple OR-multiplexed value stores.
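The last point can be illustrated with a small check an IDE might run before loading a circuit. This is only a sketch under stated assumptions: the gate names, the `output_pointers` mapping and both helper functions are hypothetical, and it assumes that stores landing on the same SRAM address in the same cycle are combined with a bitwise OR, which is how "OR-multiplexed" is read here.

```python
from collections import defaultdict

def find_store_collisions(output_pointers):
    """Return {address: [gates]} for every address targeted by more than one gate.

    output_pointers is a hypothetical mapping of gate name -> SRAM store address.
    """
    by_address = defaultdict(list)
    for gate, address in output_pointers.items():
        by_address[address].append(gate)
    return {addr: gates for addr, gates in by_address.items() if len(gates) > 1}

def or_multiplexed_store(stores):
    """Apply same-cycle (address, value) stores to an empty SRAM model.

    Assumption: colliding int32 values are OR-ed together rather than
    one store winning over the other.
    """
    sram = {}
    for address, value in stores:
        sram[address] = sram.get(address, 0) | (value & 0xFFFFFFFF)
    return sram
```

With `{"g0": 0x10, "g1": 0x14, "g2": 0x10}` the first helper reports that `g0` and `g2` share address `0x10`; the second shows what such a shared store would hold under the OR assumption.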
Logisim evolution 4.0.0 used for circuit illustrations and functional models: https://github.com/logisim-evolution/logisim-evolution
GNU Octave 10.3.0 used for generic math and generating circuit constants: https://octave.org
HxD - Hex Editor and Disk Editor: https://mh-nexus.de/en/hxd/
RISC core-gate instruction set architecture (64-bit variation of RISC-V):
Each core contains 2x 32k core-rail and 1-to-1 routing lines, 512 io-lines, and 1024 registers.
Each core contains 26-bit addressed 1MB rom, 1MB ram, 1MB touch-display ram, and 128MB nand nvram.
Every instruction always operates on full 64-bit register values.
Instruction high bits can contain specific simple variations of instructions.
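Each 64-bit instruction is formed from four 16-bit parameters [regX regY regZ insT], with regX in the highest 16 bits.
The insT parameter is formed from 8-4-4-bit [bitI insV insO] parameters: insO is the opcode and insV selects the instruction variation.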
Estimated logic transistors: 200k per core, about 6.4 billion for 32k cores.
Estimated RAM transistors: 4 million per core (512KB), 128 billion in total (16GB).
Estimated 64-bit compute at 5GHz: 5 Gops per core, 160 Tops in total.
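As an illustration of the instruction layout described above, the following sketch packs and unpacks the six fields of a 64-bit instruction word. The `Instruction` type and the `pack`/`unpack` names are hypothetical helpers, not part of the project. The field order follows the [regX regY regZ insT] and [bitI insV insO] descriptions, with regX taken as the most significant field, which agrees with the binary columns of the example listings.

```python
from typing import NamedTuple

class Instruction(NamedTuple):
    reg_x: int  # bits 63..48
    reg_y: int  # bits 47..32
    reg_z: int  # bits 31..16
    bit_i: int  # bits 15..8  (high byte of insT)
    ins_v: int  # bits 7..4   (instruction variation)
    ins_o: int  # bits 3..0   (opcode)

def pack(ins: Instruction) -> int:
    """Build the 64-bit instruction word from its fields."""
    ins_t = (ins.bit_i & 0xFF) << 8 | (ins.ins_v & 0xF) << 4 | (ins.ins_o & 0xF)
    return ((ins.reg_x & 0xFFFF) << 48 | (ins.reg_y & 0xFFFF) << 32
            | (ins.reg_z & 0xFFFF) << 16 | ins_t)

def unpack(word: int) -> Instruction:
    """Split a 64-bit instruction word back into its fields."""
    return Instruction(
        reg_x=(word >> 48) & 0xFFFF,
        reg_y=(word >> 32) & 0xFFFF,
        reg_z=(word >> 16) & 0xFFFF,
        bit_i=(word >> 8) & 0xFF,
        ins_v=(word >> 4) & 0xF,
        ins_o=word & 0xF,
    )
```

For example, `unpack(0x0011001000070086)` yields regX=0x11, regY=0x10, regZ=0x7, bitI=0, insV=8, insO=6, which is the `and 0011 0010 0007` line of the boot loader listing.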
Opcode | Cycles | Instruction | Name              | Description
----------------------------------------------------------------------------------------------------
any    | any    | ##          | Any Raw Data      | direct data line 64-bit value
0      | 1      | nopYZ       | No Operation      | no operation sleep constant regYZ cycles
       |        | []          |                   | empty line or white space line
       |        | //          |                   | comment line
1      | 1      | jmpXY       | Jump Destination  | jump to regX if regYb[bitI] is set
       |        | jmpcXY      | insV=0            | jump to regX if regYb[bitI] is set
       |        | jmpuXY      | insV=1            | unconditional jump to regX
2      | 1      | ldiXYZ      | Load 32-bit Uint  | load regX with constant regYZ
3      | 2      | memXY       | Memory Double     | store/load[insV] regX at memory[regY]
       |        | memrXY      | insV=0            | load
       |        | memwXY      | insV=1            | store
4      | 1      | cmpXY       | Compare to Zero   | clear regXb[bitI], set to 1 if regY comp[insV]
       |        | cmpeXY      | insV=0            | integer equal to
       |        | cmplXY      | insV=1            | integer less than
       |        | cmpefXY     | insV=2            | float equal to
       |        | cmplfXY     | insV=3            | float less than
5      | 1      | intXYZ      | ALU Int Operation | store integer op[insV] regY regZ to regX
       |        | addXYZ      | insV=0            | integer add
       |        | addoXYZ     | insV=1            | integer add overflow bit regXb[bitI]
       |        | subXYZ      | insV=2            | integer subtract
       |        | subbXYZ     | insV=3            | integer subtract borrow bit regXb[bitI]
       |        | mulXYZ      | insV=4            | integer multiply
       |        | muloXYZ     | insV=5            | integer multiply overflow
       |        | divXYZ      | insV=6            | integer divide
       |        | divrXYZ     | insV=7            | integer divide remainder
       |        | negXYZ      | insV=8            | integer negate
6      | 1      | bitXYZ      | ALU Bit Operation | store bitwise op[insV] regY regZ to regX
       |        | shlXYZ      | insV=0            | bitwise shift left regZ bits
       |        | shrXYZ      | insV=1            | bitwise shift right regZ bits
       |        | sharXYZ     | insV=2            | bitwise shift arithmetic right regZ bits
       |        | rotlXYZ     | insV=3            | bitwise rotate left regZ bits
       |        | rotrXYZ     | insV=4            | bitwise rotate right regZ bits
       |        | copyXYZ     | insV=5            | bitwise copy
       |        | notXYZ      | insV=6            | bitwise not
       |        | orXYZ       | insV=7            | bitwise or
       |        | andXYZ      | insV=8            | bitwise and
       |        | nandXYZ     | insV=9            | bitwise nand
       |        | norXYZ      | insV=A            | bitwise nor
       |        | xorXYZ      | insV=B            | bitwise xor
       |        | xnorXYZ     | insV=C            | bitwise xnor
7      | 1      | flpXYZ      | ALU Flp Operation | store float op[insV] regY regZ to regX
       |        | addfXYZ     | insV=0            | float add
       |        | subfXYZ     | insV=1            | float subtract
       |        | mulfXYZ     | insV=2            | float multiply
       |        | divfXYZ     | insV=3            | float divide
       |        | negfXYZ     | insV=4            | float negate
       |        | itfXYZ      | insV=5            | integer to float
       |        | ftinXYZ     | insV=6            | float to integer nearest
       |        | ftidXYZ     | insV=7            | float to integer round down
       |        | ftiuXYZ     | insV=8            | float to integer round up
       |        | ftitXYZ     | insV=9            | float to integer truncate
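To show how the table maps source lines to binary words, here is a minimal single-line assembler sketch. It is not the project's assembler: `OPCODES`, `assemble_line` and the `bit_i` argument are hypothetical, since the source syntax for setting bitI is not specified. The opcode/variation numbers are copied from the table, operands are read as hexadecimal, and `[]` and `//` lines are assumed to emit an all-zero word, as in the example listings.

```python
# (insO, insV) per mnemonic, copied from the instruction table.
OPCODES = {
    "nop": (0x0, 0x0), "jmpc": (0x1, 0x0), "jmpu": (0x1, 0x1), "ldi": (0x2, 0x0),
    "memr": (0x3, 0x0), "memw": (0x3, 0x1),
    "cmpe": (0x4, 0x0), "cmpl": (0x4, 0x1), "cmpef": (0x4, 0x2), "cmplf": (0x4, 0x3),
}
INT_OPS = ["add", "addo", "sub", "subb", "mul", "mulo", "div", "divr", "neg"]
BIT_OPS = ["shl", "shr", "shar", "rotl", "rotr", "copy", "not",
           "or", "and", "nand", "nor", "xor", "xnor"]
FLP_OPS = ["addf", "subf", "mulf", "divf", "negf", "itf", "ftin", "ftid", "ftiu", "ftit"]
for opcode, names in ((0x5, INT_OPS), (0x6, BIT_OPS), (0x7, FLP_OPS)):
    for variation, name in enumerate(names):
        OPCODES[name] = (opcode, variation)

def assemble_line(line: str, bit_i: int = 0) -> int:
    """Assemble one source line into a 64-bit word (hex operands, no labels)."""
    tokens = line.split()
    if not tokens or tokens[0] == "[]" or tokens[0].startswith("//"):
        return 0  # empty and comment lines become an all-zero word
    mnemonic, args = tokens[0].lower(), tokens[1:]
    if mnemonic == "##":
        return int(args[0], 16)  # raw 64-bit data word
    ins_o, ins_v = OPCODES[mnemonic]
    if mnemonic == "nop":
        # nop takes one 32-bit constant placed in the regY:regZ field
        operands = int(args[0], 16) << 16 if args else 0
    elif mnemonic == "ldi":
        # ldi takes regX and a 32-bit constant placed in the regY:regZ field
        operands = int(args[0], 16) << 48 | int(args[1], 16) << 16
    else:
        regs = [int(a, 16) for a in args] + [0, 0, 0]  # missing registers are 0
        operands = regs[0] << 48 | regs[1] << 32 | regs[2] << 16
    return operands | (bit_i & 0xFF) << 8 | ins_v << 4 | ins_o
```

For instance, `format(assemble_line("ldi 0001 01000000"), "016X")` gives `0001010000000002`, matching the boot loader listing. Section labels such as `[ init variables ]` are listing annotations and are not handled by this sketch.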
Example distributed broadcast boot loader assembly code source and binary:
source listing     | binary           | explanation
----------------------------------------------------------------------------------------------------
[ init variables ]
ldi 0000 00000000  | 0000000000000002 | rom read index 0x0000000
ldi 0001 01000000  | 0001010000000002 | ram write index 0x1000000
ldi 0002 00000001  | 0002000000010002 | constant 0x1
ldi 0003 01000000  | 0003010000000002 | jump address 0x1000000
ldi 0004 00000100  | 0004000001000002 | rom to ram copy size
ldi 0005 0000001A  | 00050000001A0002 | zero branch jump address
ldi 0006 00000011  | 0006000000110002 | non-zero branch jump address
ldi 0007 0000FFFF  | 00070000FFFF0002 | 16-bit core num and filter
ldi 0008 0000FFFF  | 00080000FFFF0002 | 16-bit core rail and filter
ldi 0009 00000020  | 0009000000200002 | 16-bit core rail and filter shift bits
[ core id zero check ]
copy 0010 00dc     | 001000DC00000056 | get current core id
and 0011 0010 0007 | 0011001000070086 | get core id core index
shl 0008 0008 0009 | 0008000800090006 | shift rail mask left 32 bits
and 0012 0010 0008 | 0012001000080086 | get core id rail index
shr 0012 0012 0009 | 0012001200090016 | shift core id rail index right 32 bits
cmpe 0013 0011     | 0013001100000004 | set 1 if core id is zero
jmpc 0005 0013     | 0005001300000001 | jump to core zero code
[ core id non-zero branch ]
nop 00000002       | 0000000000020000 | exact sync wait 3 cycles with zero branch
copy 0030 00E0     | 003000E000000056 | get external rom data from core rail zero
memw 0030 0001     | 0030000100000013 | store external rom data to ram
add 0000 0000 0002 | 0000000000020005 | rom index++
add 0001 0001 0002 | 0001000100020005 | ram index++
sub 0031 0000 0004 | 0031000000040025 | rom index minus copy size
cmpl 0032 0031     | 0032003100000014 | set 1 if rom index < copy size
jmpc 0006 0032     | 0006003200000001 | if rom index < copy size loop back
jmpu 0003          | 0003000000000011 | jump to ram start if done
[ core id zero branch ]
copy 00df 0000     | 00DF000000000056 | put rom index to output1
copy 0030 00df     | 003000DF00000056 | get rom data from input1
copy 00E0 0030     | 00E0003000000056 | store external rom data to core rail zero
memw 0030 0001     | 0030000100000013 | store external rom data to ram
nop 0000           | 0000000000000000 | exact sync wait 1 cycle with zero branch
add 0000 0000 0002 | 0000000000020005 | rom index++
add 0001 0001 0002 | 0001000100020005 | ram index++
sub 0031 0000 0004 | 0031000000040025 | rom index minus copy size
cmpl 0032 0031     | 0032003100000014 | set 1 if rom index < copy size
jmpc 0005 0032     | 0005003200000001 | if rom index < copy size loop back
jmpu 0003          | 0003000000000011 | jump to ram start if done
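The core id check in the listing above can be written out as plain arithmetic. The sketch below mirrors the `and`/`shl`/`shr`/`cmpe` sequence using the constants loaded into registers 0007, 0008 and 0009; the function name `split_core_id` and the example core id values are hypothetical, and the meaning of the individual core id bits is only inferred from those constants.

```python
MASK64 = (1 << 64) - 1
CORE_FILTER = 0xFFFF   # register 0007: 16-bit core number "and" filter
RAIL_FILTER = 0xFFFF   # register 0008: 16-bit core rail "and" filter
RAIL_SHIFT = 0x20      # register 0009: shift amount, 32 bits

def split_core_id(core_id: int):
    """Return (core_index, rail_index, is_core_zero) for a 64-bit core id."""
    core_index = core_id & CORE_FILTER                   # and 0011 0010 0007
    rail_mask = (RAIL_FILTER << RAIL_SHIFT) & MASK64     # shl 0008 0008 0009
    rail_index = (core_id & rail_mask) >> RAIL_SHIFT     # and + shr into 0012
    is_core_zero = core_index == 0                       # cmpe 0013 0011
    return core_index, rail_index, is_core_zero
```

A core id of `0x0000000300000000` would therefore take the core zero branch on rail 3, while `0x0000000500000002` would take the non-zero branch.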
Example looping test assembly code source and binary:
source listing      | binary           | explanation
----------------------------------------------------------------------------------------------------
[]                  | 0000000000000000 | empty line
// empty line       | 0000000000000000 | comment line
nop 00000200        | 0000000002000000 | no operation sleep 512+1 cycles
ldi 0000 00000001   | 0000000000010002 | load register 0 with value 0x1, current fibonacci number
ldi 0001 00000001   | 0001000000010002 | load register 1 with value 0x1, previous fibonacci number
ldi 0002 00000000   | 0002000000000002 | load register 2 with value 0x0, previous+ fibonacci number
ldi 0003 00000000   | 0003000000000002 | load register 3 with value 0x0, for loop index from 0
ldi 0004 00000020   | 0004000000200002 | load register 4 with value 0x20, for loop less than 32
ldi 0005 01000018   | 0005010000180002 | load register 5 with value 0x1000018, ram store start index
ldi 0006 00000001   | 0006000000010002 | load register 6 with value 0x1, constant 0x1 add and jump
ldi 0007 0100000C   | 00070100000C0002 | load register 7 with value 0x100000C constant jump address
ldi 000b 01000000   | 000b010000000002 | load register 11 with value 0x1000000 constant jump address
copy 0002 0001      | 0002000100000056 | copy register 1 to register 2
copy 0001 0000      | 0001000000000056 | copy register 0 to register 1
add 0000 0001 0002  | 0000000100020005 | store addition of register 1 and register 2 to register 0
add 000a 0005 0003  | 000a000500030005 | store addition of register 5 and register 3 to register 10
memw 0000 000a      | 0000000a00000013 | store register 0 to register 10 memory location
add 0003 0003 0006  | 0003000300060005 | store addition of register 3 and register 6 to register 3
sub 0008 0003 0004  | 0008000300040025 | store subtract of register 3 and register 4 to register 8
cmpl 0009 0008      | 0009000800000014 | clear register 9 bit 0, set if register 8 int less than 0
jmpc 0007 0009      | 0007000900000001 | jump to register 7 if register 9 bit 0 is set
jmpu 000b           | 000b000000000011 | unconditional jump to register 11
## A123456789ABCDEF | a123456789abcdef | custom data segment with any instruction or data
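The looping test listing above can be sanity-checked with a small software model. The sketch below is not the project's simulator: it is a hypothetical interpreter covering only the instructions this listing uses, with semantics taken from the instruction table descriptions. It assumes the program is loaded at 0x1000000 (the address the listing's jump constants point into), that memory is addressed in 64-bit words, and it ignores cycle timing. It stops when the program jumps back to its start address instead of looping forever.

```python
MASK64 = (1 << 64) - 1
RAM_BASE = 0x1000000  # assumed load address, taken from the jump constants

# Binary column of the looping test listing, one 64-bit word per line.
FIB_WORDS = [
    0x0000000000000000, 0x0000000000000000, 0x0000000002000000,
    0x0000000000010002, 0x0001000000010002, 0x0002000000000002,
    0x0003000000000002, 0x0004000000200002, 0x0005010000180002,
    0x0006000000010002, 0x00070100000C0002, 0x000B010000000002,
    0x0002000100000056, 0x0001000000000056, 0x0000000100020005,
    0x000A000500030005, 0x0000000A00000013, 0x0003000300060005,
    0x0008000300040025, 0x0009000800000014, 0x0007000900000001,
    0x000B000000000011, 0xA123456789ABCDEF,
]

def signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value >> 63 else value

def run(words, base=RAM_BASE, max_steps=10_000):
    """Execute until the program jumps back to base; return the memory dict."""
    mem = {base + i: w for i, w in enumerate(words)}
    reg = {}
    pc = base
    for _ in range(max_steps):
        word = mem.get(pc, 0)
        x, y, z = (word >> 48) & 0xFFFF, (word >> 32) & 0xFFFF, (word >> 16) & 0xFFFF
        bit_i, ins_v, ins_o = (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF
        rx, ry, rz = reg.get(x, 0), reg.get(y, 0), reg.get(z, 0)
        next_pc = pc + 1
        if ins_o == 0:                          # nop (sleep time not modeled)
            pass
        elif ins_o == 1:                        # jmpc (insV=0) / jmpu (insV=1)
            if ins_v == 1 or (ry >> bit_i) & 1:
                next_pc = rx
        elif ins_o == 2:                        # ldi: 32-bit constant in regY:regZ
            reg[x] = (word >> 16) & 0xFFFFFFFF
        elif ins_o == 3:                        # memr (insV=0) / memw (insV=1)
            if ins_v == 1:
                mem[ry] = rx
            else:
                reg[x] = mem.get(ry, 0)
        elif ins_o == 4 and ins_v in (0, 1):    # cmpe / cmpl against zero
            hit = ry == 0 if ins_v == 0 else signed64(ry) < 0
            reg[x] = (rx & ~(1 << bit_i)) | (int(hit) << bit_i)
        elif ins_o == 5 and ins_v == 0:         # add
            reg[x] = (ry + rz) & MASK64
        elif ins_o == 5 and ins_v == 2:         # sub
            reg[x] = (ry - rz) & MASK64
        elif ins_o == 6 and ins_v == 5:         # copy
            reg[x] = ry
        else:
            raise NotImplementedError(f"unsupported word {word:016X}")
        if next_pc == base:                     # jumped back to the start: stop
            break
        pc = next_pc
    return mem
```

Calling `run(FIB_WORDS)` leaves 32 values at 0x1000018 onward, starting 2, 3, 5, 8, 13. Stopping on the jump back to the start is a choice of this sketch; on the described hardware the final `jmpu` restarts the program.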
Approximate C-code equivalent of the looping test assembly example:
while (true) {                          // infinite while loop
    long fib1 = 0x1;                    // init fib1 with 64-bit long integer value 1
    long fib2 = 0x1;                    // init fib2 with 64-bit long integer value 1
    long fib3 = 0x0;                    // init fib3 with 64-bit long integer value 0
    long *mem = (long *)0x1000018;      // init mem as 64-bit long integer pointer at ram address 0x1000018
    for (long i = 0; i < 32; i++) {     // for loop 64-bit long integer i index value from 0 to 31
        fib3 = fib2;                    // copy old fib2 value to fib3
        fib2 = fib1;                    // copy old fib1 value to fib2
        fib1 = fib2 + fib3;             // calculate new fib1 value by adding fib2 and fib3
        mem[i] = fib1;                  // store fib1 value to mem location +i index
    }                                   // for loop close
}                                       // infinite while loop close







