Java OpenCL Logic Circuit Simulator

Logic Circuit Assembler (LCA) / Logic Gate Assembler (LGA)

Java OpenCL Logic Circuit Simulator for simulating and debugging fully pipelined binary gate logic. Includes a visual designer that also converts OpenCL C code to binary micro-FPGA gate logic.

It is not designed for sequential operation execution like assembly code, but as a continuous-execution circuit definition language whose input work-item dimensions are core-width x pipeline-depth.
The system architecture is based on 1-cycle latency FPGA gates and a large SRAM block with three full-length SRAM block MIMO OR-multiplexers, which read/write the int32 argument/indirect($) values and store pointer values directly for each gate.
Any external communication with the logic gate system goes through direct SRAM read/write, for example from PCIe, USB, SD-card, HBM or DDR5 memory bridge controllers.
Each separately programmable/assignable micro-FPGA gate runs internally at a multiple of the main circuit clock speed, so that each gate operation takes one clock cycle.
The programmer/IDE is responsible for assigning correct output pointer values for each gate, taking into account store collisions between multiple OR-multiplexed values.
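
The store collision check mentioned above can be sketched as follows. This is a minimal illustration, not code from the simulator: the class and method names are hypothetical, and it assumes that stores from several gates to the same SRAM address in one cycle combine by bitwise OR, as the OR-multiplexer description suggests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: models one clock cycle of gate stores to SRAM.
final class StoreCollisionCheck {
    // Assumption: values stored to the same address in one cycle are OR-ed together.
    static Map<Integer, Integer> mergeStores(int[] pointers, int[] values) {
        Map<Integer, Integer> sram = new HashMap<>();
        for (int g = 0; g < pointers.length; g++) {
            sram.merge(pointers[g], values[g], (a, b) -> a | b);
        }
        return sram;
    }

    // Returns every output pointer that is assigned to more than one gate.
    static Set<Integer> collisions(int[] pointers) {
        Set<Integer> seen = new TreeSet<>();
        Set<Integer> duplicates = new TreeSet<>();
        for (int p : pointers) {
            if (!seen.add(p)) {
                duplicates.add(p);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[] pointers = {4, 7, 4, 9};   // gates 0 and 2 share address 4
        System.out.println("colliding addresses: " + collisions(pointers));
    }
}
```

An IDE could run such a check per pipeline stage and either reject the assignment or accept it when the OR-combined value is intended.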

Logisim evolution 4.0.0 used for circuit illustrations and functional models: https://github.com/logisim-evolution/logisim-evolution

GNU Octave 10.3.0 used for generic math and generating circuit constants: https://octave.org

HxD - Hex Editor and Disk Editor: https://mh-nexus.de/en/hxd/

RISC core-gate instruction set architecture (64-bit variation of RISC-V):

Each core contains 2x 32k core-rail and 1-to-1 routing lines, 512 io-lines, and 1024 registers.
Each core contains 26-bit addressed 1MB ROM, 1MB RAM, 1MB touch-display RAM, and 128MB NAND NVRAM.
Every instruction always operates on full 64-bit register values.
Instruction high bits can contain specific simple variations of instructions.
Each 64-bit instruction is formed from four 16-bit parameters [regX regY regZ insT].
The insT parameter is formed from 8-bit, 4-bit and 4-bit fields [bitI insV insO], where insO is the opcode and insV selects the variation.
Estimated logic transistor count is 200k per core, about 6.4 billion for 32k cores.
Estimated RAM transistor count is 4 million per core (512KB), 128 billion in total (16GB).
Estimated 64-bit compute at 5GHz is 5 Gops per core, 160 Tops in total.
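
The instruction word layout can be illustrated with a small pack/unpack sketch. The class name InstructionWord and its methods are hypothetical and not taken from the simulator sources. The bit positions (regX in the top 16 bits, insO in the lowest 4 bits) are inferred from the binary columns of the example listings in this README.

```java
// Hypothetical helper: packs and unpacks the 64-bit word [regX regY regZ insT],
// where insT = [bitI(8) insV(4) insO(4)]. Bit order is inferred from the listings.
final class InstructionWord {
    static long pack(int regX, int regY, int regZ, int bitI, int insV, int insO) {
        long insT = ((bitI & 0xFFL) << 8) | ((insV & 0xFL) << 4) | (insO & 0xFL);
        return ((regX & 0xFFFFL) << 48)
             | ((regY & 0xFFFFL) << 32)
             | ((regZ & 0xFFFFL) << 16)
             | insT;
    }

    static int regX(long word) { return (int) ((word >>> 48) & 0xFFFF); }
    static int regY(long word) { return (int) ((word >>> 32) & 0xFFFF); }
    static int regZ(long word) { return (int) ((word >>> 16) & 0xFFFF); }
    static int bitI(long word) { return (int) ((word >>> 8) & 0xFF); }
    static int insV(long word) { return (int) ((word >>> 4) & 0xF); }
    static int insO(long word) { return (int) (word & 0xF); }

    public static void main(String[] args) {
        // "add 0000 0001 0002": opcode 5 (int ALU), variation 0 (add)
        long add = pack(0x0000, 0x0001, 0x0002, 0, 0, 5);
        System.out.println(String.format("%016X", add)); // prints 0000000100020005
    }
}
```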

Opcode | Cycles | Instruction | Name              | Description
----------------------------------------------------------------------------------------------------
any    | any    | ##          | Any Raw Data      | direct data line 64-bit value
0      | 1      | nopYZ       | No Operation      | no operation sleep constant regYZ+1 cycles
                  []                                empty line or white space line
                  //                                comment line
1      | 1      | jmpXY       | Jump Destination  | jump to regX if regYb[bitI] is set
                  jmpcXY                            insV=0 jump to regX if regYb[bitI] is set
                  jmpuXY                            insV=1 unconditional jump to regX
2      | 1      | ldiXYZ      | Load 32-bit Uint  | load regX with constant regYZ
3      | 2      | memXY       | Memory Double     | store/load[insV] regX at memory[regY]
                  memrXY                            insV=0 load
                  memwXY                            insV=1 store
4      | 1      | cmpXY       | Compare to Zero   | clear regXb[bitI], set to 1 if regY comp[insV]
                  cmpeXY                            insV=0 integer equal to
                  cmplXY                            insV=1 integer less than
                  cmpefXY                           insV=2 float equal to
                  cmplfXY                           insV=3 float less than
5      | 1      | intXYZ      | ALU Int Operation | store integer op[insV] regY regZ to regX
                  addXYZ                            insV=0 integer add
                  addoXYZ                           insV=1 integer add overflow bit regXb[bitI]
                  subXYZ                            insV=2 integer subtract
                  subbXYZ                           insV=3 integer subtract borrow bit regXb[bitI]
                  mulXYZ                            insV=4 integer multiply
                  muloXYZ                           insV=5 integer multiply overflow
                  divXYZ                            insV=6 integer divide
                  divrXYZ                           insV=7 integer divide remainder
                  negXYZ                            insV=8 integer negate
6      | 1      | bitXYZ      | ALU Bit Operation | store bitwise op[insV] regY regZ to regX
                  shlXYZ                            insV=0 bitwise shift left regZ bits
                  shrXYZ                            insV=1 bitwise shift right regZ bits
                  sharXYZ                           insV=2 bitwise shift arithmetic right regZ bits
                  rotlXYZ                           insV=3 bitwise rotate left regZ bits
                  rotrXYZ                           insV=4 bitwise rotate right regZ bits
                  copyXYZ                           insV=5 bitwise copy
                  notXYZ                            insV=6 bitwise not
                  orXYZ                             insV=7 bitwise or
                  andXYZ                            insV=8 bitwise and
                  nandXYZ                           insV=9 bitwise nand
                  norXYZ                            insV=A bitwise nor
                  xorXYZ                            insV=B bitwise xor
                  xnorXYZ                           insV=C bitwise xnor
7      | 1      | flpXYZ      | ALU Flp Operation | store float op[insV] regY regZ to regX
                  addfXYZ                           insV=0 float add
                  subfXYZ                           insV=1 float subtract
                  mulfXYZ                           insV=2 float multiply
                  divfXYZ                           insV=3 float divide
                  negfXYZ                           insV=4 float negate
                  itfXYZ                            insV=5 integer to float
                  ftinXYZ                           insV=6 float to integer nearest
                  ftidXYZ                           insV=7 float to integer round down
                  ftiuXYZ                           insV=8 float to integer round up
                  ftitXYZ                           insV=9 float to integer truncate
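
As a rough illustration of how the table maps source lines to binary words, the following sketch assembles single lines. MiniAssembler is a hypothetical name and not the project's assembler. It covers only the mnemonics used in the example listings, assumes bitI is always 0, and assumes operands are written in hexadecimal as in the listings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-line assembler for a subset of the instruction table.
final class MiniAssembler {
    // Mnemonic -> low byte of insT, i.e. (insV << 4) | insO; bitI is assumed 0.
    private static final Map<String, Integer> OPS = new HashMap<>();
    static {
        OPS.put("nop", 0x00);
        OPS.put("jmpc", 0x01); OPS.put("jmpu", 0x11);
        OPS.put("ldi", 0x02);
        OPS.put("memr", 0x03); OPS.put("memw", 0x13);
        OPS.put("cmpe", 0x04); OPS.put("cmpl", 0x14);
        OPS.put("add", 0x05);  OPS.put("sub", 0x25);
        OPS.put("shl", 0x06);  OPS.put("shr", 0x16);
        OPS.put("copy", 0x56); OPS.put("and", 0x86);
    }

    static long assemble(String line) {
        String[] t = line.trim().split("\\s+");
        // Empty lines, "[]" and comment lines assemble to a zero word (a nop).
        if (t[0].isEmpty() || t[0].equals("[]") || t[0].startsWith("//")) return 0L;
        // "##" passes a raw 64-bit value through unchanged.
        if (t[0].equals("##")) return Long.parseUnsignedLong(t[1], 16);
        Integer insT = OPS.get(t[0]);
        if (insT == null) throw new IllegalArgumentException("unknown mnemonic: " + t[0]);
        long x = 0, y = 0, z = 0;
        if (t[0].equals("nop") || t[0].equals("ldi")) {
            // The last operand is a 32-bit constant spread over regY (high) and regZ (low).
            int last = t.length - 1;
            if (t[0].equals("ldi")) x = Long.parseLong(t[1], 16);
            long c = last >= 1 ? Long.parseLong(t[last], 16) : 0;
            y = c >>> 16;
            z = c & 0xFFFF;
        } else {
            if (t.length > 1) x = Long.parseLong(t[1], 16);
            if (t.length > 2) y = Long.parseLong(t[2], 16);
            if (t.length > 3) z = Long.parseLong(t[3], 16);
        }
        return ((x & 0xFFFF) << 48) | ((y & 0xFFFF) << 32) | ((z & 0xFFFF) << 16) | insT;
    }

    public static void main(String[] args) {
        System.out.println(String.format("%016X", assemble("ldi  0005 01000018")));
        // prints 0005010000180002
    }
}
```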

Example distributed broadcast boot loader assembly code source and binary:

source listing      | binary           | explanation
----------------------------------------------------------------------------------------------------
[ init variables ]
ldi  0000 00000000  | 0000000000000002 | rom read index 0x0000000
ldi  0001 01000000  | 0001010000000002 | ram write index 0x1000000
ldi  0002 00000001  | 0002000000010002 | constant 0x1
ldi  0003 01000000  | 0003010000000002 | jump address 0x1000000
ldi  0004 00000100  | 0004000001000002 | rom to ram copy size
ldi  0005 0000001A  | 00050000001A0002 | zero branch jump address
ldi  0006 00000011  | 0006000000110002 | non-zero branch jump address
ldi  0007 0000FFFF  | 00070000FFFF0002 | 16-bit core number AND mask
ldi  0008 0000FFFF  | 00080000FFFF0002 | 16-bit core rail AND mask
ldi  0009 00000020  | 0009000000200002 | core rail AND mask shift bits (32)
[ core id zero check ]
copy 0010 00dc      | 001000DC00000056 | get current core id
and  0011 0010 0007 | 0011001000070086 | get core id core index
shl  0008 0008 0009 | 0008000800090006 | shift rail mask left 32 bits
and  0012 0010 0008 | 0012001000080086 | get core id rail index
shr  0012 0012 0009 | 0012001200090016 | shift core id rail index right 32 bits
cmpe 0013 0011      | 0013001100000004 | set 1 if core id is zero
jmpc 0005 0013      | 0005001300000001 | jump to core zero code
[ core id non-zero branch ]
nop  00000002       | 0000000000020000 | exact sync wait 3 cycles with zero branch
copy 0030 00E0      | 003000E000000056 | get external rom data from core rail zero
memw 0030 0001      | 0030000100000013 | store external rom data to ram
add  0000 0000 0002 | 0000000000020005 | rom index++
add  0001 0001 0002 | 0001000100020005 | ram index++
sub  0031 0000 0004 | 0031000000040025 | rom index minus copy size
cmpl 0032 0031      | 0032003100000014 | set 1 if rom index < copy size
jmpc 0006 0032      | 0006003200000001 | if rom index < copy size loop back
jmpu 0003           | 0003000000000011 | jump to ram start if done
[ core id zero branch ]
copy 00df 0000      | 00DF000000000056 | put rom index to output1
copy 0030 00df      | 003000DF00000056 | get rom data from input1
copy 00E0 0030      | 00E0003000000056 | store external rom data to core rail zero
memw 0030 0001      | 0030000100000013 | store external rom data to ram
nop  0000           | 0000000000000000 | exact sync wait 1 cycle with zero branch
add  0000 0000 0002 | 0000000000020005 | rom index++
add  0001 0001 0002 | 0001000100020005 | ram index++
sub  0031 0000 0004 | 0031000000040025 | rom index minus copy size
cmpl 0032 0031      | 0032003100000014 | set 1 if rom index < copy size
jmpc 0005 0032      | 0005003200000001 | if rom index < copy size loop back
jmpu 0003           | 0003000000000011 | jump to ram start if done
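
The core id check at the start of the boot loader splits the core id into a core index and a rail index with and/shl/shr. A minimal Java sketch of the same arithmetic is given below. The class name CoreId is hypothetical, and shr is assumed to be a logical (unsigned) shift because the instruction table lists a separate arithmetic shift.

```java
// Hypothetical helper mirroring the "core id zero check" section of the boot loader.
final class CoreId {
    static final long MASK_16 = 0xFFFFL;   // register 0007 / 0008 in the listing
    static final int RAIL_SHIFT = 0x20;    // register 0009 in the listing (32 bits)

    // and 0011 0010 0007: low 16 bits hold the core index
    static long coreIndex(long coreId) {
        return coreId & MASK_16;
    }

    // shl 0008 0008 0009, and 0012 0010 0008, shr 0012 0012 0009
    static long railIndex(long coreId) {
        long railMask = MASK_16 << RAIL_SHIFT;
        return (coreId & railMask) >>> RAIL_SHIFT;
    }

    // cmpe 0013 0011: core zero reads the rom and broadcasts it on the core rail
    static boolean isBroadcaster(long coreId) {
        return coreIndex(coreId) == 0;
    }

    public static void main(String[] args) {
        long id = 0x0000000300000005L;
        System.out.println(coreIndex(id) + " " + railIndex(id)); // prints 5 3
    }
}
```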

Example looping test assembly code source and binary:

source listing      | binary           | explanation
----------------------------------------------------------------------------------------------------
[]                  | 0000000000000000 | empty line
// empty line       | 0000000000000000 | comment line
nop  00000200       | 0000000002000000 | no operation sleep 512+1 cycles
ldi  0000 00000001  | 0000000000010002 | load register 0 with value 0x1, current fibonacci number
ldi  0001 00000001  | 0001000000010002 | load register 1 with value 0x1, previous fibonacci number
ldi  0002 00000000  | 0002000000000002 | load register 2 with value 0x0, previous+ fibonacci number
ldi  0003 00000000  | 0003000000000002 | load register 3 with value 0x0, for loop index from 0
ldi  0004 00000020  | 0004000000200002 | load register 4 with value 0x20, for loop less than 32
ldi  0005 01000018  | 0005010000180002 | load register 5 with value 0x1000018, ram store start index
ldi  0006 00000001  | 0006000000010002 | load register 6 with value 0x1, constant 0x1 add and jump
ldi  0007 0100000C  | 00070100000C0002 | load register 7 with value 0x100000C constant jump address
ldi  000b 01000000  | 000b010000000002 | load register 11 with value 0x1000000 constant jump address
copy 0002 0001      | 0002000100000056 | copy register 1 to register 2
copy 0001 0000      | 0001000000000056 | copy register 0 to register 1
add  0000 0001 0002 | 0000000100020005 | store addition of register 1 and register 2 to register 0
add  000a 0005 0003 | 000a000500030005 | store addition of register 5 and register 3 to register 10
memw 0000 000a      | 0000000a00000013 | store register 0 to register 10 memory location
add  0003 0003 0006 | 0003000300060005 | store addition of register 3 and register 6 to register 3
sub  0008 0003 0004 | 0008000300040025 | store subtract of register 3 and register 4 to register 8
cmpl 0009 0008      | 0009000800000014 | clear register 9 bit 0, set if register 8 int less than 0
jmpc 0007 0009      | 0007000900000001 | jump to register 7 if register 9 bit 0 is set
jmpu 000b           | 000b000000000011 | unconditional jump to register 11
## A123456789ABCDEF | a123456789abcdef | custom data segment with any instruction or data

Approximate C code equivalent of the looping test assembly code:

while(true) {                     // infinite while loop
  long fib1 = 0x1;                // init fib1 with 64-bit long integer value 1
  long fib2 = 0x1;                // init fib2 with 64-bit long integer value 1
  long fib3 = 0x0;                // init fib3 with 64-bit long integer value 0
  long *mem = (long *)0x1000018;  // init mem as 64-bit long integer pointer at ram address 0x1000018
  for (long i=0;i<32;i++) {       // for loop 64-bit long integer i index value from 0 to 31
    fib3 = fib2;                  // copy old fib2 value to fib3
    fib2 = fib1;                  // copy old fib1 value to fib2
    fib1 = fib2 + fib3;           // calculate new fib1 value by adding fib2 and fib3
    mem[i] = fib1;                // store fib1 value to mem location +i index
  }                               // for loop close
}                                 // infinite while loop close
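
For checking simulator output against expected values, one pass of the loop can also be written in Java. This is a sketch only: FibonacciLoop is a hypothetical name, and a plain array stands in for the RAM window that starts at the store index.

```java
// Hypothetical Java rendering of one pass of the looping test program.
final class FibonacciLoop {
    // Returns the 32 values the assembly program would store to RAM in one pass.
    static long[] runOnce() {
        long fib1 = 1, fib2 = 1, fib3 = 0;
        long[] mem = new long[32];      // stands in for RAM starting at the store index
        for (int i = 0; i < 32; i++) {
            fib3 = fib2;                // copy 0002 0001
            fib2 = fib1;                // copy 0001 0000
            fib1 = fib2 + fib3;         // add  0000 0001 0002
            mem[i] = fib1;              // memw 0000 000a
        }
        return mem;
    }

    public static void main(String[] args) {
        long[] mem = runOnce();
        System.out.println(mem[0] + " " + mem[1] + " " + mem[31]); // prints 2 3 5702887
    }
}
```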

