This document is a planning artifact only. No Java logic is translated in this step. No model behavior, data flow, training logic, or numerical operations are implemented in this step.
- Java source of truth: `../SequenceProcessing`
- C style/reference implementation: `../ComputationalGraph-C`
- Target repository: `./`
- Each Java class maps to a dedicated C header and source file unless later consolidation is justified.
- The initial C layout mirrors the Java package split:
  - `src/Classification/`
  - `src/Functions/`
  - `src/Parameters/`
  - `src/Sequence/`
- Model classes will likely need explicit C structs plus function-pointer based dispatch where Java currently relies on inheritance and overriding.
- Dependencies listed below are based on current Java imports and the existing C style in `ComputationalGraph-C`.
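The struct-plus-function-pointer idea for model classes can be sketched as follows. This is only an illustration of the dispatch pattern, not a translation of any Java class: every name (`Sequence_model`, `Sequence_model_vtable`, `Gru_model_sketch`, and the functions) is hypothetical and none is taken from `ComputationalGraph-C`.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are hypothetical placeholders for illustration. */
typedef struct Sequence_model Sequence_model;

/* Operations that a Java subclass would override. */
typedef struct {
    int (*hidden_unit_count)(const Sequence_model *model);
    void (*destroy)(Sequence_model *model);
} Sequence_model_vtable;

/* "Base class": each concrete model embeds this as its first member. */
struct Sequence_model {
    const Sequence_model_vtable *vtable;
};

/* "Subclass" with a single illustrative field. */
typedef struct {
    Sequence_model base;
    int hidden_size;
} Gru_model_sketch;

static int gru_hidden_unit_count(const Sequence_model *model) {
    /* Valid because base is the first member of Gru_model_sketch. */
    const Gru_model_sketch *gru = (const Gru_model_sketch *) model;
    return gru->hidden_size;
}

static void gru_destroy(Sequence_model *model) {
    free(model);
}

static const Sequence_model_vtable GRU_VTABLE = {
    gru_hidden_unit_count,
    gru_destroy
};

/* Allocates a model and returns it through the base type; NULL on failure. */
Sequence_model *create_gru_model_sketch(int hidden_size) {
    Gru_model_sketch *gru = malloc(sizeof *gru);
    if (gru == NULL) {
        return NULL;
    }
    gru->base.vtable = &GRU_VTABLE;
    gru->hidden_size = hidden_size;
    return &gru->base;
}

/* Callers use these wrappers and never touch the vtable directly. */
int sequence_model_hidden_unit_count(const Sequence_model *model) {
    assert(model != NULL);
    return model->vtable->hidden_unit_count(model);
}

void free_sequence_model(Sequence_model *model) {
    if (model != NULL) {
        model->vtable->destroy(model);
    }
}
```

A caller would create a model with `create_gru_model_sketch(8)`, query it through `sequence_model_hidden_unit_count`, and release it with `free_sequence_model`. Whether one shared vtable per model type or per-instance function pointers fits better depends on the conventions already used in `ComputationalGraph-C`.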
- package: `SequenceProcessing.Classification`
- proposed C target files:
  - `src/Classification/GatedRecurrentUnitModel.h`
  - `src/Classification/GatedRecurrentUnitModel.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: computational graph core, nodes, tensor operations, neural network parameter support, activation functions
  - `Classification-C`: likely shared model/performance abstractions if that repo exposes them
  - `WordToVec-C`: none directly visible from imports
- likely additional external repo dependencies:
  - `Math-C` for tensor math if not fully covered through `ComputationalGraph-C`
- complexity: high
- port priority: later
- notes:
  - depends on `AdditionByConstant`, `RemoveBias`, `Switch`
  - uses multiple graph node types and recurrent model state handling
- package: `SequenceProcessing.Classification`
- proposed C target files:
  - `src/Classification/LongShortTermMemoryModel.h`
  - `src/Classification/LongShortTermMemoryModel.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: computational graph core, graph functions, node types, parameter support
  - `Classification-C`: likely shared model/performance abstractions if needed
  - `WordToVec-C`: none directly visible from imports
- likely additional external repo dependencies:
  - `Math-C`
- complexity: high
- port priority: later
- notes:
- broad wildcard imports suggest heavy graph composition
- likely one of the more stateful model ports
- package: `SequenceProcessing.Classification`
- proposed C target files:
  - `src/Classification/RecurrentNeuralNetworkModel.h`
  - `src/Classification/RecurrentNeuralNetworkModel.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: graph core, softmax, computational nodes, multiplication/concatenation nodes, neural network parameter support
  - `Classification-C`: `ClassificationPerformance` equivalent appears likely
  - `WordToVec-C`: none directly visible from imports
- likely additional external repo dependencies:
  - `Math-C`
- complexity: high
- port priority: later
- notes:
- likely base recurrent model abstraction for GRU and LSTM-adjacent behavior
- dependency on classification performance reporting makes API design relevant
- package: `SequenceProcessing.Classification`
- proposed C target files:
  - `src/Classification/Transformer.h`
  - `src/Classification/Transformer.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: graph core, node types, activation and loss functions, initialization, optimizer-facing integration
  - `Classification-C`: `ClassificationPerformance` equivalent appears likely
  - `WordToVec-C`: likely indirect for vectorized vocabulary/data flow, but Java imports point more directly to dictionary/vector layers
- likely additional external repo dependencies:
  - `Dictionary-C`
  - `Math-C`
- complexity: high
- port priority: blocked
- notes:
  - highest architectural risk among current classes
  - depends on `TransformerParameter` and several custom function operators
  - likely needs attention/masking conventions defined before implementation
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/AdditionByConstant.h`
  - `src/Functions/AdditionByConstant.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none directly visible
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
  - straightforward operator logic, but blocked on missing `FunctionNode` design in the current C graph reference
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Inverse.h`
  - `src/Functions/Inverse.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- likely simple tensor transform once graph-function interface is defined
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Mask.h`
  - `src/Functions/Mask.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- likely important for transformer attention masking
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Mean.h`
  - `src/Functions/Mean.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
  - likely shares reduction conventions with `Variance`
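To make "reduction conventions" concrete, the sketch below fixes one possible set of choices for `Mean` and `Variance`. All of them are assumptions that must be checked against the Java source before porting: tensors are treated as row-major `rows x cols` arrays of `double`, the reduction runs along each row, and variance divides by `cols` (population variance) rather than `cols - 1`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: row-major layout, reduction over each row. */
void row_mean_sketch(const double *matrix, size_t rows, size_t cols,
                     double *means) {
    assert(cols > 0);
    for (size_t r = 0; r < rows; r++) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; c++) {
            sum += matrix[r * cols + c];
        }
        means[r] = sum / (double) cols;
    }
}

/* ASSUMPTION: population variance (divide by cols, not cols - 1). */
void row_variance_sketch(const double *matrix, size_t rows, size_t cols,
                         double *variances) {
    assert(cols > 0);
    for (size_t r = 0; r < rows; r++) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; c++) {
            sum += matrix[r * cols + c];
        }
        double mean = sum / (double) cols;

        double squared = 0.0;
        for (size_t c = 0; c < cols; c++) {
            double deviation = matrix[r * cols + c] - mean;
            squared += deviation * deviation;
        }
        variances[r] = squared / (double) cols;
    }
}
```

If both operators end up sharing the axis and divisor choices, keeping them in one helper would avoid the two ports drifting apart.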
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/MultiplyByConstant.h`
  - `src/Functions/MultiplyByConstant.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- likely low algorithmic risk once function plumbing exists
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/RemoveBias.h`
  - `src/Functions/RemoveBias.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- referenced by multiple recurrent model classes
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/SquareRoot.h`
  - `src/Functions/SquareRoot.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- likely simple transform after tensor operation conventions are defined
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Switch.h`
  - `src/Functions/Switch.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- referenced by recurrent model classes
- behavior may affect control-flow-like graph semantics
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Transpose.h`
  - `src/Functions/Transpose.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
- likely low algorithmic complexity, but still blocked on function integration design
- package: `SequenceProcessing.Functions`
- proposed C target files:
  - `src/Functions/Variance.h`
  - `src/Functions/Variance.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: function base type, `FunctionNode`-like abstraction, computational node integration, tensor math
  - `Classification-C`: none
  - `WordToVec-C`: none
- likely additional external repo dependencies:
  - `Math-C`
- complexity: medium
- port priority: blocked
- notes:
  - probably tied to normalization behavior together with `Mean` and `SquareRoot`
- package: `SequenceProcessing.Parameters`
- proposed C target files:
  - `src/Parameters/RecurrentNeuralNetworkParameter.h`
  - `src/Parameters/RecurrentNeuralNetworkParameter.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: neural network parameter base struct, initialization abstractions, function references
  - `Classification-C`: none directly visible
  - `WordToVec-C`: none
- likely additional external repo dependencies:
- none expected beyond graph dependencies
- complexity: medium
- port priority: now
- notes:
- appears primarily parameter-container oriented
- good early candidate once base parameter embedding strategy is chosen
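One candidate for the "base parameter embedding strategy" is to embed the base struct by value as the first member of the derived parameter struct. The sketch below illustrates only that layout choice. `Base_parameter_sketch` is a hypothetical stand-in for whatever base parameter struct `ComputationalGraph-C` provides, and all field names are illustrative rather than taken from the Java classes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the graph library's base parameter struct. */
typedef struct {
    int seed;
    int epoch;
    double learning_rate;
} Base_parameter_sketch;

/* Derived parameter: the base is embedded by value as the first member,
 * so &parameter.base can be handed to code that only knows the base type. */
typedef struct {
    Base_parameter_sketch base;
    int hidden_size;   /* illustrative field */
    int class_count;   /* illustrative field */
} Recurrent_parameter_sketch;

Recurrent_parameter_sketch make_recurrent_parameter_sketch(
        Base_parameter_sketch base, int hidden_size, int class_count) {
    assert(hidden_size > 0 && class_count > 0);
    Recurrent_parameter_sketch parameter;
    parameter.base = base;
    parameter.hidden_size = hidden_size;
    parameter.class_count = class_count;
    return parameter;
}

/* Example of base-only code: it never sees the derived fields. */
int remaining_epochs_sketch(const Base_parameter_sketch *base, int completed) {
    return base->epoch > completed ? base->epoch - completed : 0;
}
```

The alternative is to hold a pointer to a separately allocated base struct, which costs an extra allocation and an ownership rule but allows the base to be shared. Which option fits depends on how the reference repo already handles its own parameter types.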
- package: `SequenceProcessing.Parameters`
- proposed C target files:
  - `src/Parameters/TransformerParameter.h`
  - `src/Parameters/TransformerParameter.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: neural network parameter base struct, initialization abstractions, function references
  - `Classification-C`: none directly visible
  - `WordToVec-C`: none
- likely additional external repo dependencies:
- none expected beyond graph dependencies
- complexity: medium
- port priority: now
- notes:
- likely a data-configuration type and a reasonable early implementation target
- package: `SequenceProcessing.Sequence`
- proposed C target files:
  - `src/Sequence/LabelledSentence.h`
  - `src/Sequence/LabelledSentence.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: none
  - `Classification-C`: none
  - `WordToVec-C`: none directly visible
- likely additional external repo dependencies:
  - `Corpus-C`
- complexity: low
- port priority: now
- notes:
- likely simple wrapper or extension over sentence data with labels
- good first structural port candidate
- package: `SequenceProcessing.Sequence`
- proposed C target files:
  - `src/Sequence/LabelledVectorizedWord.h`
  - `src/Sequence/LabelledVectorizedWord.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: none
  - `Classification-C`: none
  - `WordToVec-C`: likely conceptual dependency, but direct Java imports point to dictionary/vector types rather than a `WordToVec` package
- likely additional external repo dependencies:
  - `Dictionary-C`
  - `Math-C`
- complexity: medium
- port priority: now
- notes:
- data-structure oriented
- likely needed before corpus loading and model input preparation
- package: `SequenceProcessing.Sequence`
- proposed C target files:
  - `src/Sequence/SequenceCorpus.h`
  - `src/Sequence/SequenceCorpus.c`
- likely direct dependencies from sibling repos:
  - `ComputationalGraph-C`: none
  - `Classification-C`: none
  - `WordToVec-C`: likely indirect only
- likely additional external repo dependencies:
  - `Corpus-C`
  - `Dictionary-C`
  - `Math-C`
  - `Util-C`
- complexity: medium
- port priority: later
- notes:
- loader/parsing logic appears implementable before model code
- blocked only if supporting corpus/file/vector APIs are not yet available in C
- all files under `src/Classification/`
- all files under `src/Functions/`
- `src/Parameters/RecurrentNeuralNetworkParameter.*`
- `src/Parameters/TransformerParameter.*`
Rationale:
- Java imports point to graph nodes, graph functions, neural network parameter classes, initialization classes, and tensor-oriented operations.
- The current C reference repo already defines graph core, nodes, optimizers, initialization, and activation functions, so it is the primary architectural style guide.
- `src/Classification/RecurrentNeuralNetworkModel.*`
- `src/Classification/Transformer.*`
- possibly `src/Classification/GatedRecurrentUnitModel.*`
- possibly `src/Classification/LongShortTermMemoryModel.*`
Rationale:
- Java imports explicitly reference `Classification.Performance.ClassificationPerformance` in some model classes.
- If classification metrics, training loops, or shared model interfaces live in `Classification-C`, these model ports should align with them rather than duplicating that layer locally.
- no source file shows a direct Java import from a `WordToVec` package
- most vectorized-token dependencies appear to come through `Dictionary.VectorizedWord` and `Math.Vector`
- practical integration candidates:
  - `src/Sequence/LabelledVectorizedWord.*`
  - `src/Sequence/SequenceCorpus.*`
  - `src/Classification/Transformer.*` indirectly through vectorized vocabulary/data preparation
Rationale:
- the Maven project depends on `WordToVec`, but the currently visible Java source uses dictionary/vector abstractions instead of directly importing it.
- this suggests either an indirect dependency chain or shared types exposed through sibling libraries.
- none are fully self-contained in the strict sense
- lowest external coupling:
  - `src/Sequence/LabelledSentence.*`
  - `src/Parameters/RecurrentNeuralNetworkParameter.*`
  - `src/Parameters/TransformerParameter.*`
Rationale:
- even the simplest files still appear to rely on shared types from sibling libraries such as corpus, graph parameter, initialization, or vector types.
- proposed C test files:
  - `test/SequenceCorpusTest.c`
  - optionally `test/SequenceCorpusTest.h`
- production files under test:
  - `src/Sequence/SequenceCorpus.h`
  - `src/Sequence/SequenceCorpus.c`
- can be tested early:
- yes, after the sequence data structures and corpus/file-loading dependencies are available
- early test focus:
- corpus size and sentence count
- label extraction
- vectorized token loading/parsing
- basic error handling for malformed resources
- proposed C test files:
  - `test/TransformerTest.c`
  - optionally `test/TransformerTest.h`
- production files under test:
  - `src/Classification/Transformer.h`
  - `src/Classification/Transformer.c`
  - `src/Parameters/TransformerParameter.h`
  - `src/Parameters/TransformerParameter.c`
  - dependent custom function files under `src/Functions/`
- can be tested early:
- partially
- early test focus:
- parameter construction and teardown
- tensor shape expectations for helper utilities
- deterministic setup wiring if initialization stubs or mocks exist
- deferred test focus:
- end-to-end model training/inference
- optimizer integration
- classification performance assertions
- `src/Sequence/LabelledSentence.h`
- `src/Sequence/LabelledSentence.c`
- `src/Sequence/LabelledVectorizedWord.h`
- `src/Sequence/LabelledVectorizedWord.c`
- `src/Parameters/RecurrentNeuralNetworkParameter.h`
- `src/Parameters/RecurrentNeuralNetworkParameter.c`
- `src/Parameters/TransformerParameter.h`
- `src/Parameters/TransformerParameter.c`
Reasoning:
- these files appear to be mostly data-structure and configuration oriented
- they establish shared types needed by later ports
- they have materially lower algorithmic risk than model classes
- `src/Sequence/SequenceCorpus.h`
- `src/Sequence/SequenceCorpus.c`
- `test/SequenceCorpusTest.c`
Reasoning:
- corpus parsing is useful early, but it depends on Phase 1 sequence types and on confirming the C-side corpus/vector/file utility APIs
- this phase gives an early testable slice before graph-heavy model work begins
- `src/Functions/AdditionByConstant.h`
- `src/Functions/AdditionByConstant.c`
- `src/Functions/Inverse.h`
- `src/Functions/Inverse.c`
- `src/Functions/Mask.h`
- `src/Functions/Mask.c`
- `src/Functions/Mean.h`
- `src/Functions/Mean.c`
- `src/Functions/MultiplyByConstant.h`
- `src/Functions/MultiplyByConstant.c`
- `src/Functions/RemoveBias.h`
- `src/Functions/RemoveBias.c`
- `src/Functions/SquareRoot.h`
- `src/Functions/SquareRoot.c`
- `src/Functions/Switch.h`
- `src/Functions/Switch.c`
- `src/Functions/Transpose.h`
- `src/Functions/Transpose.c`
- `src/Functions/Variance.h`
- `src/Functions/Variance.c`
Reasoning:
- the Java code assumes a graph-function layer with `FunctionNode`
- `ComputationalGraph-C` does not currently expose an obvious `FunctionNode` counterpart in the observed files
- these operators should wait until the C representation for custom graph functions is settled
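As a starting point for that design discussion, one possible shape for a custom graph function in C is a struct carrying forward and backward function pointers. The sketch below is an assumption, not the `ComputationalGraph-C` API: `Function_node_sketch` and every function name are hypothetical, and tensors are reduced to flat `double` arrays. The example operator multiplies by a constant `c`, for which the forward pass is `y = c * x` and the input gradient is `c` times the output gradient.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface; not taken from ComputationalGraph-C. */
typedef struct Function_node_sketch Function_node_sketch;

struct Function_node_sketch {
    /* Writes f(input[i]) into output[i] for i in [0, n). */
    void (*forward)(const Function_node_sketch *self,
                    const double *input, double *output, size_t n);
    /* Writes dL/dinput into input_gradient, given dL/doutput. */
    void (*backward)(const Function_node_sketch *self,
                     const double *input, const double *output_gradient,
                     double *input_gradient, size_t n);
    /* Operator-specific state; a single scalar is enough here. */
    double constant;
};

static void multiply_forward(const Function_node_sketch *self,
                             const double *input, double *output, size_t n) {
    for (size_t i = 0; i < n; i++) {
        output[i] = self->constant * input[i];
    }
}

static void multiply_backward(const Function_node_sketch *self,
                              const double *input,
                              const double *output_gradient,
                              double *input_gradient, size_t n) {
    (void) input; /* d(c * x)/dx = c, independent of x. */
    for (size_t i = 0; i < n; i++) {
        input_gradient[i] = self->constant * output_gradient[i];
    }
}

Function_node_sketch make_multiply_by_constant_sketch(double constant) {
    Function_node_sketch node = {multiply_forward, multiply_backward, constant};
    return node;
}

/* Thin wrapper so callers do not dereference the pointers themselves. */
void function_node_forward(const Function_node_sketch *node,
                           const double *input, double *output, size_t n) {
    assert(node != NULL && node->forward != NULL);
    node->forward(node, input, output, n);
}
```

Operators with richer state (for example a mask or a bias shape) would need a `void *` context or a per-operator struct instead of the single `constant` field used here.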
- `src/Classification/RecurrentNeuralNetworkModel.h`
- `src/Classification/RecurrentNeuralNetworkModel.c`
- `src/Classification/GatedRecurrentUnitModel.h`
- `src/Classification/GatedRecurrentUnitModel.c`
- `src/Classification/LongShortTermMemoryModel.h`
- `src/Classification/LongShortTermMemoryModel.c`
- `src/Classification/Transformer.h`
- `src/Classification/Transformer.c`
- `test/TransformerTest.c`
Reasoning:
- these files sit on top of nearly every earlier phase
- they require settled parameter structs, function/operator plumbing, graph integration, and likely classification metric interfaces
- `Transformer` is the most design-sensitive class and should be last among the current set
- `ComputationalGraph-C` currently shows node and function support, but the Java-side `FunctionNode` abstraction is not obviously present in the observed C reference files.
- The exact C equivalents for Java dependencies from `Corpus`, `Dictionary`, `Util`, `Math`, and possibly `Classification` are not yet confirmed locally.
- The Maven dependency on `WordToVec` appears indirect from the current imports, so the exact integration surface still needs confirmation before implementation.
- Model classes will require an explicit ownership and lifecycle model for tensors, nodes, parameters, and temporary graph state.
- The final public API shape for `SequenceProcessing-C` should be decided before model implementation to avoid churn across all headers.
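To illustrate what an explicit ownership and lifecycle model could look like, the sketch below separates borrowed data from owned data in a single struct and pairs every `create_` function with a `free_` function. The names and fields are hypothetical and the state is a plain `double` array. It is meant to show the convention, not the eventual model layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical recurrent state holder used only to show ownership rules. */
typedef struct {
    const double *weights;  /* borrowed: the caller allocates and frees it */
    double *hidden_state;   /* owned: allocated in create, released in free */
    size_t hidden_size;
} Recurrent_state_sketch;

/* Clears the temporary state, e.g. between two input sequences. */
void reset_recurrent_state_sketch(Recurrent_state_sketch *state) {
    assert(state != NULL);
    for (size_t i = 0; i < state->hidden_size; i++) {
        state->hidden_state[i] = 0.0;
    }
}

/* Returns NULL on allocation failure; the weights are never copied. */
Recurrent_state_sketch *create_recurrent_state_sketch(const double *weights,
                                                      size_t hidden_size) {
    assert(weights != NULL && hidden_size > 0);
    Recurrent_state_sketch *state = malloc(sizeof *state);
    if (state == NULL) {
        return NULL;
    }
    state->hidden_state = malloc(hidden_size * sizeof *state->hidden_state);
    if (state->hidden_state == NULL) {
        free(state);
        return NULL;
    }
    state->weights = weights;
    state->hidden_size = hidden_size;
    reset_recurrent_state_sketch(state);
    return state;
}

/* Frees only what this struct owns; the borrowed weights stay valid. */
void free_recurrent_state_sketch(Recurrent_state_sketch *state) {
    if (state == NULL) {
        return;
    }
    free(state->hidden_state);
    free(state);
}
```

Documenting "owned" versus "borrowed" on every pointer member in the public headers would let the model ports agree on who frees tensors, nodes, and parameters before any training logic is written.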