v0.53.0

@github-actions github-actions released this 25 Nov 19:09
· 894 commits to main since this release

Warning

Not all model demos or performance benchmarks were checked for this release. Please refer to our README, documentation, or open issues on our GitHub repo.

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not the versions on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12016702477

📦 Uncategorized

  • #14773: Set default to true when getting active ethernet cores
  • #11795: Update test_pgm_dispatch and sweep
  • #14880: Ternary composite op clean up
  • #14928: Ternary backward clean up
  • #14930: Complex backward op clean up
  • #0: Update Mixtral target
  • #14665: add new moreh_clip_grad_norm and test in ttnn
  • #14730: Support unequal ranked inputs for eltwise binary
  • Fix double deallocate in llama3 attention
  • #14862: fp32 support in unary
  • Angle op fix
  • Fix a non-c-typedef-for-linkage error
  • Add experimental fused qk ROPE
  • [skip ci] #14001: Add an ALIAS target for consuming TTNN
  • #0: Disable llama test_model from all-post-commit CI pipeline
  • float32 tilize support
  • Move NUM_CIRCULAR_BUFFERS to hw/inc
  • Mchiou/14961 disable gs profiler ring buffer
  • #14990: Address feedback in Programming Mesh of Devices Tech Report
  • #11512: Add sweep test for ttnn.transformers.attention_softmax
  • #14826: Remove misoptimizations from init code
  • Use cluster desc yaml on BH and pass PCIe NoC endpoint to device
  • Increase packer precision for bfp8 formats
  • Revert "Angle op fix"
  • use do_crt1 like other cores
  • Fixed incorrect mem size for DebugIErisc
  • Dvartanians/mbahnas/yolov4 web demo traced
  • [skip ci] Update CODEOWNERS
  • Added tt-train to the tt-metal monorepo
  • #0: Disable Unity builds to detect bitrot
  • Update Resnet50 perf on n150
  • [skip ci] Add GEMM techreport to explain WH performance
  • Alignment fix for BH in I2S and S2I
  • [skip ci] Update README.md (MM FLOPS)
  • FD refactor + sub device support
  • #0: Provide script for installing system dependencies
  • Build with unity in build-artifact.yaml, don't use unity in build.yaml
  • Move NOC_0_X/Y behind Hal
  • Add reduce_scatter t3k perf to pipeline
  • add initial fabric erisc data mover (EDM) impl
  • Revert "Alignment fix for BH in I2S and S2I"
  • Revert "use do_crt1 like other cores"
  • Revert "#14826: Remove misoptimizations from init code"
  • Reduce dependence on ARCH_NAME in dev_msgs.h
  • graph trace update - extract_circular_buffers_peak_size_per_core
  • Llama-Vision: Enable tracing, refactor generation code
  • [tt-train] Added mesh support
  • #13655: Fix sub-device tests for BH
  • LlamaVision: Move xattn cache generation to text prefill forward
  • Revert "Reduce dependence on ARCH_NAME in dev_msgs.h"
  • Alignment fix for BH on I2S and S2I (fix after revert)
  • Update size.hpp
  • [skip ci] #0: update yolov4 READMEs
  • #15073: Fix use after move in ttnn run_operation
  • Restructure supported params table for ternary ops
  • Change tt_SiliconDevice to tt::umd::Cluster
  • #14546: Fix moreh_adamw power_tile reduce performance
  • Update documentation for LERP
  • Restructure supported params table for ternary backward ops
  • #14999: Update scatter golden function
  • Update ternary and backward ternary pybind examples
  • (REDO) Reduce dependence on ARCH_NAME in dev_msgs.h
  • #13521: New sweep for pytorch tracing - ttnn.add
  • #14590: Move sfpi off LFS
  • Add multi-block support for matmul_2d
  • Enable CCache for builds
  • Enable clang-tidy check for use after move
  • #14688 Scan the repo with clang-tidy as part of post-commit
  • Add tunneler tests to ci
  • #11795: Added tests that dispatch randomly-generated Programs and alternate between using trace and not using trace
  • #14895: enable gp-rel in kernels
  • Add entry to MM benchmark
  • Fix use-after-move
  • #15123 This check is clean
  • #5174: Uplifting microbenchmarks to run on BH
  • Relax Max Pool Requirement For C To Be Power Of 2
  • Remove LFS from tt-train
  • #14985: Update the examples for binary backward doc
  • Add integer support for eltwise ops
  • #0: Use logical shape in validation check
  • [CCL] Compute device utilization percentage
  • #15144: Increase trace region for yolo to fix
  • #15079: make ProgramCache::is_enabled_ initialized out-of-line
  • [skip ci] Update GEMM_FLOPS.md
  • [skip ci] Update README.md
  • [skip ci] Update GEMM_FLOPS.md
  • [skip ci] Add files via upload
  • #14474: Fix OoO issues for Llama3 tests on CI
  • #0: Revert "#14730: Support unequal ranked inputs for eltwise binary (#14803)"
  • Manually address an issue that local clang-tidy trips over
  • Add Qwen2-7B model on N150
  • Add support for new logical sharding + alignment in TensorLayout and tensor creation
  • Support dst_full_sync_en flag in the WH compute kernel config pybind
  • Revert "Add tunneler tests to ci"
  • #14634: Remove usage of ARCH_NAME sp constants MEM_L1_SIZE
  • tilize_op float32 access
  • Add build config struct to HAL with base FW and local init addrs
  • Update test_pgm_dispatch_script
  • #15123 Fix performance-for-range-copy
  • #0: Improve functional generality of ttnn.concat
  • #15167: explicitly check for rank 4 in reduce special cases
  • #14985: Update binary bw example, Use logical shape
  • Disable test from running on t3k
  • Update CODEOWNERS
  • #14985: Update bias_gelu_bw example, implementation
  • Update Lerp op
  • Update Qwen expected compile time
  • #14985: Update binary bw docs
  • #13676: Add unit tests for io_bw, tan_bw, and lerp
  • Move llama single-device demo tests to perf pipeline for dashboard support
  • #14826: reorganize crt startup
  • #13929: Update the input range for ldexp test
  • #0: Remove duplicate single-card demo llama3 tests
  • #0: Add eth dispatch to test_pgm_dispatch sweeps
  • Add a Debug preset
  • #13127: Add physical_shard_shape to ShardSpec attributes
  • #13720 Make reshape-view 0 cost when possible
  • Convert Hal into a Singleton
  • Add support for arrays in CoreRangeSet
  • #0: Fix typo causing spurious perf warnings for concat
  • Update perf and latest features for llm models (Nov 18)
  • #15145: Add support for multi-device tensors in grouped convolution weight preprocessing
  • [tt-train] Fix tt-train in main branch
  • #15144: Up timeout for mamba to an obscene number because we seem to take longer for some reason that I don't understand
  • #14985: Update examples for binary backward ops
  • #15228: Fix error message in BaseShape when index is out of bounds
  • Allow Concrete Hal Translation Units to have unique include paths
  • Update binary examples and supported params Set 2
  • Add TT-NN roadmap and overview
  • Add data formats to perf report
  • Mo/14961 remove op alignment check
  • Organize contributing docs in a subdir and add notes on clang-tidy
  • #13675: update supported range for tan_bw
  • Fix N150 llama3 demo CI tests to properly save perf information to superset
  • #0: Add sweep for rw bw test
  • [tt-train] Free graph during backward pass
  • Update binary examples
  • #14974: ttnn::empty Tensor creation API for MeshDevice
  • #14427: increase erisc kernel code size
  • Update remove-stale-branches.yaml
  • Consolidate action back into this repo
  • Fix usage of deleted branch
  • #15234: disable sharded tests on Blackhole until fix is introduced
  • #15140: Fix UAF error when MeshDevice.close_devices() not invoked
  • Fix s2i op when shard grid is larger than actual used grid
  • Add a padding-aware, interleaved, tiled transpose HC with a fused padding value parameter
  • Update examples of unary backward
  • Remove CMake variable UMD_HOME
  • #0: Remove alignment requirements for Row Major tensors
  • #15078: Update clamp_bw, clip_bw with min, max tensor
  • Add forward support for PReLU
  • #0: Fix debugger install script
  • Temp workaround for perf improvement on out of box MM
  • Update binary examples and supported params Set 3
  • #15266: add sweeps based on traces
  • #14826: reimplement l1 data copy
  • #11795: Temporarily disable async dispatch
  • Update unary backward docs
  • #11975: Disable more parts of async dispatch
  • #15178: fix dprint race condition
  • #15144: Split mamba tests to be able to triage better and read logs easier
  • Force single core untilize on BH
  • Add back T3K demo test for llama3-70b (old codebase)
  • Upload images included in TT-Metalium for Beginners
  • Add concat_bw sweep
  • Update Image
  • Add support for fused update_cache
  • #0: Use TensorLayout in Tensor
  • #15263: Fix incorrect calculation for needed sub_cmds for rtas
  • Remove ARCH_NAME sp header tensix.h from tt_memory.cpp
  • #0: fix hrefs
  • #0: Restrict forced single core untilize on BH to non-sharded cases
  • #15356: Fix use of l1_alignment before declaration in command_queue_interface.hpp
  • [CCL] Fix ccl sweeps failure
  • #14983: Update unary backward docs
  • #14985: Update div_bw op
  • #14995: Angle issue - Fix
  • Npetrovic/cplx bin bw ops
  • #15242: Update ttnn.div to accept None as in Pytorch
  • Update unary doc examples set1
  • #14826: Reimplement wzerorange
  • #15242: Update ttnn.rdiv to accept None as in Pytorch
  • #0: Fix grabbing data from ethernet cores
  • UMD create_mock_cluster
  • #13647: use logical volume and do not short circuit for mean with no dim
  • matmul fix check that was looking into output_tensor instead of the input_tensor_b
  • Added extra check in reshape for shapes whose less is less than 2 (#15200 fix)
  • #14361: Finish implementing metal 1.0 API
  • #12496: Add workflow wrapper for pre-release testing of PR#15013 for Release Images
  • Revert "UMD create_mock_cluster"
  • Fix the pytorch 2.0 failed cases
  • Delete docs/source/common/images/MfB-Fig12.png
  • Rename MFB-Fig11.png to MfB-Fig11.png
  • Rename MFB-Fig12.png to MfB-Fig12.png
  • Rename MFB-Fig2.png to MfB-Fig2.png
  • #0: Update CODEOWNERS for ttnn/distributed and metal/distributed
  • Rename MFB-Fig3a.png to MfB-Fig3a.png
  • Update Codeowners for Reduction and Data movement
  • #13127: Clean up torch/numpy <-> tt tensor conversion
  • #15320: sweep expand
  • #14609: Pull inter-thread sync and counter reset fix
  • Add tests for untilize, transpose, and tilize on non-4B aligned row-dim Row Major tensors + enable support for these tensors on untilize
  • Move relocate_dev_addr behind Hal
  • Update .clang-format
  • #14553: implementing linalg.vector_norm via moreh_norm
  • #14840: use DRAM config for large-size tensors
  • #14933: Support PRelu for single element weight array
  • #15032: fix host side to_layout causing an integer overflow
  • #15242: Update ttnn.rdiv_bw to accept None as in Pytorch
  • Fix some namespace issues in ttnn header files
  • #14982: Update Unary ops examples
  • #14982: Update Unary examples
  • Add #pragma once directive to kernel headers
  • #14982: Update threshold logic
  • ttnn uses namespace tt::tt_metal
  • #14690: Reorganize gtests under tests/tt_metal/tt_metal
  • Fix typos in buffer size for matmul examples in programming_examples
  • #7493: Removing restrictions surrounding the use of the CB enum to sup…