Releases: alibaba/RecIS

Release v1.1.0

29 Dec 07:05
7456f88

We are excited to announce the release of RecIS v1.1.0. This version marks a significant milestone with the introduction of Model Bank 1.0, native ROCm support, and substantial performance optimizations for large-scale embedding tables.

🌟 Key Highlights

| Category | Description |
| --- | --- |
| 🏆 Framework | Model Bank 1.0 officially arrives; new Negative Sampler and RTP Exporter support. |
| ⚡ Performance | Introduction of auto-resizing hash tables and fused AdamW TF CUDA operations. |
| 🌐 Compatibility | Expanded hardware support for AMD ROCm; fixed kernel launches on non-NVIDIA devices. |
| 🛡️ Robustness | Improved multi-node synchronization and robust handling of empty-tensor edge cases. |

📝 Detailed Changelog

Bug Fixes

  • checkpoint: fix mos version format, update use openlm api (854bbb3)
  • checkpoint: refine torch_rank_weights_embs_table_multi_shard.json format (d5e7a5c)
  • checkpoint: work around save bug, deal with xpfs model path (ae99728)
  • embedding: fix empty kernel launch in non-nvidia device (2e310d0)
  • embedding: fix insert when size == 1 (7702c9e)
  • framework: add an option for algo_config for export (0ad4c3f)
  • framework: fix bugs of invalid index, grad accumulation; add clear child feat (1e7acf9)
  • framework: fix eval in trainer (676a053)
  • framework: fix fg && exporter bugs (3964ce2)
  • framework: fix load extra info not in ckpt (a64cd00)
  • framework: fix loss backward (7d9a41b)
  • framework: fix some bug of model bank (be196db)
  • framework: fix window io failover (cde3049)
  • framework: reset io state when start another epoch (f918f24)
  • io: fix batch_convert row_splits when dataset read empty data (44661ab)
  • io: fix None data when window switch (e788b4d)
  • io: fix odps import bug (7c13f09)
  • io: use openstorage get_table_size directly (d5c0952)
  • ops: fix bug in fast atomic operations (fea8d47)
  • ops: fix dense_to_ragged op when check_invalid=False (#14) (300a77b)
  • ops: fix edge cases for empty tensors and improve CUDA kernel handling (794be12)
  • ops: fix emb segment reduce mean op (3f82b9c)
  • ops: handle empty tensor inputs in ragged ops (a39fc2a)
  • optimizer: step add 1 should be in-place (cdb3632)
  • serialize: fix bug of file sync of multi node (822af49)
  • serialize: fix bug of load tensor (e25eee4)
  • serialize: fix bug when load by oname (e5ca3d7)
  • serialize: fix bug when tensor num < parallel num (a02aded)
  • tools: fix torch_fx_tool string format (1d426f8)

Features

  • checkpoint: add label for ckpt (5436b5b)
  • checkpoint: load dense optimizer by named_parameters (a07dbaf)
  • docs: add model bank docs (ff0d23e)
  • embedding: add monitor for ids/embs (2f268eb)
  • embedding: expose methods to retrieve child ids and embs from the coalesced hashtable; fix clear method of hashtable (b5de207)
  • framework,checkpoint: change checkpointmanager to save/load hooks (eb3b441)
  • framework: [internal] add negative sampler (8c21517)
  • framework: add exporter for rtp (b8af849)
  • framework: add skip option in model bank (00828ce)
  • framework: add some utility to RaggedTensor (78eca0a)
  • framework: add window_iter for window pipeline (87886a0)
  • framework: collect eval result for hooks and fix after_data bug (81d3723)
  • framework: enable amp by options (db5bbe7)
  • framework: impl-independent monitor (24a1631)
  • framework: model bank 1.0 (488672b)
  • framework: support filter hashtable for saver, update hook for window, fix metric (01eb2ae)
  • io: add adaptor filter by scene (c3e6738)
  • io: add new dedup option for neg sampler (61b2cb7)
  • io: add standard fg for input features (2deedff)
  • ops: add fused AdamW TF CUDA operation (05dba24)
  • ops: add parse_sample_id ops (78674cd)
  • packaging: support ROCm (7a626d3)
  • serialize: update load metric interface (66b085d)
  • update column-io to support ROCm device (7907158)

Performance Improvements

  • embedding: use auto-resizing hash table (2f53f53)

Release v1.0.0

18 Sep 13:39

🎉 Initial Release

RecIS (Recommendation Intelligence System) v1.0.0 is now officially released! RecIS is a unified-architecture deep learning framework designed specifically for ultra-large-scale sparse models, built on the PyTorch open-source ecosystem. It is widely used in Alibaba's advertising, recommendation, search, and other scenarios.

✨ New Features

Core Architecture

  • ColumnIO: Data Reading
    • Supports distributed sharded data reading
    • Completes simple feature pre-computation during the reading phase
    • Assembles samples into Torch Tensors and provides data prefetching functionality
  • Feature Engine: Feature Processing
    • Provides feature engineering and feature transformation processing capabilities, including Hash / Mod / Bucketize, etc.
    • Supports automatic operator fusion optimization strategies
  • Embedding Engine: Embedding Management and Computing
    • Provides conflict-free, scalable KV storage embedding tables
    • Provides multi-table fusion optimization capabilities for better memory access performance
    • Supports feature elimination and admission strategies
  • Saver: Parameter Saving and Loading
    • Provides sparse parameter storage and delivery capabilities in SafeTensors standard format
  • Pipelines: Training Process Orchestration
    • Connects the above components and encapsulates training processes
    • Supports complex training workflows such as multi-stage (training/testing interleaved) and multi-objective computation
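The five components above can be pictured as a single data path: read columns, transform features, look up embeddings, then train. The sketch below wires those stages together in plain Python; all names here are illustrative and do not mirror the actual RecIS API.

```python
# Hypothetical sketch of the ColumnIO -> Feature Engine -> Embedding Engine
# flow described above. None of these names are the real RecIS API.

def read_batch(rows):
    """ColumnIO stage: assemble raw rows into column-oriented batches."""
    batch = {}
    for row in rows:
        for col, val in row.items():
            batch.setdefault(col, []).append(val)
    return batch

def hash_bucketize(values, num_buckets):
    """Feature Engine stage: map raw feature values into bucket ids."""
    return [hash(v) % num_buckets for v in values]

class DynamicEmbedding:
    """Embedding Engine stage: a toy conflict-free KV embedding table."""
    def __init__(self, dim):
        self.dim = dim
        self.table = {}  # feature id -> embedding vector
    def lookup(self, ids):
        # Unseen ids are admitted with a fresh zero vector.
        return [self.table.setdefault(i, [0.0] * self.dim) for i in ids]

# One "pipeline" step wiring the stages together on a two-row batch.
rows = [{"user": "u1", "item": "i9"}, {"user": "u2", "item": "i9"}]
batch = read_batch(rows)
user_ids = hash_bucketize(batch["user"], num_buckets=1024)
emb = DynamicEmbedding(dim=4)
user_embs = emb.lookup(user_ids)
```

In the real framework the Pipelines component also handles prefetching, multi-stage scheduling, and checkpointing, which this toy loop omits.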

🛠️ Installation & Compatibility

System Requirements

  • Python: 3.10+
  • PyTorch: 2.4+
  • CUDA: 12.4

Installation Methods

  • Docker Installation: Pre-built Docker images for PyTorch 2.4.0/2.5.1/2.6.0
  • Source Installation: Complete build system with CMake and setuptools

Dependencies

  • torch>=2.4
  • accelerate==0.29.2
  • simple-parsing
  • pyarrow (for ORC support)

📚 Documentation

  • Complete English and Chinese documentation
  • Quick start tutorials with CTR model examples
  • Comprehensive API reference
  • Installation guides for different environments
  • FAQ and troubleshooting guides

📦 Package Structure

  • Core Library: recis/ - Main framework code
  • C++ Extensions: csrc/ - High-performance C++ implementations
  • Documentation: docs/ - Comprehensive documentation in RST format
  • Examples: examples/ - Practical usage examples
  • Tools: tools/ - Data conversion and utility tools
  • Tests: tests/ - Comprehensive test suite

🚀 Key Optimizations

Efficient Dynamic Embedding

The RecIS framework implements efficient dynamic embedding (HashTable) through a two-level storage architecture:

  • IDMap: Serves as first-level storage, using feature ID as key and Offset as value
  • EmbeddingBlocks:
    • Serves as second-level storage, continuous sharded memory blocks for storing embedding parameters and optimizer states
    • Supports dynamic sharding with flexible expansion capabilities
  • Flexible Hardware Adaptation Strategy: Supports both GPU and CPU placement for IDMap and EmbeddingBlocks
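The two-level layout above can be sketched as a dictionary pointing into a list of fixed-size memory blocks. The block size and growth policy below are assumptions for illustration, not RecIS's actual values.

```python
# Minimal sketch of the two-level HashTable storage described above:
# IDMap (feature id -> offset) over EmbeddingBlocks (sharded memory).

class DynamicHashTable:
    def __init__(self, dim, block_rows=4):
        self.dim = dim
        self.block_rows = block_rows  # rows per EmbeddingBlock shard
        self.id_map = {}              # level 1: feature id -> offset
        self.blocks = []              # level 2: list of memory blocks

    def _ensure_capacity(self, offset):
        # Grow by appending a new fixed-size block (dynamic sharding).
        while offset >= len(self.blocks) * self.block_rows:
            self.blocks.append(
                [[0.0] * self.dim for _ in range(self.block_rows)]
            )

    def lookup(self, feature_id):
        # Admission: an unseen id gets the next free offset, so there
        # are no hash conflicts between distinct feature ids.
        if feature_id not in self.id_map:
            offset = len(self.id_map)
            self._ensure_capacity(offset)
            self.id_map[feature_id] = offset
        offset = self.id_map[feature_id]
        block, row = divmod(offset, self.block_rows)
        return self.blocks[block][row]

table = DynamicHashTable(dim=8)
vec = table.lookup(12345)  # first lookup admits the id
```

Because capacity grows one block at a time, expansion never copies existing embeddings, which is the property that makes the real EmbeddingBlocks design cheap to scale.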

Distributed Optimization

  • Parameter Aggregation and Sharding:
    • During model creation phase, merges parameter tables with identical properties (dimensions, initializers, etc.) into one logical table
    • Parameters are evenly distributed across compute nodes
  • Request Merging and Splitting:
    • During forward computation, merges requests for parameter tables with identical properties and computes sharding information with deduplication
    • Obtains embedding vectors from various compute nodes through All-to-All collective communication
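The merge/dedup step can be shown with a toy sharding plan. The mod-based shard rule and the simulated exchange below are assumptions for illustration; the real implementation uses All-to-All collective communication rather than a local loop.

```python
# Sketch of the request dedup + sharding step described above.

def plan_lookup(ids, num_workers):
    """Dedup the requested ids and bucket each unique id to its owner."""
    unique = sorted(set(ids))                 # deduplication
    per_worker = [[] for _ in range(num_workers)]
    for i in unique:
        per_worker[i % num_workers].append(i)  # assumed rule: id mod workers
    return unique, per_worker

def all_to_all_lookup(per_worker, tables):
    """Simulated all-to-all: each worker answers the lookups it owns."""
    result = {}
    for w, ids in enumerate(per_worker):
        for i in ids:
            result[i] = tables[w][i]
    return result

# Two workers, each holding its shard of the (merged) logical table.
tables = [{0: [0.1], 2: [0.2]}, {1: [0.3], 3: [0.4]}]
unique, plan = plan_lookup([1, 3, 1, 0], num_workers=2)
embs = all_to_all_lookup(plan, tables)
```

Deduplicating before the exchange means each unique id crosses the network once, no matter how often it repeats in the batch.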

Efficient Hardware Resource Utilization

  • GPU Concurrency Optimization:
    • Supports feature processing operator fusion optimization, significantly reducing operator count and launch overhead
  • Parameter Table Fusion Optimization:
    • Supports merging parameter tables with identical properties, reducing feature lookup frequency, significantly decreasing operator count, and improving memory space utilization efficiency
  • Operator Implementation Optimization:
    • Operator implementations use vectorized memory access to improve memory utilization
    • Optimizes reduction operators through warp-level merging, reducing atomic operations and improving memory access utilization
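The warp-level merging idea can be illustrated without CUDA: instead of issuing one atomic update per element, contiguous elements sharing a segment id are accumulated locally and written once. This is a pure-Python stand-in for the pattern, not RecIS's kernel.

```python
# Two ways to compute a segment sum, illustrating why merging before
# writing reduces contention on the output.

def segment_sum_atomic_style(values, segment_ids, num_segments):
    out = [0.0] * num_segments
    for v, s in zip(values, segment_ids):
        out[s] += v                     # one "atomic" update per element
    return out

def segment_sum_merged(values, segment_ids, num_segments):
    out = [0.0] * num_segments
    i = 0
    while i < len(values):
        # Accumulate a contiguous run sharing one segment id locally...
        s, acc = segment_ids[i], 0.0
        while i < len(values) and segment_ids[i] == s:
            acc += values[i]
            i += 1
        out[s] += acc                   # ...then write a single update
    return out

vals = [1.0, 2.0, 3.0, 4.0]
segs = [0, 0, 1, 1]
```

On a GPU the "run" is a warp's worth of lanes merged with shuffle intrinsics, so the number of atomic operations drops by up to the warp width.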

🤝 Community & Support

  • Open source under Apache 2.0 license
  • Issue tracking and community support
  • Active development by XDL Team

For detailed usage instructions, please refer to our documentation and quick start guide.