Commit e765d1c

csukuangfj authored and Your Name committed
Add doc about FST-based CTC forced alignment. (k2-fsa#1482)
1 parent 50769ff commit e765d1c

20 files changed (+787, -8 lines)
Binary files (not shown): 2.56 KB, 13.9 KB, 22 KB, 4.44 KB

docs/source/_static/kaldi-align/i.wav (binary, 688 Bytes; not shown)

Binary files (not shown): 2.56 KB, 9.47 KB, 4.44 KB, 5.07 KB

docs/source/conf.py (+2)

@@ -98,4 +98,6 @@
 .. _Next-gen Kaldi: https://github.com/k2-fsa
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _lilcom: https://github.com/danpovey/lilcom
+.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
+.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
 """

docs/source/docker/intro.rst (+2)

@@ -34,6 +34,8 @@ which will give you something like below:
 
 .. code-block:: bash
 
+  "torch2.3.1-cuda12.1"
+  "torch2.3.1-cuda11.8"
   "torch2.2.2-cuda12.1"
   "torch2.2.2-cuda11.8"
   "torch2.2.1-cuda12.1"
@@ -0,0 +1,41 @@
Two approaches
==============

This section describes two approaches to FST-based forced alignment:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all;
you don't need to install `Kaldi`_ to use it. Instead, it uses
`kaldi-decoder`_, which ports the C++ decoding code from `Kaldi`_ without
depending on `Kaldi`_ itself.
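
To make the decoding step more concrete, the following is a minimal sketch of CTC forced alignment as an exhaustive Viterbi search over the linear alignment graph of a transcript. It is only an illustration under stated assumptions: it is not the `kaldi-decoder`_ or `k2`_ API, it works on plain Python lists instead of tensors, ``ctc_forced_align`` is a hypothetical name, and ID ``0`` is assumed to be the blank. Unlike practical decoders, it does no pruning; it visits every (frame, position) pair, so its cost grows with the number of frames times the transcript length.

.. code-block:: python

  import math
  from typing import List, Sequence

  BLANK = 0  # assumption: ID 0 is the CTC blank


  def ctc_forced_align(log_probs: Sequence[Sequence[float]], tokens: List[int]) -> List[int]:
      """Return the most likely frame-level CTC labels that spell out `tokens`.

      log_probs[t][c] is the log-probability of unit c at frame t. This is an
      exhaustive Viterbi search (no pruning) over the extended sequence
      [blank, t1, blank, ..., tN, blank], i.e. over a linear alignment graph.
      """
      if not log_probs:
          raise ValueError("log_probs must contain at least one frame")
      ext = [BLANK]
      for tok in tokens:
          ext += [tok, BLANK]
      T, S = len(log_probs), len(ext)
      NEG = -math.inf
      score = [[NEG] * S for _ in range(T)]
      back = [[-1] * S for _ in range(T)]

      # A path may start with the leading blank or with the first token.
      score[0][0] = log_probs[0][ext[0]]
      if S > 1:
          score[0][1] = log_probs[0][ext[1]]

      for t in range(1, T):
          for s in range(S):
              # Predecessors: stay, advance by one, or skip the blank between
              # two different tokens.
              preds = [p for p in (s, s - 1) if p >= 0]
              if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                  preds.append(s - 2)
              best = max(preds, key=lambda p: score[t - 1][p])
              if score[t - 1][best] > NEG:
                  score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                  back[t][s] = best

      # A path must end on the last token or on the trailing blank.
      ends = [S - 1, S - 2] if S > 1 else [S - 1]
      s = max(ends, key=lambda e: score[T - 1][e])
      if score[T - 1][s] == NEG:
          raise ValueError("not enough frames to align this transcript")

      # Trace the best path back from the last frame.
      path = []
      for t in range(T - 1, -1, -1):
          path.append(ext[s])
          s = back[t][s]
      return path[::-1]

With four frames whose most likely units are ``1, 1, blank, 2`` and ``tokens=[1, 2]``, the sketch returns ``[1, 1, 0, 2]``. With ``tokens=[1, 1]`` and only two frames it raises ``ValueError``, because a blank frame is required between two repeated tokens.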

Differences between the two approaches
--------------------------------------

The following table compares the two approaches.

.. list-table::
  :header-rows: 1

  * - Feature
    - `Kaldi`_-based
    - `k2`_-based
  * - CUDA support
    - No
    - Yes
  * - CPU support
    - Yes
    - Yes
  * - Batch processing
    - No
    - Yes on CUDA; no on CPU
  * - Streaming models
    - Yes
    - No
  * - C++ API
    - Yes
    - Yes
  * - Python API
    - Yes
    - Yes
@@ -0,0 +1,18 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** forced alignment with
models trained with `CTC`_ loss.

We use the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section.

Unlike `torchaudio`_, we use an ``FST``-based approach.
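
As a rough illustration of what the FST looks like, the following sketch builds the alignment graph of a transcript as a plain list of arcs. Each arc consumes one CTC output unit per frame (its input label) and emits either a token or nothing (its output label). The names ``Arc`` and ``build_ctc_alignment_fst``, the use of ``0`` for both the blank and epsilon, and the state numbering are assumptions made for this sketch; it is not the graph-construction code of `k2`_ or `kaldi-decoder`_. It only shows the usual CTC topology: a self-loop on every state, forward arcs, and an arc that skips the blank between two different tokens.

.. code-block:: python

  from typing import List, NamedTuple, Set, Tuple

  BLANK = 0  # assumption: ID 0 is the CTC blank
  EPS = 0    # assumption: output label 0 means "emit nothing" (epsilon)


  class Arc(NamedTuple):
      src: int     # source state
      dst: int     # destination state
      ilabel: int  # CTC output unit consumed on this frame
      olabel: int  # token emitted, or EPS


  def build_ctc_alignment_fst(tokens: List[int]) -> Tuple[List[Arc], Set[int]]:
      """Build a linear CTC alignment graph for `tokens` (hypothetical helper).

      State 0 is the start state; state k + 1 stands for position k of the
      extended sequence [blank, t1, blank, t2, ..., tN, blank].
      Returns the arcs and the set of final states.
      """
      ext = [BLANK]
      for tok in tokens:
          ext += [tok, BLANK]

      arcs: List[Arc] = []

      def enter(src: int, k: int) -> None:
          # Entering position k consumes ext[k]; only a token emits output.
          label = ext[k]
          arcs.append(Arc(src, k + 1, label, EPS if label == BLANK else label))

      enter(0, 0)          # start with a blank ...
      if len(ext) > 1:
          enter(0, 1)      # ... or directly with the first token
      for k, label in enumerate(ext):
          # Self-loop: the same unit repeated on the next frame emits nothing.
          arcs.append(Arc(k + 1, k + 1, label, EPS))
          if k + 1 < len(ext):
              enter(k + 1, k + 1)  # advance to the next position
          if label != BLANK and k + 2 < len(ext) and ext[k + 2] != label:
              enter(k + 1, k + 2)  # skip the blank between two different tokens

      # An alignment may end on the last token or on the trailing blank.
      finals = {len(ext) - 1, len(ext)} if tokens else {len(ext)}
      return arcs, finals

For instance, ``build_ctc_alignment_fst([3, 4])`` returns 12 arcs, including the blank-skipping arc ``Arc(2, 4, 4, 4)``, and the final states ``{4, 5}``. For ``[3, 3]`` that arc is absent, since a blank is required between repeated tokens, which leaves 11 arcs.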

.. toctree::
  :maxdepth: 2
  :caption: Contents:

  diff
  kaldi-based
  k2-based

@@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)
