layout | title |
---|---|
page | Publications |
You can also browse my Google Scholar profile.
2024
- Faster Speech-LLaMA inference with multi-token prediction
Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli
Submitted to IEEE ICASSP 2025
Paper{: .btn}
- M-BEST-RQ: A multi-channel speech foundation model for smart glasses
Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli
Submitted to IEEE ICASSP 2025
Paper{: .btn}
- ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition
Ruizhe Huang, Mahsa Yarmohammadi, Jan Trmal, Jing Liu, Desh Raj, Leibny Paola Garcia, Alexei V Ivanov, Patrick Ehlen, Mingzhi Yu, Dan Povey, Sanjeev Khudanpur
LREC 2024
Paper{: .btn}
- Listening to multi-talker conversations: Modular and end-to-end perspectives
Desh Raj
PhD Thesis, Johns Hopkins University
Thesis{: .btn} Slides{: .btn} Video{: .btn}
- On speaker attribution with SURT
Desh Raj, Matthew Wiesner, Matthew Maciejewski, Paola Garcia, Daniel Povey, Sanjeev Khudanpur
Speaker Odyssey 2024
Paper{: .btn} Slides{: .btn}
- Updated corpora and benchmarks for long-form speech recognition
Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté
IEEE ICASSP 2024
Paper{: .btn} Code{: .btn}
- Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch
George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Alessio Brutti
IEEE ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB)
Paper{: .btn}
2023
- Learning from flawed data: Weakly supervised automatic speech recognition
Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur
IEEE ASRU 2023
Paper{: .btn} Code{: .btn}
- SURT 2.0: Advances in transducer-based multi-talker speech recognition
Desh Raj, Daniel Povey, Sanjeev Khudanpur
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Paper{: .btn} ArXiv{: .btn} Poster{: .btn} Webpage{: .btn}
- The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur
CHiME Workshop at InterSpeech 2023
Paper{: .btn} Website{: .btn}
- GPU-accelerated guided source separation for meeting transcription
Desh Raj, Daniel Povey, Sanjeev Khudanpur
InterSpeech 2023
Paper{: .btn} ArXiv{: .btn} Poster{: .btn} Code{: .btn}
- Anchored speech recognition using neural transducers
Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli
IEEE ICASSP 2023
Paper{: .btn} Slides{: .btn} Video{: .btn}
- Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang, Desh Raj, Paola Garcia, Sanjeev Khudanpur
IEEE ICASSP 2023
Paper{: .btn} Code{: .btn}
2022
- Low-Latency speech separation guided diarization for telephone conversations
Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
IEEE Spoken Language Technology (SLT) Workshop 2022
Paper{: .btn}
- Continuous streaming multi-talker ASR with dual-path transducers
Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li
IEEE ICASSP 2022
Paper{: .btn} Slides{: .btn} Poster{: .btn} Video{: .btn}
- Injecting text and cross-lingual supervision in few-shot learning from self-supervised models
Matthew Wiesner, Desh Raj, Sanjeev Khudanpur
IEEE ICASSP 2022
Paper{: .btn} Code{: .btn} Poster{: .btn} Video (Matthew){: .btn}
2021
- Joint speaker diarization and speech recognition based on region proposal networks
Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur
Computer Speech & Language, Vol. 72
Paper{: .btn}
- Reformulating DOVER-Lap label mapping as a graph partitioning problem
Desh Raj, Sanjeev Khudanpur
INTERSPEECH 2021
Paper{: .btn} Code{: .btn} Report{: .btn} Slides{: .btn} Video{: .btn}
- Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, Jan Černocký
INTERSPEECH 2021
Paper{: .btn}
- Target-speaker voice activity detection with improved i-vector estimation for unknown number of speaker
Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe
INTERSPEECH 2021
Paper{: .btn}
- Training hybrid models on noisy transliterated transcripts for code-switched speech recognition
Matthew Wiesner, Mousmita Sarma, Ashish Arora, Desh Raj, Dongji Gao, Ruizhe Huang, Supreet Preet, Moris Johnson, Zikra Iqbal, Nagendra Goel, Jan Trmal, Leibny García-Perera, Sanjeev Khudanpur
INTERSPEECH 2021
Paper{: .btn} Code{: .btn}
- The Hitachi-JHU DIHARD III system: Competitive end-to-end neural diarization and x-vector clustering systems combined by DOVER-Lap
Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur
Third DIHARD Speech Diarization Challenge
Paper{: .btn}
- Multi-class spectral clustering with overlaps for speaker diarization
Desh Raj, Zili Huang, Sanjeev Khudanpur
IEEE Spoken Language Technology (SLT) Workshop 2021
Paper{: .btn} Code{: .btn} Slides{: .btn}
- DOVER-Lap: A method for combining overlap-aware diarization outputs
Desh Raj, Paola Garcia, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur
IEEE Spoken Language Technology (SLT) Workshop 2021
Paper{: .btn} Code{: .btn} Slides{: .btn}
- Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
IEEE Spoken Language Technology (SLT) Workshop 2021
Paper{: .btn} Code{: .btn} Slides{: .btn}
- Sequential multi-frame neural beamforming for speech separation and enhancement
Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
IEEE Spoken Language Technology (SLT) Workshop 2021
Paper{: .btn}
2020
- Frustratingly easy noise-aware training of acoustic models
Desh Raj, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur
ArXiv, 2020
Paper{: .btn} Code{: .btn}
- The JHU multi-microphone multi-speaker ASR system for the CHiME-6 challenge
Ashish Arora*, Desh Raj*, Aswin Shanmugam Subramanian*, Ke Li*, Bar Benyair, Matthew Maciejewski, Piotr Zelasko, Paola Garcia, Shinji Watanabe, Sanjeev Khudanpur.
The 6th CHiME Workshop (at ICASSP 2020).
Paper{: .btn} Video{: .btn} Slides{: .btn}
2019
- Probing the information encoded in x-vectors
Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur.
IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019.
Paper{: .btn} Code{: .btn} Poster{: .btn}
- Using ASR methods for OCR
Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, Shinji Watanabe, Vimal Manohar, Yiwen Shao, Sanjeev Khudanpur.
International Conference on Document Analysis and Recognition (ICDAR) 2019.
Preprint{: .btn} Paper{: .btn} Code{: .btn} [Blog]({% post_url 2018-11-22-subword-segmentation %}){: .btn}
2018
- Uncertain fuzzy self-organization based clustering: interval type-2 approach to adaptive resonance theory
Shakaiba Majheed, Aditya Gupta, Desh Raj, Frank Chung-hoon Rhee.
Information Sciences, 2018.
Paper{: .btn}
2017
- Learning local and global contexts using a convolutional recurrent neural network for relation classification in biomedical text
Desh Raj, Sunil Kumar Sahu, Ashish Anand.
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL) 2017.
Paper{: .btn} Poster{: .btn} Code{: .btn}
- Analysis of data generated from multidimensional type-1 and type-2 fuzzy membership functions
Desh Raj, Aditya Gupta, Bhuvnesh Garg, Kenil Tanna, Frank Chung-hoon Rhee.
IEEE Transactions on Fuzzy Systems, 2017.
Paper{: .btn}