How to remove text formatting from the segments? #418

@zaidbren

Hello everyone, I am trying to use the package for transcription-based audio editing, and I need the segments shown with their timestamps. However, when I access the text of each segment in result.segments, it comes back with formatting tags such as <starttime/></0.33/>, etc.

private func transcribeAudio(
    at audioURL: URL,
    whisperKit: WhisperKit,
    label: String,
    fileIndex: Int,
    totalFiles: Int
) async throws {

    let options = DecodingOptions(
        task: .transcribe,
        language: selectedLanguage.code,
        wordTimestamps: true,
        supressTokens: nil
    )

    let transcriptionResults: [TranscriptionResult] = try await whisperKit.transcribe(
        audioPath: audioURL.path,
        decodeOptions: options,
        callback: progressCallback
    )

    // Process and structure the results
    for result in transcriptionResults {
        for segment in result.segments {
            let words = (segment.words ?? []).map { wordTiming in
                TranscriptionWord(
                    word: wordTiming.word,
                    start: Double(wordTiming.start),
                    end: Double(wordTiming.end)
                )
            }

            let transcriptionSegment = TranscriptionSegment(
                id: segment.id,
                start: Double(segment.start),
                end: Double(segment.end),
                text: segment.text,
                words: words
            )

            transcriptionSegments.append(transcriptionSegment)
        }
    }

    print("Processed \(transcriptionSegments.count) segments from \(label)")
}

I only want to show the plain text, without any tags or other formatting.
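
One possible workaround is to strip the markers from each segment's text after transcription. The sketch below assumes the markers follow Whisper's special-token pattern, delimited by `<|` and `|>` (for example `<|0.33|>`); the helper name `stripSpecialTokens(from:)` and the sample string are hypothetical and not part of WhisperKit. Some WhisperKit versions also appear to expose a `skipSpecialTokens` flag on `DecodingOptions`, which may avoid the problem at the source, so it is worth checking against the version in use.

```swift
import Foundation

/// Hypothetical helper: removes Whisper-style special tokens such as
/// "<|startoftranscript|>" or "<|0.33|>" from a piece of text.
/// Assumption: every marker starts with "<|" and ends with "|>".
func stripSpecialTokens(from text: String) -> String {
    // Remove every "<|...|>" marker.
    let withoutTokens = text.replacingOccurrences(
        of: "<\\|[^|]*\\|>",
        with: "",
        options: .regularExpression
    )
    // Collapse runs of whitespace left behind by the removed markers.
    let collapsed = withoutTokens.replacingOccurrences(
        of: "\\s+",
        with: " ",
        options: .regularExpression
    )
    return collapsed.trimmingCharacters(in: .whitespacesAndNewlines)
}

// Illustrative input only, not actual WhisperKit output.
let raw = "<|startoftranscript|><|en|><|transcribe|><|0.00|> Hello everyone<|0.33|><|endoftext|>"
print(stripSpecialTokens(from: raw)) // prints "Hello everyone"
```

In the loop above, this would be applied as `text: stripSpecialTokens(from: segment.text)` when building each `TranscriptionSegment`, leaving the timestamps in `segment.start`, `segment.end`, and the word timings untouched.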
