How to return word-level timestamp granularity from speech-to-text (Whisper)? #440

Hello,

I want to transcribe an audio file and get word-level timestamps. This is supported by models such as OpenAI `whisper-1`.
However, even though I pass the correct parameters to the provider, OpenRouter returns only the text.
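
For comparison, below is a minimal sketch of the equivalent request sent directly to OpenAI's `/v1/audio/transcriptions` endpoint, bypassing OpenRouter. In OpenAI's API, word timestamps require `response_format=verbose_json` together with `timestamp_granularities[]=word`. The sketch assumes Node 18+ (global `fetch`, `FormData`, `Blob`), and `buildTranscriptionForm` / `transcribeWithOpenAI` are hypothetical helper names, not part of any SDK.

```typescript
// Sketch only: calls OpenAI directly, not OpenRouter. Assumes Node 18+ globals.
const OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical helper: builds the multipart body for a word-timestamp request.
// Word timestamps need response_format "verbose_json" plus timestamp_granularities[] = "word".
function buildTranscriptionForm(audio: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-1");
  form.append("response_format", "verbose_json");
  form.append("timestamp_granularities[]", "word");
  return form;
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function transcribeWithOpenAI(audio: Blob, filename: string, apiKey: string): Promise<unknown> {
  const res = await fetch(OPENAI_TRANSCRIPTIONS_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    // fetch sets the multipart Content-Type (with boundary) automatically for FormData.
    body: buildTranscriptionForm(audio, filename),
  });
  if (!res.ok) {
    throw new Error(`OpenAI transcription failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Called this way, the JSON body is expected to contain a `words` array alongside `text`.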

```typescript
async function transcribeAudio(audioBase64: string) {
    const response = await sttCreateTranscription(openRouter, {
        sttRequest: {
            inputAudio: {
                data: audioBase64,
                format: 'mp3',
            },
            model: 'openai/whisper-1',
            provider: {
                options: {
                    openai: {
                        timestamp_granularities: ["word"],
                        response_format: "verbose_json"
                    }
                }
            }
        }
    })
    if (!response.ok) {
        throw new Error(`Failed to transcribe audio: ${response.error.message}`);
    }

    return response.value;
}
``` 
I expected the response to include an additional `words` key, as it does with the OpenAI SDK.
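
For reference, a sketch of the `words` shape that OpenAI's `verbose_json` response uses (each entry has `word`, `start` and `end`, with times in seconds), plus a runtime check for whether a returned value actually carries it. The type and function names (`TranscriptionWord`, `VerboseTranscription`, `hasWordTimestamps`, `formatWords`) are hypothetical, and whether OpenRouter's response would use the same field names is an assumption.

```typescript
// One word entry as described in OpenAI's verbose_json transcription response (seconds).
interface TranscriptionWord {
  word: string;
  start: number;
  end: number;
}

// Only the fields used here; the full OpenAI response has more (language, duration, segments...).
interface VerboseTranscription {
  text: string;
  words?: TranscriptionWord[];
}

// Runtime check: true only if `value` has a string `text` and a well-formed `words` array.
// Handy for detecting when a proxy or SDK silently drops the `words` key.
function hasWordTimestamps(
  value: unknown,
): value is VerboseTranscription & { words: TranscriptionWord[] } {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { text?: unknown; words?: unknown };
  if (typeof v.text !== "string" || !Array.isArray(v.words)) return false;
  return v.words.every(
    (w) =>
      typeof w === "object" && w !== null &&
      typeof (w as TranscriptionWord).word === "string" &&
      typeof (w as TranscriptionWord).start === "number" &&
      typeof (w as TranscriptionWord).end === "number",
  );
}

// Formats each word as "[start-end] word" with two decimal places.
function formatWords(words: TranscriptionWord[]): string[] {
  return words.map((w) => `[${w.start.toFixed(2)}-${w.end.toFixed(2)}] ${w.word}`);
}
```

In the situation described above, `hasWordTimestamps(await transcribeAudio(audioBase64))` would presumably return `false`, since only the text comes back.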

