Streaming encoder #23

Open
onel opened this issue Jan 8, 2019 · 6 comments

Comments

@onel

onel commented Jan 8, 2019

First of all, thanks for this great library.

I have a question: is there a way to encode a specific audio buffer and get back only that encoded chunk, rather than the whole recording?
For example, I send a Float32Array, and vmsg encodes it and sends the result back.
Right now I think during a recording, everything is held in memory and returned when calling vmsg_flush().
This would be useful for longer recordings where you want to encode something and maybe upload it and not keep it in memory.

I've tried to do something similar, by calling vmsg_init, vmsg_encode and then vmsg_flush, inside the data event listener for the worker. I don't think this is the right way to do it.

  case "data":

    if (!vmsg_init(msg.rate)) return postMessage({type: "error", data: "vmsg_init"});

    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });
    
    break;

Is there a way to do that? A change would also need to be made inside vmsg.c, right?
Thanks

@Kagami
Owner

Kagami commented Jan 8, 2019

Yes, it's possible. You just need to make the vmsg_encode C function return the number of bytes written (n), so you can send the bytes from v->mp3+v->size-n to v->mp3+v->size to the main thread via postMessage. At the end you should also fix the LAME tag (lame_get_lametag_frame), which needs an additional message.

I'm not sure we want to use that method for normal recordings, because it would require sending every encoded chunk back to the main thread and copying it into a buffer, which might introduce additional delay. But it should be OK to make it optional.
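
The slicing described above can be sketched on the worker side as follows. This is only an illustration under stated assumptions: `sliceNewChunk` and its parameters are hypothetical names, not part of vmsg, and it assumes vmsg_encode has been changed to return the number of bytes written by that call.

```javascript
// Hypothetical helper, not part of vmsg: copy out only the bytes that the
// most recent encode call appended to the MP3 buffer in WASM memory.
//   memory  - Uint8Array view over the WASM heap
//   mp3Ptr  - byte offset of v->mp3 inside that heap
//   size    - v->size after the encode call (total bytes written so far)
//   written - bytes produced by the last call (the proposed return value)
function sliceNewChunk(memory, mp3Ptr, size, written) {
  if (written < 0 || written > size) {
    throw new RangeError("written must be between 0 and size");
  }
  const end = mp3Ptr + size;
  // slice() copies, so the chunk stays valid if the heap is later replaced.
  return memory.slice(end - written, end);
}

// Usage with a fake 16-byte heap whose MP3 buffer starts at offset 4.
const heap = new Uint8Array(16);
heap.set([1, 2, 3, 4, 5], 4);               // five encoded bytes so far
const chunk = sliceNewChunk(heap, 4, 5, 2); // the last call wrote two bytes
console.log(Array.from(chunk));             // [ 4, 5 ]
```

The copy made by `slice()` is what would be handed to postMessage; its underlying `chunk.buffer` could be passed as a transferable to avoid a second copy.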

@Kagami changed the title from "Encode a specific audio buffer" to "Streaming encoder" on Jan 8, 2019
@onel
Author

onel commented Jan 9, 2019

Ok, I understand.
I don't have experience with C, but maybe I'll try that in a fork.
Thank you so much for the details.

@onel
Author

onel commented Mar 15, 2019

Hi there, I took a stab at making this work and I wanted to check with you if this is the right way to do it.
I haven't created a PR for this because I don't know whether you would want to integrate it, but let me know if you do.
The idea is that on each buffer we would do vmsg_encode, vmsg_flush and then a new method vmsg_reset.
Inside the worker this would look like this:

  case "data":

    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });

    FFI.vmsg_reset()
    
    break;

This will return the blob for that specific buffer each time.

The changes that I've made are:
For vmsg_encode the size is returned each time:

WASM_EXPORT
int vmsg_encode(vmsg *v, int nsamples) {
  if (nsamples > MAX_SAMPLES)
    return -1;

  if (fix_mp3_size(v) < 0)
    return -1;

  uint8_t *buf = v->mp3 + v->size;
  int n = lame_encode_buffer_ieee_float(v->gfp, v->pcm_l, NULL, nsamples, buf, BUF_SIZE);

  if (n < 0)
    return n;

  v->size += n;
  return v->size;
}

And the new method:

WASM_EXPORT
int vmsg_reset(vmsg *v, int rate) {
  if (v) {
    lame_close(v->gfp);
    v->size = 0;

    v->gfp = lame_init();
    if (!v->gfp) {
      vmsg_free(v);
      return -1;
    }
    
    lame_set_mode(v->gfp, MONO);
    lame_set_num_channels(v->gfp, 1);
    lame_set_in_samplerate(v->gfp, rate);
    lame_set_VBR(v->gfp, vbr_default);
    lame_set_VBR_quality(v->gfp, 5);

    if (lame_init_params(v->gfp) < 0) {
      vmsg_free(v);
      return -1;
    }
    
  }

  return 0;
}

This basically looks like init but without the memory allocation.
The problem I'm having is that the resulting mp3 blob is not actually usable. I think in vmsg_reset the encoder is not set up correctly.
My questions are:
Do you think this is a good way to do buffer encoding?
And what would you recommend we do in vmsg_reset?
Thanks

@flieks

flieks commented Jan 16, 2020

@onel did you get it working? I am also interested in this for live speech-to-text (on the server).

@stefan-reich

stefan-reich commented Jul 30, 2021

Damn. I want this too. What if we fake it and just swap the encoder with a new one every few seconds? I'm fine with lots of relatively short mp3s.

@stefan-reich

stefan-reich commented Jul 30, 2021

Ah I think I'll simply use MediaRecorder. It should record as .webm, right?
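
For anyone taking the MediaRecorder route, a minimal sketch of chunked recording might look like the following. `startChunkedRecording`, `onChunk`, and the `RecorderCtor` parameter are hypothetical names introduced for illustration. Whether `audio/webm` is available depends on the browser, so the sketch checks `MediaRecorder.isTypeSupported` and otherwise falls back to the browser default.

```javascript
// Sketch: deliver recorded audio in timed chunks instead of one final blob.
// RecorderCtor defaults to the browser's MediaRecorder; it is a parameter
// only so the function can be exercised outside a browser.
function startChunkedRecording(stream, onChunk, timesliceMs = 3000,
                               RecorderCtor = globalThis.MediaRecorder) {
  const mimeType = "audio/webm";
  // Request WebM only when the browser reports support for it.
  const options = RecorderCtor.isTypeSupported(mimeType) ? { mimeType } : {};
  const recorder = new RecorderCtor(stream, options);
  recorder.ondataavailable = (event) => {
    // Skip empty blobs, which can occur when no data was captured.
    if (event.data && event.data.size > 0) onChunk(event.data);
  };
  recorder.start(timesliceMs); // fire dataavailable roughly every timesliceMs
  return recorder;
}
```

In a browser this could be called with a stream from `navigator.mediaDevices.getUserMedia({ audio: true })` and an upload callback. Note that chunks produced this way are generally pieces of one continuous recording rather than independently playable files, so a receiver would normally concatenate them in order.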
