Currently, mergeAudio and concatAudio take the maximum channel count among the input buffers as the number of channels in the output buffer. For each input buffer, however, they only read that buffer's own channels and apply them to the corresponding channels in the output buffer. It's somewhat like this:
in: [[M], [L, R]]
out: [M+L, R]
The first buffer (which was mono) isn't reproduced in the right channel in the output buffer.
Mono buffers should be reproduced in all output channels, so the expected output for the example above is [M+L, M+R].
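
A minimal sketch of the proposed behaviour for the merge case. This is not the library's actual code: the `Channels` type and the `mergeWithMonoUpmix` name are hypothetical, and buffers are modelled as plain arrays of `Float32Array` channels rather than real audio buffer objects.

```typescript
// Hypothetical representation: a buffer is an array of channels,
// each channel a Float32Array of samples.
type Channels = Float32Array[];

// Sketch only: sums all inputs into one output, copying mono inputs
// into every output channel instead of only the first one.
function mergeWithMonoUpmix(inputs: Channels[]): Channels {
  // Output channel count is the largest channel count among the inputs.
  const channelCount = Math.max(0, ...inputs.map((b) => b.length));
  // Output length is the longest input (measured on its first channel).
  const length = Math.max(
    0,
    ...inputs.map((b) => (b.length > 0 ? b[0].length : 0))
  );
  const out: Channels = Array.from(
    { length: channelCount },
    () => new Float32Array(length)
  );

  for (const input of inputs) {
    for (let ch = 0; ch < channelCount; ch++) {
      // A mono input feeds every output channel;
      // any other input maps channel to channel.
      const source = input.length === 1 ? input[0] : input[ch];
      if (!source) continue; // this input has no such channel
      const n = Math.min(source.length, length);
      for (let i = 0; i < n; i++) {
        out[ch][i] += source[i];
      }
    }
  }
  return out;
}
```

With the inputs from the example, `mergeWithMonoUpmix([[M], [L, R]])` gives `[M+L, M+R]`. The same "mono goes to every channel" rule could be applied in concatAudio when copying each input into its slot of the output.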