extract the 3D pseudo-spectrum data from a DOA object #270
@johntcuellar yes, the DOA objects can be used for 3D localization. Internally, a 3D DOA object will have a GridSphere object (https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.doa.grid.html?highlight=gridsphere#pyroomacoustics.doa.grid.GridSphere), which is a grid of cartesian unit vectors pseudo-uniformly distributed on the sphere. Compared to the two-axis spherical coordinate grid that you describe, it has the advantage of not giving preference to some directions: a spherical-coordinates grid has many more points around the south and north poles than around the equator.
DOA objects will automatically be 3D if you give the microphone coordinates as a (3, n_mics)-shaped array. The number of grid points can then be specified by the n_grid argument; see the DOA doc (https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.doa.doa.html) for details.
After localization, the DOA object will have the estimated directions in the azimuth_recon and colatitude_recon variables.
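For concreteness, a small numpy-only sketch of building such a (3, n_mics) coordinate array (a 12-mic circle at mid-room height, matching the simulation that appears later in this thread); the commented lines show the hypothetical pyroomacoustics calls it would feed into:

```python
import numpy as np

# Build the (3, n_mics) coordinate array that makes a DOA object 3D.
# Here: 12 mics on a 0.1 m radius circle centered at (5, 5, 5).
center = np.array([5.0, 5.0, 5.0])
radius = 0.1
angles = np.arange(12) * np.pi / 6  # 12 equally spaced angles

R = np.vstack(
    [
        center[0] + radius * np.cos(angles),
        center[1] + radius * np.sin(angles),
        np.full(12, center[2]),  # flat circle: constant height
    ]
)
# R.shape is (3, 12), so a DOA object built from it will be 3D.

# Hypothetical usage (X being the STFT frames, not built here):
# doa = pra.doa.algorithms["MUSIC"](R, fs=16000, nfft=256, dim=3, n_grid=4000)
# doa.locate_sources(X)
# print(doa.azimuth_recon, doa.colatitude_recon)
```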
Hi FakuFaku,
Thanks for clarifying. I've come across GridSphere before in the pyroomacoustics documentation.
So if I understand you correctly, I can use GridSphere to extract the spatial spectrum data and produce a plot similar to the one below?
[image: spatial spectrum plot, see ref below]
ref: https://www.mathworks.com/help/phased/ug/direction-of-arrival-estimation-with-beamscan-mvdr-and-music.html
Also, under the DOA object, what is Pssl?
Thank you.
Best,
John
The Grid objects actually have a method to visualize the spatial spectrum. For example:
https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333
If you want a plot in azimuth/colatitude, you usually need to resample the spherical grid onto a regular grid in the azimuth/colatitude domain.
I see
Thank you very much!
John
Hi Robin,
A few questions:
1. When you say “resample”, what do you mean by that?
2. How do I sample on a regular grid, and by “regular grid” do you mean a rectangular one?
3. I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found explain why the data is split into frequency bins.
4. Since Pssl stores the spatial spectrum data in different frequency bins, how would I combine them so that I can look for peaks across the entire frequency range that I define? I assume it is not as simple as summing the columns across all frequency bins (or is it)?
Thank you.
Best,
John Cuellar
1. Many visualization methods require the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is (kind of) uniform on a sphere in 3D (grid2). Resampling means interpolating the values of the spatial spectrum on the points of grid1 from their values on grid2.
2. There are some scipy functions to do that, e.g. interp2d (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html). By regular I mean uniformly spaced; usually this is indeed rectangular.
3. Many DOA estimation methods work in the STFT domain because for a narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers; you might find them in textbooks such as Sound Capture and Processing by Ivan Tashev.
4. There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, for example NormMUSIC, apply a different weighting before averaging.
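To illustrate the resampling step described above, here is a minimal sketch using scipy.interpolate.griddata (which, unlike the since-deprecated interp2d, accepts scattered input points). The random unit vectors and the Pssl matrix are stand-ins for the real GridSphere and pseudo-spectrum; the frequency bins are averaged first, as described above:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
n_points = 500
# Stand-in for GridSphere: random unit vectors on the sphere
xyz = rng.normal(size=(3, n_points))
xyz /= np.linalg.norm(xyz, axis=0)
# Stand-in for Pssl: one row per frequency bin, one column per grid point
Pssl = rng.random((55, n_points))
spectrum = Pssl.mean(axis=0)  # average the frequency bins together

# Spherical coordinates of the irregular (sphere-uniform) grid
azimuth = np.arctan2(xyz[1], xyz[0])             # in [-pi, pi)
colatitude = np.arccos(np.clip(xyz[2], -1, 1))   # in [0, pi]

# Regular (rectangular) grid in the azimuth/colatitude domain
az_reg, col_reg = np.meshgrid(
    np.linspace(-np.pi, np.pi, 180),
    np.linspace(0, np.pi, 90),
)

# Interpolate the spectrum from the scattered points onto the regular grid
resampled = griddata(
    (azimuth, colatitude), spectrum, (az_reg, col_reg), method="nearest"
)
```

The result `resampled` is a (90, 180) image that can be displayed directly, e.g. with `plt.imshow`. `method="nearest"` avoids NaN holes at the domain edges; `"linear"` gives a smoother surface inside the convex hull of the sample points.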
Hi Robin,
Thanks for the clarifications. That definitely helps! If I have any other
questions, I will be sure to ask.
Best,
John Cuellar
Hi Robin,
I am having trouble re-gridding the spherical spatial spectrum into the azimuth/colatitude domain. Can you please assist? I was able to call regrid, but not with the result I had hoped for: the variable Pssl_Grid holds the re-gridded Pssl, but it is not a tuple of three 180x90 arrays. The regrid call is towards the bottom, after the DOA simulation.
Thank you.
Best,
John
```python
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra

######
# Location of the original source
azimuth_1 = 19.5 / 180.0 * np.pi    # with respect to the x axis
colatitude_1 = 38.0 / 180.0 * np.pi
distance = 3.0  # meters

#######################
# algorithm parameters
SNR = 0.0                      # signal-to-noise ratio
c = 343.0                      # speed of sound
fs = 16000                     # sampling frequency
nfft = 256                     # FFT size
fmin = 500                     # min frequency [Hz]
fmax = 3500                    # max frequency [Hz]
freq_bins = np.arange(5, 60)   # FFT bins to use for estimation
num_src = 3                    # number of sources
max_order = 3                  # number of reflections (0 = anechoic room)
dim = 3                        # room dimension: 2 for 2D, 3 for 3D
n_grid = 4000                  # nearly equidistant points on the spherical grid

# noise variance
sigma2 = 10 ** (-SNR / 10) / (4.0 * np.pi * distance) ** 2

mic_to_mic_dist = 0.1    # distance from the array center to each mic
z_offset_factor = 0.9    # mic z offset as a factor of room_dim[2]

# room materials
m = pra.make_materials(
    ceiling="hard_surface",
    floor="linoleum_on_concrete",
    east="brickwork",
    west="brickwork",
    north="brickwork",
    south="brickwork",
)

# To use a .wav file as the source signal instead (the file must sit in the
# same folder as this script, and fs is then taken from the file):
# from scipy.io import wavfile
# fs, source_signal = wavfile.read("guitar_44k.wav")

# create the room
room_dim = np.r_[10.0, 10.0, 10.0]
aroom = pra.ShoeBox(
    room_dim,
    fs=fs,
    materials=m,
    max_order=max_order,
    sigma2_awgn=sigma2,
    air_absorption=True,
    ray_tracing=False,
)

# add the source
source_location = room_dim / 2 + distance * np.r_[
    np.cos(azimuth_1) * np.sin(colatitude_1),
    np.sin(azimuth_1) * np.sin(colatitude_1),
    np.cos(colatitude_1),
]
source_signal = np.random.randn((nfft // 2 + 1) * nfft)
aroom.add_source(source_location, signal=source_signal)

# circular array of 12 mics at mid-room height
angles = np.arange(12) * np.pi / 6
R = np.vstack(
    [
        room_dim[0] / 2 + mic_to_mic_dist * np.cos(angles),
        room_dim[1] / 2 + mic_to_mic_dist * np.sin(angles),
        np.full(12, room_dim[2] / 2),
    ]
)
aroom.add_microphone_array(pra.MicrophoneArray(R, fs=aroom.fs))

# compute the RIRs and run the simulation
aroom.compute_rir()
aroom.simulate()

################################
# Compute the STFT frames needed
X = np.array(
    [
        pra.transform.stft.analysis(signal, nfft, nfft // 2).T
        for signal in aroom.mic_array.signals
    ]
)

##############################################
# NormMUSIC algorithm
doa = pra.doa.algorithms["NormMUSIC"](
    R, fs, nfft, num_src=num_src, c=c, max_four=4, n_grid=n_grid, dim=dim
)
doa.locate_sources(X, num_src=num_src, freq_range=[fmin, fmax], freq_bins=freq_bins)

# regrid the spherical grid to a rectangular one and plot it
Pssl_Grid = doa.grid.regrid()
fig = plt.figure(figsize=(14, 10))
plt.plot()  # eventually plot the spatial spectrum here
plt.xlabel("angle [°]")
plt.title("NormMUSIC spatial pseudo spectra in one plot", fontsize=15)
```
Hi John,
The implementation of the regrid method is here:
https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L306
The output is three linear arrays corresponding to the azimuth values, colatitude values, and Pssl values. The arrays are linearized versions of 2N x N arrays, where N = int(np.sqrt(n_points / 2)) and n_points is the number of points in the original grid (so that approximately the same number of points is kept).
If the function does not do exactly what you want, I would suggest just lifting the code from the original method and tweaking it to your convenience in a new function.
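Given that output format (three linear arrays of length 2N * N, with N = int(np.sqrt(n_points / 2))), unpacking and reshaping for plotting could look like the sketch below. The zero-filled arrays are stand-ins for the real regrid() output (the commented line marks the hypothetical call site), and the row-major linearization is an assumption, not something confirmed in this thread:

```python
import numpy as np

n_points = 4000                 # points in the original spherical grid
N = int(np.sqrt(n_points / 2))  # grid width as described in the comment above

# az, col, val = doa.grid.regrid()   # hypothetical: the three linear arrays
# Stand-in linear arrays of the documented size 2N * N:
az = np.zeros(2 * N * N)
col = np.zeros(2 * N * N)
val = np.zeros(2 * N * N)

# Reshape each into a 2N x N image, assuming row-major linearization
az2d, col2d, val2d = (a.reshape(2 * N, N) for a in (az, col, val))
```

With the real data, `val2d` could then be shown with `plt.pcolormesh(az2d, col2d, val2d)` or similar.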
Hi Robin,
Thank you. Based on previous emails, I was under the impression that there was a way to resample or regrid in the colatitude/azimuth domain directly using regrid().
I've spent a few hours trying to do it on my own, to no avail. Since I was having trouble, I was hoping you might be able to help. (Admittedly, I have limited experience in Python, but I find throwing myself into this project to be a good learning experience; as a result, I find myself asking a lot of questions.)
Best,
John Cuellar
The resample method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do?
Hi Robin,
Absolutely. I want to create a plot of the spatial spectrum data like the
one below.
[image: image.png]
The plot is really just a validation that I have extracted the data I want
to extract. I then want to feed this data into a neural network for sound
source localization as part of a project I am working on.
Going back to your previous email, you say that the regrid() output arrays
are linear arrays corresponding to the colatitude, azimuth, and Pssl value.
Does each column correspond to each sound source? I was curious as to why
there are specifically three columns in each array.
Thank you.
Best,
John
Hi Robin,
Disregard that last paragraph, please. I was able to get the plot I wanted, but I still have some questions:
[image: image.png]
1. When I specify three sources, two of the recovered angles almost overlap (see below). Why would that happen?
[image: image.png]
2. I understand that the three output arrays from regrid() correspond to azimuth, colatitude, and the Pssl values, but why would the first two arrays not just be linear arrays over the allowable ranges of azimuth (360 deg) and colatitude (90 deg)?
3. Since the azimuth and colatitude arrays are 180 x 90, what azimuth angle would, for example, the value in the 100th row and 10th column correspond to?
4. If the original spherical plot had an azimuth range of 360 degrees and a colatitude range of 90 degrees (a hemisphere), why does the regrid effectively halve that to a quarter of a sphere (180 x 90)?
5. Is there a way to correlate a point on the original sphere to a particular azimuth/colatitude value? What I mean is: the original Pssl was a linear array. How would I correlate any value in that array to an azimuth and colatitude angle without regridding to a rectangular mesh?
Thank you.
Best,
John
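On question 5, one hedged sketch: each entry of the linear pseudo-spectrum corresponds to one cartesian unit vector of the spherical grid, so a peak index can be mapped back to angles with arctan2/arccos without any regridding. The random unit vectors and spectrum below are stand-ins for the real grid coordinates and averaged Pssl:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for the grid's (3, n_points) cartesian vectors and the
# frequency-averaged pseudo-spectrum (one value per grid point)
xyz = rng.normal(size=(3, 1000))
xyz /= np.linalg.norm(xyz, axis=0)
spectrum = rng.random(1000)

k = int(np.argmax(spectrum))  # index of the strongest grid point
x, y, z = xyz[:, k]

# Convert that point's unit vector to spherical angles in degrees
azimuth = np.degrees(np.arctan2(y, x)) % 360.0           # in [0, 360)
colatitude = np.degrees(np.arccos(np.clip(z, -1.0, 1.0)))  # in [0, 180]
```

The same conversion applied to every column of the cartesian array yields the azimuth/colatitude pair for every entry of the linear array at once.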
Hi FakuFaku,
Is there a way to extract the 3D pseudo-spectrum data from a DOA object in the case of a 3D DOA simulation, as mentioned above? Or, stated another way, is there a way to find the pseudo-spectrum along two axes (in order to find the peaks for both azimuth and colatitude), similar to what was done in your NormMUSIC example below?
https://github.com/LCAV/pyroomacoustics/blob/master/notebooks/norm_music_demo.ipynb
Thanks.
Originally posted by @johntcuellar in #166 (comment)