extract the 3D pseudo-spectrum data from a DOA object #270

Open
fakufaku opened this issue Jul 5, 2022 · 13 comments

Comments

@fakufaku
Collaborator

fakufaku commented Jul 5, 2022

Hi FakuFaku,

Is there a way to extract the 3D pseudo-spectrum data from a DOA object in the case of a 3D DOA simulation, as mentioned above? Or, stated another way, is there a way to find the pseudo-spectrum along two axes (in order to find the peaks in both azimuth and colatitude), similar to what was done in your NormMUSIC example below?

https://github.com/LCAV/pyroomacoustics/blob/master/notebooks/norm_music_demo.ipynb

Thanks.

Originally posted by @johntcuellar in #166 (comment)

@fakufaku
Collaborator Author

fakufaku commented Jul 5, 2022

@johntcuellar yes, the DOA objects can be used for 3D localization. Internally, a 3D DOA object holds a GridSphere object, which is a grid of Cartesian unit vectors distributed pseudo-uniformly on the sphere. Compared to the two-axis spherical-coordinate grid that you describe, it has the advantage of not favoring some directions: a grid that is uniform in azimuth and colatitude has many more points per unit area around the north and south poles than around the equator.
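
To illustrate the point about pole crowding, here is a minimal sketch in plain Python. It uses a Fibonacci lattice as a stand-in for a pseudo-uniform spherical grid; this is an assumption made for illustration, and GridSphere may construct its points differently. All function names are hypothetical.

```python
import math


def fibonacci_sphere(n_points):
    """Return (azimuth, colatitude) pairs spread roughly evenly over the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n_points):
        # z is uniformly spaced in (-1, 1), which gives equal-area bands.
        z = 1.0 - (2.0 * i + 1.0) / n_points
        points.append(((i * golden_angle) % (2.0 * math.pi), math.acos(z)))
    return points


def regular_grid(n_az, n_col):
    """Return (azimuth, colatitude) pairs on a grid uniform in both angles."""
    return [
        (2.0 * math.pi * j / n_az, math.pi * (i + 0.5) / n_col)
        for i in range(n_col)
        for j in range(n_az)
    ]


def polar_fraction(points, cap_deg=30.0):
    """Fraction of points lying within cap_deg of either pole."""
    cap = math.radians(cap_deg)
    n_polar = sum(1 for _, col in points if col < cap or col > math.pi - cap)
    return n_polar / len(points)


# Fraction of the sphere's area covered by the two 30-degree polar caps.
area_fraction = 1.0 - math.cos(math.radians(30.0))

sphere_fraction = polar_fraction(fibonacci_sphere(1000))
regular_fraction = polar_fraction(regular_grid(36, 18))
print(area_fraction, sphere_fraction, regular_fraction)
```

With these settings the pseudo-uniform grid puts about as many points in the polar caps as their share of the area, while the regular grid puts a third of its points there.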

DOA objects will be automatically 3D if you give the microphone coordinates as an array of shape (3, n_mics). The number of grid points can then be specified with the n_grid argument. See the DOA doc for details.
After localization, the DOA object stores the estimated source directions in its azimuth_recon and colatitude_recon variables.
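
If a Cartesian direction is more convenient than the two angles, the conversion can be sketched as follows. This assumes the angles are in radians and that colatitude is measured from the +z axis; the function name and the example values are hypothetical and not part of pyroomacoustics.

```python
import math


def direction_vector(azimuth, colatitude):
    """Unit vector for a direction in radians (colatitude measured from the +z axis)."""
    return (
        math.sin(colatitude) * math.cos(azimuth),
        math.sin(colatitude) * math.sin(azimuth),
        math.cos(colatitude),
    )


# Hypothetical estimate: azimuth 90 degrees, colatitude 90 degrees,
# i.e. a source on the horizontal plane along the +y axis.
x, y, z = direction_vector(math.radians(90.0), math.radians(90.0))
print(x, y, z)
```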

@johntcuellar

johntcuellar commented Jul 6, 2022 via email

@fakufaku
Collaborator Author

fakufaku commented Jul 7, 2022

The Grid objects actually have some methods to visualize the spatial spectrum.
For example, https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333

If you want to have a plot in azimuth/colatitude, you usually need to resample the spherical grid on a regular grid in the azimuth/colatitude domain.

@johntcuellar

johntcuellar commented Jul 7, 2022 via email

@johntcuellar

johntcuellar commented Oct 11, 2022 via email

@fakufaku
Collaborator Author

  1. When you say “resample”, what do you mean by that?

Many visualization methods require the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is roughly uniform on a sphere in 3D (grid2). To resample means to interpolate the values of the spatial spectrum at the points of grid1 from its values at the points of grid2.
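
As a rough sketch of what such a resampling can look like, the plain-Python example below does nearest-neighbor interpolation from scattered directions (grid2) onto a regular azimuth/colatitude grid (grid1). The function names and the six-point demo grid are made up for illustration; this is not the pyroomacoustics implementation, and a smoother interpolation may be preferable in practice.

```python
import math


def unit_vector(azimuth, colatitude):
    """Cartesian unit vector for a direction given in radians."""
    s = math.sin(colatitude)
    return (s * math.cos(azimuth), s * math.sin(azimuth), math.cos(colatitude))


def resample_nearest(src_points, src_values, n_az, n_col):
    """Interpolate scattered sphere samples onto a regular (colatitude, azimuth) grid."""
    src_vecs = [unit_vector(az, col) for az, col in src_points]
    az_axis = [2.0 * math.pi * j / n_az for j in range(n_az)]
    col_axis = [math.pi * (i + 0.5) / n_col for i in range(n_col)]
    image = []
    for col in col_axis:
        row = []
        for az in az_axis:
            t = unit_vector(az, col)
            # The largest dot product corresponds to the smallest angular distance.
            best = max(
                range(len(src_vecs)),
                key=lambda k: sum(a * b for a, b in zip(t, src_vecs[k])),
            )
            row.append(src_values[best])
        image.append(row)
    return az_axis, col_axis, image


# Demo source grid: the six axis directions (+x, +y, -x, -y, +z, -z).
half_pi = math.pi / 2
src_points = [
    (0.0, half_pi), (half_pi, half_pi), (math.pi, half_pi),
    (3 * half_pi, half_pi), (0.0, 0.0), (0.0, math.pi),
]
src_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

az_axis, col_axis, image = resample_nearest(src_points, src_values, n_az=4, n_col=3)
for row in image:
    print(row)
```

Each row of `image` corresponds to one colatitude and each column to one azimuth, which is the layout most 2D plotting routines expect.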

  1. How do I sample on a regular grid and by “regular grid”, do you mean a
    rectangular one?

I think there are some scipy functions that do this, e.g. interp2d.

By regular I mean uniformly spaced, etc. Usually this is rectangular indeed.

  1. I see that the spatial spectrum data is stored in different frequency
    bins (in Pssl, if I recall correctly). Why is this necessary? I assume it
    has to do with the subspace DOA methods, but no papers I have found go into
    detail about why they split the data into frequency bins.

Many DOA estimation methods work in the STFT domain because, for a narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers. You might find them in textbooks, such as Sound Capture and Processing by Ivan Tashev.
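
The delay-as-phase idea can be checked numerically with a small sketch. It assumes a far-field source, a speed of sound of 343 m/s, and one particular sign convention for the phase (conventions differ between texts); the function names and the two-microphone array are hypothetical.

```python
import cmath
import math


def delay_phase(freq_hz, delay_s):
    """Complex factor that delays a narrow-band component at freq_hz by delay_s seconds."""
    return cmath.exp(-2j * math.pi * freq_hz * delay_s)


def steering_vector(mic_positions, azimuth, colatitude, freq_hz, c=343.0):
    """Far-field phase factors, one per microphone, relative to the origin."""
    s = math.sin(colatitude)
    u = (s * math.cos(azimuth), s * math.sin(azimuth), math.cos(colatitude))
    factors = []
    for p in mic_positions:
        # A microphone displaced toward the source receives the wavefront earlier,
        # so its delay relative to the origin is negative.
        delay_s = -(p[0] * u[0] + p[1] * u[1] + p[2] * u[2]) / c
        factors.append(delay_phase(freq_hz, delay_s))
    return factors


# Delaying a complex sinusoid in time equals multiplying it by a phase factor.
freq_hz, tau, t = 1000.0, 0.25e-3, 1.3e-3
delayed_directly = cmath.exp(2j * math.pi * freq_hz * (t - tau))
delayed_by_phase = cmath.exp(2j * math.pi * freq_hz * t) * delay_phase(freq_hz, tau)

# Two microphones 10 cm apart on the x axis, source along +x.
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
sv = steering_vector(mics, azimuth=0.0, colatitude=math.pi / 2, freq_hz=freq_hz)
print(abs(delayed_directly - delayed_by_phase), sv)
```

Because the phase factor depends on the frequency, this model only holds one frequency bin at a time, which is why the spatial spectrum is computed per bin.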

  1. Since Pssl stores the spatial spectrum data in different frequency bins,
    how would I conglomerate it so that I can look for peaks across the entire
    range of frequencies that I define? I would assume it is not as simple as
    summing the columns across all frequency bins (or is it)?

There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, for example NormMUSIC, apply a different weighting before averaging.
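
The two options can be compared on toy numbers. In the sketch below, the per-bin spectra are stored as a list of rows (one row per frequency bin, one column per grid point). The second function normalizes each bin by its maximum before averaging, which is one possible weighting chosen here as an assumption; it is not copied from the NormMUSIC code. Function names and data are hypothetical.

```python
def average_over_frequency(spectrum_per_bin):
    """Plain mean over frequency bins, for each grid point."""
    n_bins = len(spectrum_per_bin)
    n_grid = len(spectrum_per_bin[0])
    return [sum(row[g] for row in spectrum_per_bin) / n_bins for g in range(n_grid)]


def normalized_average(spectrum_per_bin):
    """Scale each bin by its own maximum, then average over bins."""
    scaled = [[v / max(row) for v in row] for row in spectrum_per_bin]
    return average_over_frequency(scaled)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Two frequency bins, three grid points. The second bin has much more energy.
toy_spectrum = [
    [1.0, 4.0, 1.0],
    [10.0, 40.0, 80.0],
]

plain = average_over_frequency(toy_spectrum)
normalized = normalized_average(toy_spectrum)
print(plain, argmax(plain))
print(normalized, argmax(normalized))
```

In this toy case the plain average is dominated by the high-energy bin, while the normalized average gives both bins an equal say, so the two pick different peaks.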

@johntcuellar

johntcuellar commented Oct 13, 2022 via email

@johntcuellar

johntcuellar commented Nov 17, 2022 via email

@fakufaku
Collaborator Author

Hi John,

The implementation of the regrid method is here. The output is three linear arrays holding the azimuth, colatitude, and Pssl values. The arrays are flattened versions of 2N x N arrays, where N = int(np.sqrt(n_points / 2)) and n_points is the number of points in the original grid (N is chosen to keep approximately the same number of points).
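
For working with flat arrays of this kind, a small sketch of the bookkeeping may help. It assumes row-major flattening of a 2N x N array, which should be checked against the actual regrid output; the helper names and the toy values are hypothetical.

```python
import math


def regular_grid_side(n_points):
    """N such that a 2N x N grid has roughly n_points entries."""
    return int(math.sqrt(n_points / 2))


def unflatten(flat, n_rows, n_cols):
    """Rebuild a row-major n_rows x n_cols nested list from a flat sequence."""
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def peak_index(flat_values, n_cols):
    """(row, column) of the largest value, assuming row-major flattening."""
    k = max(range(len(flat_values)), key=lambda i: flat_values[i])
    return divmod(k, n_cols)


# Toy example: an original grid of 8 points gives N = 2, so a 4 x 2 regular grid.
n_points = 8
N = regular_grid_side(n_points)
flat_values = [0.0, 1.0, 2.0, 3.0, 9.0, 5.0, 6.0, 7.0]

image = unflatten(flat_values, 2 * N, N)
peak_row, peak_col = peak_index(flat_values, N)
print(N, image, peak_row, peak_col)
```

The same row and column indices can be applied to the unflattened azimuth and colatitude arrays to read off the direction of the peak.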

If the function does not do exactly what you want, I would suggest copying the code from the original method into a new function and tweaking it to suit your needs.

@johntcuellar

johntcuellar commented Nov 18, 2022 via email

@fakufaku
Collaborator Author

The resample method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do?

@johntcuellar

johntcuellar commented Nov 18, 2022 via email

@johntcuellar

johntcuellar commented Nov 18, 2022 via email
