I think librosa.effects.split has some problem?

2024/10/10 22:19:08

First, this function is meant to remove silence from an audio signal. Here is the official description:

https://librosa.github.io/librosa/generated/librosa.effects.split.html

librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512)

Split an audio signal into non-silent intervals.

top_db : number > 0. The threshold (in decibels) below reference to consider as silence.

Returns: intervals : np.ndarray, shape=(m, 2). intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.

So this seems quite straightforward: any sound lower than 10 dB is treated as silence and removed from the audio. The function returns a list of intervals, which are the non-silent segments of the audio.

So I tried a very simple example and the result confuses me. The audio I load here is 3 seconds of a human talking, very normal talking.

y, sr = librosa.load(file_list[0])  # load the data
print(y.shape)  # -> (87495,)

intervals = librosa.effects.split(y, top_db=100)
intervals  # -> array([[0, 87495]])

# if I change 100 to 10
intervals = librosa.effects.split(y, top_db=10)
intervals  # -> array([[19456, 23040], [27136, 31232], [55296, 58880], [64512, 67072]])

How is this possible?

I tell librosa: OK, treat any sound below 100 dB as silence. Under this setting, the whole audio should be treated as silence, and based on the documentation it should give me something like array([[0, 0]]), because after removing the silence there is nothing left.

But it seems librosa returns the silent part instead of the non-silent part.

Answer

As the documentation says, librosa.effects.split() returns a NumPy array of the intervals that contain non-silent audio. These intervals depend on the value you assign to the top_db parameter. The function does not return any audio, just the start and end points (in samples) of the non-silent slices of your waveform.
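
Since only start and end points come back, the audio itself has to be rebuilt by slicing. A minimal sketch in plain Python, assuming a hypothetical helper name `keep_non_silent` and toy data rather than a real recording (with NumPy arrays, `np.concatenate` over the same slices would do the same job):

```python
def keep_non_silent(y, intervals):
    """Concatenate the samples inside each (start, end) interval."""
    kept = []
    for start, end in intervals:
        # end is treated as exclusive, as in ordinary Python slicing
        kept.extend(y[start:end])
    return kept


# Toy signal: the index values stand in for audio samples (hypothetical data).
y = list(range(10))
# Same layout as the array returned by split(): one (start, end) pair per row.
intervals = [(2, 4), (7, 9)]

trimmed = keep_non_silent(y, intervals)
print(trimmed)  # [2, 3, 7, 8]
```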

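In your case, even with top_db=100, the entire audio is not treated as silence, because the threshold is not an absolute level: the documentation states that it is measured relative to a reference power (the ref parameter). By default the reference is **np.max**, so each frame is compared to the peak power in the signal, and a frame counts as silent only if it lies more than top_db decibels below that peak. Setting top_db larger than the range of levels that actually exists in your audio therefore means no frame falls below the threshold, and the whole signal comes back as a single non-silent interval. Here's an example: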

import librosa
import numpy as np
import matplotlib.pyplot as plt

# create a hypothetical waveform with 1000 noisy samples and 1000 silent samples
nonsilent = np.random.randint(11, 100, 1000) * 100
silent = np.zeros(1000)
wave = np.concatenate((nonsilent, silent))

# look at it
print(wave.shape)
plt.plot(wave)
plt.show()

# get the noisy interval
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=1000)
print(non_silent_interval)

# plot only the noisy chunk of the waveform
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

# now set top_db higher than anything in your audio
non_silent_interval = librosa.effects.split(wave, top_db=1000, hop_length=1000)
print(non_silent_interval)

# and you'll get the entire audio again
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

The first plot shows that the non-silent audio runs from sample 0 to 1000 and the silent audio from sample 1000 to 2000.

The second plot shows only the noisy chunk of the wave we created.

The third plot shows the result with top_db set to 1000: the entire audio again.
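
The relative threshold can also be illustrated without librosa. The following is a rough sketch in plain Python, not librosa's actual implementation: it assumes non-overlapping frames, a hypothetical `amin` floor on the power before taking the logarithm, and the loudest frame as the reference. The function names are made up for illustration.

```python
import math


def frame_db_relative_to_peak(y, frame_length, amin=1e-10):
    """Mean-square power of each frame, in dB relative to the loudest frame."""
    powers = []
    for start in range(0, len(y), frame_length):
        frame = y[start:start + frame_length]
        powers.append(sum(x * x for x in frame) / len(frame))
    ref = max(powers)  # plays the role of the default peak reference
    return [
        10 * math.log10(max(amin, p)) - 10 * math.log10(max(amin, ref))
        for p in powers
    ]


def non_silent_frames(y, frame_length, top_db):
    """A frame is non-silent if it is less than top_db dB below the peak."""
    return [db > -top_db for db in frame_db_relative_to_peak(y, frame_length)]


# Hypothetical signal: a loud half followed by a half that is 40 dB quieter.
loud = [0.5, -0.5] * 500
quiet = [0.005, -0.005] * 500
wave = loud + quiet

print(non_silent_frames(wave, 1000, top_db=10))   # [True, False]
print(non_silent_frames(wave, 1000, top_db=100))  # [True, True]
```

The quiet half sits about 40 dB below the loud half, so a threshold of 10 dB marks it as silent, while a threshold of 100 dB is further below the peak than anything in the signal and every frame is kept.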

That means librosa did everything that it promised to do in the documentation. Hope this helps.
