Python multithreading - memory not released when ran using While statement

2024/9/16 23:25:27

I built a scraper (worker) launched XX times through multithreading (via Jupyter Notebook, python 2.7, anaconda). Script is of the following format, as described on python.org:

def worker():while True:item = q.get()do_work(item)q.task_done()q = Queue()
for i in range(num_worker_threads):t = Thread(target=worker)t.daemon = Truet.start()for item in source():q.put(item)q.join()       # block until all tasks are done

When I run the script as is, there are no issues. Memory is released after script finishes.

However, I want to run the said script 20 times (batching of sort), so I turn the script mentioned into a function, and run the function using code below:

def multithreaded_script():my script #code from abovex = 0
while x<20:x +=1multithredaded_script()

memory builds up with each iteration, and eventually the system start writing it to disk.

Is there a way to clear out the memory after each run?

I tried:

  1. setting all the variables to None
  2. setting sleep(30) at end of each iteration (in case it takes time for ram to release)

and nothing seems to help. Any ideas on what else I can try to get the memory to clear out after each run within the While statement? If not, is there a better way to execute my script XX times, that would not eat up the ram?

Thank you in advance.

Answer

TL;DR Solution: Make sure to end each function with return to ensure all local variables are destroyed from ram**

Per Pavel's suggestion, I used memory tracker (unfortunately suggested mem tracker did't work for me, so i used Pympler.)

Implementation was fairly simple:

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()~~~~~~~~~YOUR CODEtracker.print_diff()

The tracker gave a nice output, which made it obvious that local variables generated by functions were not being destroyed.

Adding "return" at the end of every function fixed the issue.

Takeaway:
If you are writing a function that processes info/generates local variables, but doesn't pass local variables to anything else -> make sure to end the function with return anyways. This will prevent any issues that you may run into with memory leaks.

Additional notes on memory usage & BeautifulSoup: If you are using BeautifulSoup / BS4 with multithreading and multiple workers, and have limited amount of free ram, you can also use soup.decompose() to destroy soup variable right after you are done with it, instead of waiting for the function to return/code to stop running.

https://en.xdnf.cn/q/72867.html

Related Q&A

Delete files that are older than 7 days

I have seen some posts to delete all the files (not folders) in a specific folder, but I simply dont understand them.I need to use a UNC path and delete all the files that are older than 7 days.Mypath …

Doctests: How to suppress/ignore output?

The doctest of the following (nonsense) Python module fails:""" >>> L = [] >>> if True: ... append_to(L) # XXX >>> L [1] """def append_to(L):…

Matplotlib not showing xlabel in top two subplots

I have a function that Ive written to show a few graphs here:def plot_price_series(df, ts1, ts2):# price series line graphfig = plt.figure()ax1 = fig.add_subplot(221)ax1.plot(df.index, df[ts1], label=t…

SQLAlchemy NOT exists on subselect?

Im trying to replicate this raw sql into proper sqlalchemy implementation but after a lot of tries I cant find a proper way to do it:SELECT * FROM images i WHERE NOT EXISTS (SELECT image_idFROM events …

What is the correct way to obtain explanations for predictions using Shap?

Im new to using shap, so Im still trying to get my head around it. Basically, I have a simple sklearn.ensemble.RandomForestClassifier fit using model.fit(X_train,y_train), and so on. After training, Id…

value error when using numpy.savetxt

I want to save each numpy array (A,B, and C) as column in a text file, delimited by space:import numpy as npA = np.array([5,7,8912,44])B = np.array([5.7,7.45,8912.43,44.99])C = np.array([15.7,17.45,189…

How do I retrieve key from value in django choices field?

The sample code is below:REFUND_STATUS = ((S, SUCCESS),(F, FAIL) ) refund_status = models.CharField(max_length=3, choices=REFUND_STATUS)I know in the model I can retrieve the SUCCESS with method get_re…

Errno13, Permission denied when trying to read file

I have created a small python script. With that I am trying to read a txt file but my access is denied resolving to an no.13 error, here is my code:import time import osdestPath = C:\Users\PC\Desktop\N…

How to write large JSON data?

I have been trying to write large amount (>800mb) of data to JSON file; I did some fair amount of trial and error to get this code:def write_to_cube(data):with open(test.json) as file1:temp_data = j…

Using absolute_import and handling relative module name conflicts in python [duplicate]

This question already has answers here:How can I import from the standard library, when my project has a module with the same name? (How can I control where Python looks for modules?)(7 answers)Close…