Python multithreading - memory not released when run in a while loop

2024/9/16 23:25:27

I built a scraper (worker) that is launched XX times through multithreading (via Jupyter Notebook, Python 2.7, Anaconda). The script has the following format, as described on

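from Queue import Queue  # Python 2.7; the module is named "queue" in Python 3
from threading import Thread

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done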

When I run the script as is, there are no issues. Memory is released after the script finishes.

However, I want to run the script 20 times (batching of sorts), so I turned it into a function and run that function using the code below:

def multithreaded_script():
    # my script: the code from above goes here
    pass

x = 0
while x < 20:
    x += 1
    multithreaded_script()

Memory builds up with each iteration, and eventually the system starts swapping to disk.

Is there a way to clear out the memory after each run?

I tried:

  1. setting all the variables to None
  2. setting sleep(30) at the end of each iteration (in case it takes time for RAM to be released)

and nothing seems to help. Any ideas on what else I can try to get the memory to clear out after each run within the while loop? If not, is there a better way to execute my script XX times that would not eat up the RAM?

Thank you in advance.


TL;DR Solution: Make sure to end each function with return to ensure all local variables are released from RAM.

Per Pavel's suggestion, I used a memory tracker (unfortunately the suggested memory tracker didn't work for me, so I used Pympler).

Implementation was fairly simple:

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

# ~~~~~~~~~ YOUR CODE ~~~~~~~~~

tracker.print_diff()

The tracker gave a nice output, which made it obvious that local variables generated by functions were not being destroyed.

Adding "return" at the end of every function fixed the issue.

If you are writing a function that processes info or generates local variables, but doesn't pass those variables to anything else, make sure to end the function with return anyway. In my case, this stopped the memory from building up.
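If you want to check whether an object created inside a function is really gone once the function has returned, a weak reference is one way to do it without any extra packages. The snippet below is only a minimal sketch: `Page`, `process`, `kept` and `is_released` are hypothetical names made up for illustration, not part of my scraper.

```python
import gc
import weakref


class Page(object):
    """Hypothetical stand-in for a large object built inside a worker."""

    def __init__(self, size):
        self.data = bytearray(size)


kept = []  # module-level list, used here only to simulate an accidental leak


def process(leak=False):
    page = Page(1024 * 1024)
    ref = weakref.ref(page)   # a weak reference does not keep `page` alive
    if leak:
        kept.append(page)     # a surviving reference outlives the call
    return ref


def is_released(ref):
    gc.collect()              # also clears reference cycles, if there are any
    return ref() is None      # a dead weak reference returns None
```

Calling `is_released(process())` tells you whether the local object was freed, while `is_released(process(leak=True))` shows what it looks like when something else (here the `kept` list) is still holding on to it.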

Additional notes on memory usage & BeautifulSoup: If you are using BeautifulSoup / BS4 with multithreading and multiple workers, and have a limited amount of free RAM, you can also use soup.decompose() to destroy the soup object right after you are done with it, instead of waiting for the function to return or the code to stop running.
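Additional note on the worker threads: with the pattern from the question, every worker sits in a `while True` loop and blocks on `q.get()` forever, so each batch leaves its threads (and the queue they reference) behind. It may be worth checking whether that contributes to the buildup in your case. The sketch below is not my actual scraper; it assumes a `None` sentinel to stop the workers, and `run_batch`, `threads_left_behind` and the doubling "work" are hypothetical stand-ins.

```python
import threading

try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2.7, as in the question


def run_batch(items, num_worker_threads=4, stop_workers=False):
    """Process items with worker threads; optionally shut the workers down."""
    q = Queue()
    results = []

    def worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: leave the loop so the thread ends
                q.task_done()
                break
            results.append(item * 2)  # stand-in for do_work(item)
            q.task_done()

    threads = []
    for _ in range(num_worker_threads):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()
        threads.append(t)

    for item in items:
        q.put(item)
    q.join()                          # block until all tasks are done

    if stop_workers:
        for _ in threads:
            q.put(None)               # one sentinel per worker
        for t in threads:
            t.join()                  # wait until each worker has exited
    return sorted(results)


def threads_left_behind(stop_workers, batches=3, num_worker_threads=4):
    """Run several batches and count the extra threads that are still alive."""
    before = threading.active_count()
    for _ in range(batches):
        run_batch(range(10), num_worker_threads, stop_workers)
    return threading.active_count() - before
```

Comparing `threads_left_behind(False)` with `threads_left_behind(True)` shows how many workers are still alive after the batches finish, with and without the sentinel shutdown.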

