Quicker to os.walk or glob?

2024/11/19 19:37:31

I'm messing around with file lookups in Python on a large hard disk. I've been looking at os.walk and glob. I usually use os.walk because I find it much neater, and it seems to be quicker (for directories of typical size).

Has anyone got experience with both who could say which is more efficient? As I say, glob seems to be slower, but it lets you use wildcards, whereas with os.walk you have to filter the results yourself. Here is an example of looking up core dumps.

import os
import re

core = re.compile(r"core\.\d*")
for root, dirs, files in os.walk("/path/to/dir/"):
    for file in files:
        if core.search(file):
            path = os.path.join(root, file)
            print("Deleting: " + path)
            os.remove(path)

Or

import os
from glob import iglob

for file in iglob("/path/to/dir/core.*"):
    print("Deleting: " + file)
    os.remove(file)
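
Note that the two snippets in the question are not quite equivalent: os.walk descends into every subdirectory, while the iglob pattern only matches entries directly inside /path/to/dir/. Also, core.search matches the pattern anywhere in the name, so a file such as score.txt would be deleted too. For a like-for-like comparison on Python 3.5 or later, a minimal sketch could look like the following. The function names are hypothetical, and the stricter "core.<digits>" naming rule is an assumption about what counts as a core dump, not something stated in the question.

```python
import glob
import os
import re

# Assumption: a core dump is named "core.<digits>", e.g. "core.1234".
CORE_RE = re.compile(r"core\.\d+")


def find_cores_walk(top):
    """Collect matching files by walking the tree and filtering names."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if CORE_RE.fullmatch(name):
                found.append(os.path.join(root, name))
    return sorted(found)


def find_cores_glob(top):
    """Collect matching files with a recursive glob pattern (Python 3.5+)."""
    # "**" with recursive=True matches zero or more directory levels.
    pattern = os.path.join(glob.escape(top), "**", "core.*")
    return sorted(
        path
        for path in glob.iglob(pattern, recursive=True)
        if os.path.isfile(path) and CORE_RE.fullmatch(os.path.basename(path))
    )
```

Either result can then be passed to os.remove in a loop. One difference to keep in mind: by default the recursive glob skips directories whose names start with a dot, while os.walk visits them.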

Answer

I ran a small benchmark on a cache of web pages spread over 1000 directories. The task was to count the total number of files in those directories. The output is:

os.listdir: 0.7268s, 1326786 files found
os.walk: 3.6592s, 1326787 files found
glob.glob: 2.0133s, 1326786 files found

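As you can see, os.listdir is the quickest of the three, and glob.glob is still quicker than os.walk for this task. (os.walk also counts files in the top-level directory, which the other two loops skip; that likely accounts for the one extra file it reports.)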

The source:

import os, time, glob

n, t = 0, time.time()
for i in range(1000):
    n += len(os.listdir("./%d" % i))
t = time.time() - t
print("os.listdir: %.4fs, %d files found" % (t, n))

n, t = 0, time.time()
for root, dirs, files in os.walk("./"):
    for file in files:
        n += 1
t = time.time() - t
print("os.walk: %.4fs, %d files found" % (t, n))

n, t = 0, time.time()
for i in range(1000):
    n += len(glob.glob("./%d/*" % i))
t = time.time() - t
print("glob.glob: %.4fs, %d files found" % (t, n))
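
The benchmark depends on 1000 numbered directories that are not part of the post, so it cannot be rerun as is. Below is a hedged Python 3 sketch of the same comparison that builds its own throwaway tree first. The helper names, the file names, and the default tree size are all made up for illustration. Since Python 3.5, os.walk is built on os.scandir, so the relative timings may well differ from the figures quoted above.

```python
import glob
import os
import tempfile
import time


def count_listdir(top, n_dirs):
    """Count entries in each numbered subdirectory with os.listdir."""
    return sum(len(os.listdir(os.path.join(top, str(i)))) for i in range(n_dirs))


def count_walk(top):
    """Count every file under top with os.walk."""
    return sum(len(files) for root, dirs, files in os.walk(top))


def count_glob(top, n_dirs):
    """Count entries in each numbered subdirectory with glob.glob."""
    base = glob.escape(top)
    return sum(len(glob.glob(os.path.join(base, str(i), "*"))) for i in range(n_dirs))


def timed(label, func, *args):
    """Run func, print its wall-clock time, and return its count."""
    start = time.perf_counter()
    n = func(*args)
    elapsed = time.perf_counter() - start
    print("%s: %.4fs, %d files found" % (label, elapsed, n))
    return n


def run_benchmark(n_dirs=20, files_per_dir=50):
    """Build a temporary tree of empty files and time the three approaches."""
    with tempfile.TemporaryDirectory() as top:
        for i in range(n_dirs):
            subdir = os.path.join(top, str(i))
            os.mkdir(subdir)
            for j in range(files_per_dir):
                open(os.path.join(subdir, "page%d.html" % j), "w").close()
        return (
            timed("os.listdir", count_listdir, top, n_dirs),
            timed("os.walk", count_walk, top),
            timed("glob.glob", count_glob, top, n_dirs),
        )
```

Calling run_benchmark(1000, 1000) would approximate the scale of the original test, at the cost of creating a million empty files. Timings on a freshly created tree are also likely to be flattered by the filesystem cache.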