Resize image faster in OpenCV Python

2024/10/4 3:22:42

I have a lot of image files in a folder (5M+). These images are of different sizes. I want to resize these images to 128x128.

I used the following function in a loop to resize them in Python using OpenCV:

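import glob
import cv2
from tqdm import tqdm

def read_image(img_path):
    # print(img_path)
    # Load the image from disk and resize it to 128x128.
    img = cv2.imread(img_path)
    img = cv2.resize(img, (128, 128))
    return img

# Overwrite each JPEG in place with its resized version.
for file in tqdm(glob.glob('train-images//*.jpg')):
    img = read_image(file)
    cv2.imwrite(file, img)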

But it will take more than 7 hours to complete. I was wondering whether there is any method to speed up this process.

Can I implement parallel processing to do this efficiently, with dask or something similar? If so, how?

Answer

If you are absolutely intent on doing this in Python, then please just disregard my answer. If you are interested in getting the job done simply and fast, read on...

I would suggest GNU Parallel whenever you have lots of things to do in parallel, and even more so as CPUs become "fatter" (more cores) rather than "taller" (higher clock rates in GHz).

At its simplest, you can use ImageMagick just from the command line in Linux, macOS and Windows like this to resize a bunch of images:

magick mogrify -resize 128x128\! *.jpg

If you have hundreds of images, you would be better off running that in parallel, which would be:

parallel magick mogrify -resize 128x128\! ::: *.jpg

If you have millions of images, the expansion of *.jpg will exceed the operating system's limit on the length of a command line (ARG_MAX), so you can use the following to feed the image names in on stdin instead of passing them as parameters:

find . -iname \*.jpg -print0 | parallel -0 -X --eta magick mogrify -resize 128x128\!

There are two "tricks" here:

  • I use find ... -print0 along with parallel -0 to null-terminate filenames so there are no problems with spaces in them,

  • I use parallel -X which means, rather than start a whole new mogrify process for each image, GNU Parallel works out how many filenames mogrify can accept, and gives it that many in batches.
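For readers who would rather drive the same pattern from Python, here is a minimal sketch of both tricks using only the standard library. It assumes the magick executable is on the PATH. The helper names (iter_jpegs, batched, mogrify_command, resize_all) and the batch size of 500 are illustrative choices of mine, not anything GNU Parallel itself uses. Filenames are passed to subprocess.run as a list, so no shell is involved and neither spaces in names nor the ! need escaping.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_jpegs(root):
    """Yield paths of .jpg files under root, matching case-insensitively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".jpg"):
                yield os.path.join(dirpath, name)


def batched(items, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def mogrify_command(batch):
    """Build a single mogrify invocation that handles a whole batch of files."""
    return ["magick", "mogrify", "-resize", "128x128!", *batch]


def resize_all(root, batch_size=500, workers=None):
    """Run one mogrify process per batch, several batches at a time.

    Assumes `magick` is installed; batch_size is an arbitrary illustrative value.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The threads only wait on external processes, so the GIL is not a bottleneck.
        futures = [
            pool.submit(subprocess.run, mogrify_command(batch), check=True)
            for batch in batched(iter_jpegs(root), batch_size)
        ]
        for future in futures:
            future.result()  # re-raise if any mogrify run failed
```

Calling resize_all('train-images') would then overwrite every JPEG under that folder in place, just as the mogrify commands above do, so it is worth trying on a copy first.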

I commend both tools to you.


Whilst the ImageMagick aspects of the above answer work on Windows, I don't use Windows and I am unsure about using GNU Parallel there. I think it may run under git-bash and/or Cygwin. You could try asking a separate question; they are free!

As regards the ImageMagick part, I think you can get a listing of all the JPEG filenames in a file using this command:

DIR /S /B *.JPG > filenames.txt

You can then probably process them (not in parallel) like this:

magick mogrify -resize 128x128\! @filenames.txt

And if you find out how to run GNU Parallel on Windows, you can probably process them in parallel using something like this:

parallel --eta -a filenames.txt magick mogrify -resize 128x128\!
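If GNU Parallel turns out to be awkward on Windows, one possible workaround is to split filenames.txt into several smaller list files and give each one to its own mogrify process. The sketch below only does the splitting, using the Python standard library. The function name split_list_file, the shard file names, and the UTF-8 default are my assumptions. In particular, the encoding of a file produced by DIR depends on the console code page, so the encoding argument may need changing.

```python
from pathlib import Path


def split_list_file(list_path, n_shards, out_dir, encoding="utf-8"):
    """Split a one-filename-per-line list into up to n_shards smaller list files.

    Returns the paths of the shard files that were written.
    """
    text = Path(list_path).read_text(encoding=encoding)
    names = [line.strip() for line in text.splitlines() if line.strip()]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    shard_paths = []
    for i in range(n_shards):
        shard = names[i::n_shards]  # round-robin keeps the shards similar in size
        if not shard:
            continue  # fewer filenames than shards: skip the empty ones
        path = out_dir / f"filenames_{i}.txt"  # hypothetical naming scheme
        path.write_text("\n".join(shard) + "\n", encoding=encoding)
        shard_paths.append(path)
    return shard_paths
```

Each resulting file can then probably be passed to a separate mogrify run using the same @file syntax as above, for example one per command window, with one shard per CPU core as a reasonable starting point.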