Python Paramiko directory walk over SFTP

2024/11/15 14:05:25

How can I do the equivalent of os.walk() on another computer over SSH? The problem is that os.walk() runs on the local machine, and I want to SSH to another host, walk through a directory there, and generate MD5 hashes for every file in it.

Here is what I have written so far (code below), but it doesn't work. Any help would be greatly appreciated.

try:
    hash_array = []
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('sunbeam', port=22, username='xxxx', password='filmlight')
    spinner.start()
    for root, dirs, files in os.walk(_path):
        for file in files:
            file_path = os.path.join(os.path.abspath(root), file)
            # generate hash code for file
            hash_array.append(genMD5hash(file_path))
            file_nb += 1
    spinner.stop()
    spinner.ok('Finished.')
    return hash_array
except Exception as e:
    print(e)
    return None
finally:
    ssh.close()

Answer

To recursively list a directory with Paramiko over SFTP, the standard file access interface, you need to implement a recursive function that uses SFTPClient.listdir_attr:

from stat import S_ISDIR, S_ISREG

def listdir_r(sftp, remotedir):
    for entry in sftp.listdir_attr(remotedir):
        remotepath = remotedir + "/" + entry.filename
        mode = entry.st_mode
        if S_ISDIR(mode):
            listdir_r(sftp, remotepath)
        elif S_ISREG(mode):
            print(remotepath)
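
If you want to feed the paths into a hashing step rather than print them, the same recursion can be written as a generator. This is only a sketch: the name sftp_walk_files is made up for illustration, and sftp is assumed to be a paramiko.SFTPClient (for example from ssh.open_sftp()), or anything else that offers listdir_attr.

```python
import posixpath
from stat import S_ISDIR, S_ISREG


def sftp_walk_files(sftp, remotedir):
    """Yield the path of every regular file below remotedir.

    `sftp` is assumed to behave like paramiko.SFTPClient: listdir_attr(path)
    must return entries that have .filename and .st_mode attributes.
    (sftp_walk_files is a hypothetical helper name, not a Paramiko API.)
    """
    for entry in sftp.listdir_attr(remotedir):
        # SFTP paths use forward slashes regardless of the local OS.
        remotepath = posixpath.join(remotedir, entry.filename)
        mode = entry.st_mode
        if S_ISDIR(mode):
            # Recurse into the subdirectory and pass its results upward.
            yield from sftp_walk_files(sftp, remotepath)
        elif S_ISREG(mode):
            yield remotepath
        # Anything else (symlinks, sockets, ...) is skipped.
```

Calling list(sftp_walk_files(sftp, _path)) would then give a flat list of remote file paths, roughly what the nested os.walk loops in the question were collecting locally.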

Based on the question "Python pysftp get_r from Linux works fine on Linux but not on Windows".


Alternatively, pysftp implements an os.walk equivalent: Connection.walktree.
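
A rough sketch of how walktree could be used to collect file paths. It assumes conn is an already opened pysftp.Connection. list_remote_files is an illustrative name, not part of pysftp. walktree takes separate callbacks for files, directories and unknown entries, so the two unused ones are given a no-op.

```python
def list_remote_files(conn, remotedir):
    """Collect the paths of regular files below remotedir.

    `conn` is assumed to be an open pysftp.Connection; list_remote_files
    itself is a hypothetical helper, not a pysftp function.
    """
    files = []

    def ignore(path):
        # Directories and unknown entries are not needed here.
        pass

    conn.walktree(
        remotedir,
        fcallback=files.append,  # called with the path of each regular file
        dcallback=ignore,        # called with the path of each directory
        ucallback=ignore,        # called for entries of unknown type
        recurse=True,
    )
    return files
```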


However, you will have trouble getting the MD5 of a remote file over the SFTP protocol.

While Paramiko supports it with its SFTPFile.check, most SFTP servers (particularly the most widespread SFTP/SSH server – OpenSSH) do not. See:
How to check if Paramiko successfully uploaded a file to an SFTP server? and
How to perform checksums during a SFTP file transfer for data integrity?
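
If you want to find out whether your particular server is one of the few that support it, something like the following could be tried. It is a sketch under assumptions: sftp is a paramiko.SFTPClient, try_remote_md5 is a made-up helper name, and a failing check is treated as "not supported" by catching the IOError that Paramiko documents for that case.

```python
def try_remote_md5(sftp, remotepath):
    """Return the server-computed MD5 of remotepath as hex, or None.

    None means the server rejected the request, which is the expected
    outcome on OpenSSH. `sftp` is assumed to be a paramiko.SFTPClient;
    try_remote_md5 is a hypothetical helper name.
    """
    with sftp.open(remotepath, "rb") as remote_file:
        try:
            # SFTPFile.check asks the server to hash the file itself.
            digest = remote_file.check("md5")
        except IOError:
            # No "check-file" extension, or the algorithm is not offered.
            return None
    return digest.hex()
```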

So you will most probably have to resort to the shell md5sum command (if you have shell access at all). And once you have to use the shell anyway, consider listing the files with the shell too, as that will be orders of magnitude faster than via SFTP.

See md5 all files in a directory tree.

Use SSHClient.exec_command:
Comparing MD5 of downloaded files against files on an SFTP server in Python
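
Putting the two together, one possible approach is to let the remote shell both walk the tree and hash the files, then parse the result locally. This is a minimal sketch, not a definitive implementation. It assumes ssh is a connected paramiko.SSHClient and that the remote host has find and md5sum. The helper names parse_md5sum_output and remote_md5_tree are invented for illustration.

```python
import shlex


def parse_md5sum_output(text):
    """Map each path to its hex digest, given the text printed by md5sum.

    Each line is expected as: 32 hex digits, two separator characters,
    then the path. File names containing newlines or backslashes are
    escaped by md5sum and are not handled by this sketch.
    """
    hashes = {}
    for line in text.split("\n"):
        if len(line) < 35:
            continue  # blank or malformed line
        digest, path = line[:32], line[34:]
        hashes[path] = digest
    return hashes


def remote_md5_tree(ssh, remotedir):
    """Hash every regular file below remotedir using the remote shell.

    `ssh` is assumed to be a connected paramiko.SSHClient.
    """
    command = "find {} -type f -exec md5sum {{}} +".format(shlex.quote(remotedir))
    stdin, stdout, stderr = ssh.exec_command(command)
    # Reading stdout fully before stderr is fine for modest error output;
    # very large stderr output would need both streams drained together.
    output = stdout.read().decode()
    errors = stderr.read().decode()
    if stdout.channel.recv_exit_status() != 0:
        raise RuntimeError("remote command failed: " + errors.strip())
    return parse_md5sum_output(output)
```

The returned dictionary maps remote paths to digests, so it can be compared directly against hashes computed for local copies.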


Obligatory warning: Do not use AutoAddPolicy – You are losing a protection against MITM attacks by doing so. For a correct solution, see Paramiko "Unknown Server".
