Getting the line numbers that were changed

2024/11/16 16:48:05

Given two text files, A and B, what is an easy way to get the line numbers of the lines in B that are not present in A? I see there's difflib, but I don't see an interface for retrieving line numbers.

Answer

difflib can give you what you need. Assume:

a.txt

this 
is 
a 
bunch 
of 
lines

b.txt

this 
is 
a 
different
bunch 
of 
other
lines

code like this:

import difflib

fileA = open("a.txt", "rt").readlines()
fileB = open("b.txt", "rt").readlines()

d = difflib.Differ()
diffs = d.compare(fileA, fileB)
lineNum = 0

for line in diffs:
    # split off the two-character code
    code = line[:2]
    # if the line is in both files or just b, increment the line number
    if code in ("  ", "+ "):
        lineNum += 1
    # if this line is only in b, print the line number and the text on the line
    if code == "+ ":
        print("%d: %s" % (lineNum, line[2:].strip()))

gives output like:

bgporter@varese ~/temp:python diffy.py 
4: different
7: other
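
If you only need the line numbers and not the diff text, one alternative is difflib.SequenceMatcher, whose get_opcodes() reports index ranges into both sequences directly. This is a different technique from the Differ loop, sketched here on the assumption that both files fit in memory as lists of lines. The helper name added_line_numbers is made up for illustration and is not part of difflib.

```python
import difflib


def added_line_numbers(lines_a, lines_b):
    """Return 1-based numbers of lines in lines_b that have no match in lines_a.

    Hypothetical helper for illustration; not part of difflib.
    """
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b)
    numbers = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both cover lines that exist only in b;
        # j1:j2 is the 0-based, half-open range of those lines in lines_b.
        if tag in ("insert", "replace"):
            numbers.extend(range(j1 + 1, j2 + 1))
    return numbers


a = ["this", "is", "a", "bunch", "of", "lines"]
b = ["this", "is", "a", "different", "bunch", "of", "other", "lines"]
print(added_line_numbers(a, b))  # [4, 7]
```

Because the opcodes already carry positions in b, no running counter is needed.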

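You'll also want to look at the lines that Differ prefixes with "? " and decide how to handle them. They appear in neither input file; Differ adds them to point out the characters that changed within a line. The loop above skips them because their code is neither "  " nor "+ ".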
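
To see when "? " lines show up, here is a small sketch with two made-up inputs whose first lines differ by one character. It only illustrates that the hint lines exist and that skipping them keeps the line count for b correct; the variable names are illustrative.

```python
import difflib

# Two made-up inputs whose first lines differ by a single character.
a = ["one line\n", "unchanged\n"]
b = ["one lime\n", "unchanged\n"]

diff = list(difflib.Differ().compare(a, b))
codes = [line[:2] for line in diff]

# "? " lines are hints that mark intraline changes; they are in neither input.
hint_count = codes.count("? ")

# Counting only "  " and "+ " lines still walks b one line at a time,
# so the "? " (and "- ") lines can simply be skipped when numbering.
lines_counted = sum(1 for code in codes if code in ("  ", "+ "))

print("".join(diff), end="")
print("hints:", hint_count, "lines counted in b:", lines_counted)
```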

(Also, in real code you'd want to use context managers to make sure the files get closed, and so on.)
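
A minimal sketch of the same loop with context managers, wrapped in a function that returns the results instead of printing them. The function name lines_only_in_b is an assumption for illustration. The demo writes the two sample files to a temporary directory so the block runs on its own.

```python
import difflib
import os
import tempfile


def lines_only_in_b(path_a, path_b):
    """Return (line_number, text) pairs for lines of path_b not matched in path_a."""
    # The with-statement closes both files even if reading fails.
    with open(path_a, "rt") as file_a, open(path_b, "rt") as file_b:
        lines_a = file_a.readlines()
        lines_b = file_b.readlines()

    results = []
    line_num = 0
    for line in difflib.Differ().compare(lines_a, lines_b):
        code = line[:2]
        # Lines in both files or only in b advance the line number in b.
        if code in ("  ", "+ "):
            line_num += 1
        # Lines only in b are recorded with their number.
        if code == "+ ":
            results.append((line_num, line[2:].strip()))
    return results


# Demo with the sample files from above, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "wt") as f:
        f.write("this\nis\na\nbunch\nof\nlines\n")
    with open(path_b, "wt") as f:
        f.write("this\nis\na\ndifferent\nbunch\nof\nother\nlines\n")
    result = lines_only_in_b(path_a, path_b)

print(result)  # [(4, 'different'), (7, 'other')]
```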
