Why does myVar = strings.Fields(scanner.Text()) take much more time than the comparable operation in Python?

2024/10/5 21:18:32

Consider the following code in Go:

now := time.Now()
sec1 := now.Unix()
file, err := os.Open(file_name)
if err != nil {
	log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
var parsedLine []string
for scanner.Scan() {
	parsedLine = strings.Fields(scanner.Text())
}
fmt.Println(parsedLine)
now2 := time.Now()
sec2 := now2.Unix()
fmt.Println(sec2 - sec1) // takes 24 seconds for file1.txt

And consider this Python program:

start = time.time()
with open(file) as f:
    for line in f:
        parsedLine = line.split()
end = time.time()
print end - start # takes 4.6450419426 seconds for file1.txt

I observe that the Go program is about 5 times slower than the Python program on a MacBook Pro.

Specifically, this line

parsedLine = strings.Fields(scanner.Text())

is very slow.

If I change that line in the Go program to

if strings.Contains(scanner.Text(), "string_that_never_exist") {
	continue
}
// takes less than 1 second

and the corresponding line in the Python program to

if "string_that_never_exist" in line:
    continue
# takes 2.86928987503 second

the Go version is now much faster than the Python one.

I am slightly perplexed as to why strings.Fields(scanner.Text()) would be slower than line.split().

I feel I am missing something silly. Can someone point out why the Go version takes longer than the Python one?

Answer

Any benchmark should be a good scientific experiment. It must be reproducible.

First, define the readily available input:

The Complete Works of William Shakespeare by William Shakespeare:

http://www.gutenberg.org/files/100/100-0.txt

Next, fully define the executable programs:

linesplit.py:

import time

start = time.time()
# http://www.gutenberg.org/files/100/100-0.txt
file = "/home/peter/shakespeare.100-0.txt"
with open(file) as f:
    for line in f:
        parsedLine = line.split()
end = time.time()
print(end - start)

linesplit.go:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
	"time"
)

func main() {
	now := time.Now()
	sec1 := now.Unix()
	// http://www.gutenberg.org/files/100/100-0.txt
	file_name := "/home/peter/shakespeare.100-0.txt"
	file, err := os.Open(file_name)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	var parsedLine []string
	for scanner.Scan() {
		parsedLine = strings.Fields(scanner.Text())
	}
	fmt.Println(parsedLine)
	now2 := time.Now()
	sec2 := now2.Unix()
	fmt.Println(sec2 - sec1) // takes 24 second for file1.txt
	fmt.Println(time.Since(now))
}

Then, provide the benchmark results:

$ python2 --version
Python 2.7.14
$ time python2 linesplit.py
0.07024809169769
real    0m0.089s
user    0m0.089s
sys     0m0.000s

$ python3 --version
Python 3.6.3
$ time python3 linesplit.py
0.12172794342041016
real    0m0.159s
user    0m0.155s
sys     0m0.004s

$ go version
go version devel +39ad208c13 Tue Jun 12 19:10:34 2018 +0000 linux/amd64
$ go build linesplit.go && time ./linesplit
[]
1
91.833622ms
real    0m0.100s
user    0m0.094s
sys     0m0.004s

We have Python2 < Go < Python3, or 0.0702 < 0.0918 < 0.1217 seconds, or, as a ratio, 1.00 < 1.31 < 1.73. On this input, Go is nowhere near 5 times slower than Python. Python2 splits byte strings (ASCII). Go and Python3 split Unicode text.
