Consider the following Go code:
now := time.Now()
sec1 := now.Unix()
file, err := os.Open(file_name)
if err != nil {
	log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
var parsedLine []string
for scanner.Scan() {
	parsedLine = strings.Fields(scanner.Text())
}
fmt.Println(parsedLine)
now2 := time.Now()
sec2 := now2.Unix()
fmt.Println(sec2 - sec1) // takes 24 seconds for file1.txt
And consider this Python program:
start = time.time()
with open(file) as f:
    for line in f:
        parsedLine = line.split()
end = time.time()
print end - start # takes 4.6450419426 seconds for file1.txt
I observe that the Go program is about 5 times slower than the Python program on a MacBook Pro.
Specifically this line
parsedLine = strings.Fields(scanner.Text())
is very slow.
If I change that line in the Go program to
if strings.Contains(scanner.Text(), "string_that_never_exist") {
	continue
}
// takes less than 1 second
and in the Python program to
if "string_that_never_exist" in line:
    continue
# takes 2.86928987503 seconds
then the Go version is much faster than the Python one.
I am slightly perplexed as to why strings.Fields(scanner.Text()) may be slower than line.split().
I feel I am missing something silly. Can someone point out why the Go version takes longer than the Python one?
Any benchmark should be a good scientific experiment. It must be reproducible.
First, define the readily available input:
The Complete Works of William Shakespeare by William Shakespeare:
http://www.gutenberg.org/files/100/100-0.txt
Next, fully define the executable programs:
linesplit.py:
import time
start = time.time()
# http://www.gutenberg.org/files/100/100-0.txt
file = "/home/peter/shakespeare.100-0.txt"
with open(file) as f:
    for line in f:
        parsedLine = line.split()
end = time.time()
print (end - start)
linesplit.go:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
	"time"
)

func main() {
	now := time.Now()
	sec1 := now.Unix()
	// http://www.gutenberg.org/files/100/100-0.txt
	file_name := "/home/peter/shakespeare.100-0.txt"
	file, err := os.Open(file_name)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	var parsedLine []string
	for scanner.Scan() {
		parsedLine = strings.Fields(scanner.Text())
	}
	fmt.Println(parsedLine)
	now2 := time.Now()
	sec2 := now2.Unix()
	fmt.Println(sec2 - sec1) // takes 24 second for file1.txt
	fmt.Println(time.Since(now))
}
Then, provide the benchmark results:
$ python2 --version
Python 2.7.14
$ time python2 linesplit.py
.07024809169769
real 0m0.089s
user 0m0.089s
sys 0m0.000s
$ python3 --version
Python 3.6.3
$ time python3 linesplit.py
0.12172794342041016
real 0m0.159s
user 0m0.155s
sys 0m0.004s
$ go version
go version devel +39ad208c13 Tue Jun 12 19:10:34 2018 +0000 linux/amd64
$ go build linesplit.go && time ./linesplit
[]
1
91.833622ms
real 0m0.100s
user 0m0.094s
sys 0m0.004s
$
We have Python2 < Go < Python3 or 0.0702 < 0.0918 < 0.1217 or, as a ratio, 1.00 < 1.31 < 1.73. Python 2 splits byte strings (ASCII); Go and Python 3 split Unicode text.