Profiling and Optimizing

In college I came across the following maxim and took it to heart, so for a long time I rarely optimized my code. Back then I mostly worried about getting programs to run at all, and to run correctly; making them run faster and leaner rarely came up.

Premature optimization is the root of all evil. – The Art of Computer Programming

Now, because of work, I have no choice but to think about how to make code run faster and better.

How do we make it run faster?

Once a program runs correctly, the next question is how to make it faster and use fewer resources, with the ultimate goal of serving more users on the same hardware. There are roughly two approaches:
1. Business-level adjustments
2. Code-level optimization
Business-level adjustments usually mean changing the product: dropping some less important features to significantly speed up the main ones, which requires cross-team cooperation. For example, data on the current page that is expensive to fetch but not really needed can be dropped; with less data for the interface or page to process, it gets correspondingly faster.
Code-level optimization means the developer reworks the internal logic without changing the business behavior; the whole process is transparent to end users.

This article focuses on code optimization, which we developers can do without depending on any external resources.

Profiling

If I want to optimize code, where do I start? The first step is locating what needs optimizing. One approach is to estimate which module or function is likely to be slow and focus the analysis there. The other is to use tools to quantify the resources (CPU, memory) the code consumes and work from the measured results. The first approach is subjective: different developers estimate differently, and on a large project a developer may not know other modules well enough to judge correctly. The second approach is objective: by instrumenting the code and measuring its resource usage, it is easy to see from the statistics which parts are expensive. That second approach is profiling.

Profiling is usually done with the following tools:
1. A Python decorator that measures a function's runtime

import time
from functools import wraps

def timefn(fn):
    """Decorator that prints how long the wrapped function took."""
    @wraps(fn)  # preserve fn's __name__/__doc__ so profiler output stays readable
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print("@timefn:" + fn.__name__ + " took " + str(t2 - t1) + " seconds")
        return result
    return measure_time

@timefn
def func_to_measure():
    pass
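Calling func_to_measure() now runs it as usual and additionally prints a line of the form "@timefn:func_to_measure took ... seconds".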

2. The Python timeit module

python -m timeit -n 5 -r 5 ...

5 loops, best of 5: 13.1 sec per loop
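timeit can also be driven from Python code instead of the command line. A minimal sketch (the statement and setup strings here are hypothetical examples, not from the original post):

import timeit

# run the statement 1000 times per trial, for 5 trials; returns total seconds per trial
durations = timeit.repeat("sorted(data)", setup="data = list(range(1000))",
                          repeat=5, number=1000)
print(min(durations))  # report the best trial, as `python -m timeit` does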

3. The Unix time command

/usr/bin/time -p python scripts.py
...

real 13.46
user 13.40
sys 0.04
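Here real is wall-clock time, user is CPU time spent in user-mode code, and sys is CPU time spent in the kernel on behalf of the process; a real value much larger than user + sys usually means the program was waiting on I/O.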

4. The cProfile module
This shows the calls made to user-defined functions and Python built-ins, along with how long each took. It is typically used to locate hot spots at function granularity.

python -m cProfile -s cumulative problem1.py 

66716 function calls in 9.435 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
3 0.000 0.000 0.000 0.000 functools.py:17(update_wrapper)
3 0.000 0.000 0.000 0.000 functools.py:39(wraps)
1 0.000 0.000 9.435 9.435 problem1.py:1(<module>)
1 9.407 9.407 9.408 9.408 problem1.py:20(merge_two_list)
1 0.019 0.019 0.025 0.025 problem1.py:4(get_muliply_res)
1 0.001 0.001 0.001 0.001 timing_decorators.py:1(<module>)
3 0.000 0.000 0.000 0.000 timing_decorators.py:5(timefn)
2 0.000 0.000 9.433 4.717 timing_decorators.py:6(measure_time)
15 0.000 0.000 0.000 0.000 {getattr}
66665 0.005 0.000 0.005 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
2 0.003 0.002 0.003 0.002 {range}
9 0.000 0.000 0.000 0.000 {setattr}
1 0.000 0.000 0.000 0.000 {sum}
4 0.000 0.000 0.000 0.000 {time.time}
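cProfile can also be used programmatically, with pstats to sort and trim the output. A minimal sketch, where slow_function is a hypothetical stand-in for whatever workload you want to profile:

import cProfile
import pstats

def slow_function():
    # hypothetical workload standing in for the code under test
    return sum(i * i for i in range(10 ** 6))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time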

5. The line_profiler module
Once cProfile has narrowed things down to a function, line_profiler can pinpoint the specific lines inside it. Note that it requires decorating the target function with @profile (injected by kernprof at runtime, no import needed) and running the script through kernprof.

kernprof -l -v problem1.py 


Total time: 9.80845 s
File: problem1.py
Function: merge_two_list at line 20

Line # Hits Time Per Hit % Time Line Contents
==============================================================
20 @timefn
21 @profile
22 def merge_two_list(small_lst, big_lst):
23 """ merge two list into one, remove duplicate elements """
24 1 158.0 158.0 0.0 res = big_lst
25 20000 7026.0 0.4 0.1 for ele in small_lst:
26 19999 9792981.0 489.7 99.8 if ele not in res:
27 13333 8285.0 0.6 0.1 res.append(ele)
28
29 1 0.0 0.0 0.0 return res
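Reading the % Time column makes the problem obvious: line 26, the membership test ele not in res, accounts for 99.8% of the function's runtime, so that is where any optimization effort should go.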

6. The memory_profiler module
The tools above all measure runtime; this one measures memory usage. Like line_profiler, it relies on an @profile decorator on the target function.

python -m memory_profiler problem1.py

Filename: problem1.py

Line # Mem usage Increment Line Contents
================================================
20 30.691 MiB 30.691 MiB @timefn
21 @profile
22 def merge_two_list(small_lst, big_lst):
23 """ merge two list into one, remove duplicate elements """
24 30.695 MiB 0.004 MiB res = big_lst
25 30.977 MiB 0.000 MiB for ele in small_lst:
26 30.977 MiB 0.168 MiB if ele not in res:
27 30.977 MiB 0.004 MiB res.append(ele)
28
29 30.977 MiB 0.000 MiB return res
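Note that memory_profiler traces every line and slows the program down considerably, so use it to answer memory questions rather than timing ones. In this run the function barely allocates at all: it grows an existing list in place, so usage only creeps from 30.691 MiB to 30.977 MiB.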

There are other tools as well, but they will not be covered here.

Experiment

The task: find the duplicate titles among 10,000 movie titles and print them. The data looks like this:

Carmencita
Le clown et ses chiens
Pauvre Pierrot
Un bon bock
Blacksmith Scene
Chinese Opium Den
Corbett and Courtney Before the Kinetograph
Edison Kinetoscopic Record of a Sneeze
Miss Jerry
Exiting the Factory
...

titles_1w.txt

The code:

def read_movies(src):
    with open(src) as fd:
        return fd.read().splitlines()


def is_duplicate(needle, haystack):
    # linear scan with a case-insensitive comparison per element
    for movie in haystack:
        if needle.lower() == movie.lower():
            return True
    return False


def find_duplicate_movies(src="titles_1w.txt"):
    movies = read_movies(src)
    duplicates = []

    while movies:
        movie = movies.pop()
        if is_duplicate(movie, movies):
            duplicates.append(movie)

    return duplicates


if __name__ == '__main__':
    res = find_duplicate_movies()
    print(res)

Running this script under cProfile shows it takes about 35 seconds:

python -m cProfile find_duplicate_movies.py

97560705 function calls in 35.499 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 35.499 35.499 find_duplicate_movies.py:1(<module>)
1 0.000 0.000 0.001 0.001 find_duplicate_movies.py:1(read_movies)
1 0.013 0.013 35.499 35.499 find_duplicate_movies.py:13(find_duplicate_movies)
10001 12.619 0.001 35.482 0.004 find_duplicate_movies.py:6(is_duplicate)
376 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
97540320 22.863 0.000 22.863 0.000 {method 'lower' of 'str' objects}
10001 0.004 0.000 0.004 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'read' of 'file' objects}
1 0.001 0.001 0.001 0.001 {method 'splitlines' of 'str' objects}
1 0.000 0.000 0.000 0.000 {open}

Analyzing the cProfile output, the time is concentrated in two calls:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10001 12.619 0.001 35.482 0.004 find_duplicate_movies.py:6(is_duplicate)
97540320 22.863 0.000 22.863 0.000 {method 'lower' of 'str' objects}

First optimization

These correspond to our is_duplicate function and the built-in str.lower, and lower alone costs 22.863 seconds: it is called twice for every comparison inside is_duplicate, roughly 97.5 million times in total. We can instead lowercase each title once, right after reading it in, and eliminate those per-comparison lower calls entirely. The improved code:

def read_movies(src):
    with open(src) as fd:
        return fd.read().splitlines()


def is_duplicate(needle, haystack):
    for movie in haystack:
        if needle == movie:  # titles are already lowercased
            return True
    return False


def find_duplicate_movies(src="titles_1w.txt"):
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]  # lowercase once, up front
    duplicates = []

    while movies:
        movie = movies.pop()
        if is_duplicate(movie, movies):
            duplicates.append(movie)

    return duplicates


if __name__ == '__main__':
    res = find_duplicate_movies()
    print(res[0:5])

Running the improved program gives:

python -m cProfile find_duplicate_movies.py

30386 function calls in 1.175 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.175 1.175 find_duplicate_movies.py:1(<module>)
1 0.000 0.000 0.001 0.001 find_duplicate_movies.py:1(read_movies)
1 0.006 0.006 1.175 1.175 find_duplicate_movies.py:13(find_duplicate_movies)
10001 1.163 0.000 1.163 0.000 find_duplicate_movies.py:6(is_duplicate)
376 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10001 0.003 0.000 0.003 0.000 {method 'lower' of 'str' objects}
10001 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'read' of 'file' objects}
1 0.001 0.001 0.001 0.001 {method 'splitlines' of 'str' objects}
1 0.000 0.000 0.000 0.000 {open}

Total time drops to 1.175 seconds.

Second optimization

The profile now shows is_duplicate as the main cost. We can drop the function entirely and test membership with the in operator directly in find_duplicate_movies: the element-by-element comparison then runs inside the interpreter's C implementation of list containment rather than in Python bytecode, and we also save 10,000 function calls:

def read_movies(src):
    with open(src) as fd:
        return fd.read().splitlines()


def find_duplicate_movies(src="titles_1w.txt"):
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]
    duplicates = []

    while movies:
        movie = movies.pop()
        if movie in movies:  # membership test runs in C, not Python bytecode
            duplicates.append(movie)

    return duplicates


if __name__ == '__main__':
    res = find_duplicate_movies()
    print(res[0:5])

Running the improved program gives:

python -m cProfile find_duplicate_movies.py

20385 function calls in 0.463 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.463 0.463 find_duplicate_movies.py:1(<module>)
1 0.000 0.000 0.001 0.001 find_duplicate_movies.py:1(read_movies)
1 0.457 0.457 0.463 0.463 find_duplicate_movies.py:13(find_duplicate_movies)
376 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10001 0.003 0.000 0.003 0.000 {method 'lower' of 'str' objects}
10001 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'read' of 'file' objects}
1 0.001 0.001 0.001 0.001 {method 'splitlines' of 'str' objects}
1 0.000 0.000 0.000 0.000 {open}

Total time drops to 0.463 seconds.

Third optimization

So far we detect duplicates by popping a title and scanning the remaining list, which is still quadratic overall. A more efficient approach is to sort the list first: identical titles then sit next to each other, so a single linear pass comparing the elements at idx and idx + 1 finds every duplicate. This replaces the O(n²) scans with an O(n log n) sort plus an O(n) pass. The code:

def read_movies(src):
    with open(src) as fd:
        return fd.read().splitlines()


def find_duplicate_movies(src="titles_1w.txt"):
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]
    movies.sort()  # after sorting, duplicates are adjacent
    duplicates = []
    for idx in range(len(movies) - 1):
        if movies[idx] == movies[idx + 1]:
            duplicates.append(movies[idx])

    return duplicates


if __name__ == '__main__':
    res = find_duplicate_movies()
    print(res[0:5])

Running the improved program gives:

python -m cProfile find_duplicate_movies.py

10387 function calls in 0.010 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.010 0.010 find_duplicate_movies.py:1(<module>)
1 0.000 0.000 0.001 0.001 find_duplicate_movies.py:1(read_movies)
1 0.002 0.002 0.010 0.010 find_duplicate_movies.py:13(find_duplicate_movies)
1 0.000 0.000 0.000 0.000 {len}
376 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10001 0.003 0.000 0.003 0.000 {method 'lower' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'read' of 'file' objects}
1 0.003 0.003 0.003 0.003 {method 'sort' of 'list' objects}
1 0.001 0.001 0.001 0.001 {method 'splitlines' of 'str' objects}
1 0.000 0.000 0.000 0.000 {open}
1 0.000 0.000 0.000 0.000 {range}

Total time drops to 0.010 seconds.
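Finally, a sketch of my own going one step further (not part of the original experiment): a set gives O(1) membership tests, so the duplicates can be found in a single O(n) pass without sorting, reusing the read_movies helper from above:

def find_duplicate_movies_set(src="titles_1w.txt"):
    seen = set()
    duplicates = []
    for movie in read_movies(src):
        movie = movie.lower()
        if movie in seen:  # O(1) hash lookup instead of an O(n) list scan
            duplicates.append(movie)
        else:
            seen.add(movie)
    return duplicates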