Some timings on a file with 100000 columns. str.count seems to be the fastest, but it is off by one (it counts delimiters, so you need to add 1):
In [14]: %%timeit
with open("test.csv") as f:
    r = csv.reader(f, delimiter="\t")
    len(next(r))
   ....:
10 loops, best of 3: 88.7 ms per loop
In [15]: %%timeit
with open("test.csv") as f:
    next(f).count("\t")
   ....:
100 loops, best of 3: 11.9 ms per loop
with io.open('test.csv', 'r') as fin:
    num_columns = len(next(fin).split('\t'))
   ....:
10 loops, best of 3: 133 ms per loop
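The off-by-one comes from counting delimiters rather than fields: a line with n tabs has n + 1 columns. A minimal sketch of that relationship, using a made-up three-column header line instead of test.csv:

```python
# Hypothetical header line with three tab-separated columns (not taken from test.csv).
line = "a\tb\tc\n"

tabs = line.count("\t")         # number of delimiters in the line
fields = len(line.split("\t"))  # number of fields produced by splitting

# Delimiters sit between fields, so there is always one more field than tabs.
num_columns = tabs + 1
```

So the count-based timings need a "+ 1" to agree with the split and csv.reader results.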
Using str.translate actually seems to be the fastest, although again you need to add 1:
In [5]: %%timeit
with open("test.csv") as f:
    n = next(f)
    (len(n) - len(n.translate(None, "\t")))
   ...:
100 loops, best of 3: 9.9 ms per loop
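Note that the two-argument form translate(None, "\t") only exists on Python 2 byte strings. On Python 3, str.translate takes a single mapping, and mapping a code point to None deletes that character. A minimal sketch of the same idea for Python 3 (the function name count_columns_translate is made up for illustration, and it assumes a single-character delimiter):

```python
def count_columns_translate(line, delimiter="\t"):
    """Count columns as (number of delimiters removed by translate) + 1.

    Hypothetical helper; assumes `delimiter` is a single character.
    """
    # Python 3: a dict mapping a code point to None deletes that character.
    stripped = line.translate({ord(delimiter): None})
    # The length difference is the number of delimiters; add 1 for the column count.
    return len(line) - len(stripped) + 1
```

For example, count_columns_translate(next(f)) could replace the expression in the timed cell above. The timing shown is for the Python 2 form and may not carry over.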
The pandas solution gives me an error:
in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7977)()
StopIteration:
Using readline adds more overhead:
In [19]: %%timeit
with open("test.csv") as f:
    f.readline().count("\t")
   ....:
10 loops, best of 3: 28.9 ms per loop
In [30]: %%timeit
with io.open('test.csv', 'r') as fin:
    num_columns = len(fin.readline().split('\t'))
   ....:
10 loops, best of 3: 136 ms per loop
Different results using Python 3.4:
In [7]: %%timeit
with io.open('test.csv', 'r') as fin:
    num_columns = len(next(fin).split('\t'))
   ...:
10 loops, best of 3: 102 ms per loop
In [8]: %%timeit
with open("test.csv") as f:
    f.readline().count("\t")
   ...:
100 loops, best of 3: 12.7 ms per loop
In [9]: %%timeit
with open("test.csv") as f:
    next(f).count("\t")
   ...:
100 loops, best of 3: 11.5 ms per loop
In [10]: %%timeit
with io.open('test.csv', 'r') as fin:
    num_columns = len(next(fin).split('\t'))
   ....:
10 loops, best of 3: 89.9 ms per loop
In [11]: %%timeit
with io.open('test.csv', 'r') as fin:
    num_columns = len(fin.readline().split('\t'))
   ....:
10 loops, best of 3: 92.4 ms per loop
In [13]: %%timeit
with open("test.csv") as f:
    r = csv.reader(f, delimiter="\t")
    len(next(r))
   ....:
10 loops, best of 3: 176 ms per loop
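If you want to rerun this kind of comparison outside IPython, the standard-library timeit module can do it. The following is only a rough sketch for Python 3: the helper names (make_test_file, by_count, by_split, by_csv, compare) and the generated file are assumptions for illustration, not the setup used for the numbers above.

```python
import csv
import os
import tempfile
import timeit


def make_test_file(num_columns=1000):
    """Write a one-line tab-separated file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w") as f:
        f.write("\t".join(str(i) for i in range(num_columns)) + "\n")
    return path


def by_count(path):
    # Count tabs in the first line; add 1 to turn delimiters into columns.
    with open(path) as f:
        return next(f).count("\t") + 1


def by_split(path):
    # Split the first line on tabs and count the pieces.
    with open(path) as f:
        return len(next(f).split("\t"))


def by_csv(path):
    # Let csv.reader parse the first row (newline="" as the csv docs recommend).
    with open(path, newline="") as f:
        return len(next(csv.reader(f, delimiter="\t")))


def compare(path, number=20):
    """Return {function name: total seconds for `number` calls}."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(path), number=number)
        for fn in (by_count, by_split, by_csv)
    }
```

Calling compare(make_test_file(100000)) returns the total time per approach. Absolute numbers will depend on the machine and Python version, so only the relative ordering is worth comparing.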
Padraic Cunningham, 28.04.2015
StopIteration:
- Padraic Cunningham, 28.04.2015
awk '{print NF;quit}' file
- Mark Setchell, 28.04.2015