Abstract
Three different ways of processing a text file line by line are given, in order of increasing efficiency.
I have to handle a large text file of space-separated data in Python, and the data looks like this:
tag1 tag2 tag3
12 34 12
123 345 12
The first line holds the tags for each column, and the remaining lines hold the data. Since the tags are fixed, I can hard-code them directly into my script, which means the first line should simply be skipped. My first script goes like this:
file = open('foo.txt', 'r')
for line in file.readlines()[1:]:
    #do something
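For concreteness, the loop body might look something like this sketch, assuming every data column holds integers (the parsing and the variable names are my assumption, not part of the original post):

file = open('foo.txt', 'r')
for line in file.readlines()[1:]:
    # split the space-separated fields and convert them to integers
    values = [int(field) for field in line.split()]
    # ... use values here; e.g. values[0] is the tag1 column
file.close()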
This script requires a vast amount of RAM, because readlines() builds a list of all the lines in memory before the slice is taken. It is wiser to iterate over the file object directly:
file = open('foo.txt', 'r')
first = True
for line in file:
    if first:
        first = False
    else:
        #do something
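(As an aside, the same skip can be spelled with enumerate instead of a manual flag; this is my sketch, not part of the original post, and it still performs a test on every line:)

file = open('foo.txt', 'r')
for i, line in enumerate(file):
    if i == 0:
        continue  # the index is still checked on every iteration
    #do something
file.close()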
The second script works much better than the first one, because the lines are read from the file one by one through an iterator. However, the first flag is not a neat way to skip the first line: the flag is tested on every iteration, which is pointless once the first line has passed. This problem is solved in the third script:
file = open('foo.txt', 'r')
file.readline()
for line in file:
    #do something
The file.readline() call simply advances the file position past the first line, so the iteration then starts from the second line :)
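For reference, a more modern spelling of the same idea wraps the file in a with block and uses next() to discard the header; this is only a sketch of the same technique, and the variable names and integer parsing are my own assumptions:

with open('foo.txt', 'r') as f:
    next(f)  # discard the header line (raises StopIteration if the file is empty)
    for line in f:
        values = [int(field) for field in line.split()]
        # ... process values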