Wednesday, May 6, 2009

How to process a large text file efficiently in Python

Abstract

Three different ways of processing a text file line by line are given, in order of increasing efficiency.

I have to handle a large text file of space-separated data in Python, and the data looks like this:

tag1 tag2 tag3
12 34 12
123 345 12

The first line holds the tags for each column, and the remaining lines hold the data. Since the tags are fixed, I can code them directly into my script, which is to say the first line should be skipped. My first script goes like this:

file = open('foo.txt', 'r')
# readlines() loads the whole file into a list of lines;
# the [1:] slice drops the tag line
for line in file.readlines()[1:]:
    # do something
    pass

This script requires a vast amount of RAM, since readlines() has to build a list holding every line of the file! So it is wiser to use the file's iterator:

file = open('foo.txt', 'r')
first = True
for line in file:
    if first:
        # skip the tag line
        first = False
    else:
        # do something
        pass

The second script works much better than the first one, because the lines are read one at a time through the file's iterator. However, the first flag is not a neat way to skip the first line: the flag is tested on every iteration even though it only matters once. The third script solves this problem:

file = open('foo.txt', 'r')
# consume the tag line before iterating
file.readline()
for line in file:
    # do something
    pass

The file.readline() call moves the file position one line forward, so the iterator then starts from the second line. :)
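As a closing note, here is a minimal sketch of the same idea in a slightly more modern form, assuming Python 2.6 or later and the foo.txt layout above. The built-in next() consumes the tag line just as readline() does, and the with block closes the file automatically; the unpacking into v1, v2, v3 is only a placeholder for whatever "do something" stands for:

with open('foo.txt', 'r') as file:
    next(file)  # consume the tag line, like readline() above
    for line in file:
        # each data line holds one value per fixed tag
        v1, v2, v3 = line.split()
        # do something with the values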
