Files in Python

Many well-known applications whose main purpose is extensive file handling have been developed using Python.

I had been assigned to create a Unix shell script that reads input files from different FTP locations, parses their contents into a properly structured format, and passes the result as input to the DM express tool (a business intelligence data integration tool).

With Unix shell scripting, you can read a file using control-flow commands such as “for” and “while”. That kind of I/O is suitable for handling small files, but my requirement was to read large files containing Rx (tablet) and doctor prescription information for everything sold in every medical shop, at each location, across several countries. You can imagine how big the file will be!

So I looked for a scripting language with more capable file handling and found that Perl and Python were already pre-installed on the Unix server machines.

Perl offers powerful file operations and performs rapid file I/O, but my further requirements for parsing the files were not achievable with Perl.

So I chose Python and was impressed by its file I/O operations and content parsing.
Python's file operations work on a single file, multiple files, file streams, and zipped or tar files directly.

The basic file operations in Python are:

  1. Open
  2. Read
  3. Write
  4. Append
  5. Close

Let’s start with a simple file read operation:

>>> fileobj = open("Python_train","r")  # Create the file object.
>>> fileobj.read()  # Read the whole file through the file object.
'This is the first line\nsecond line\nthird line\ni think, this is enough and will write further whenever needed\n'
>>> fileobj.read()  # The position is already at the end of the file, so nothing is left to read.
''

In Python, everything is an object, so anything you want to reference can be handled as an object. In the example above, I first created the file object (fileobj), which refers to the file opened in read mode. On the next line, the file object is used to access the contents of the file.

You will have noticed the additional character “\n”, the newline character, which moves the printing position to the start of the next line. It is very helpful when you parse the file.
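As a rough sketch of the read behaviour described above, here is the same idea written for Python 3 (the transcripts in this article come from a Python 2 session). The temporary file and the name `sample_path` are my own stand-ins for "Python_train", not part of the original example. It also shows `seek(0)`, which moves the position back to the start so the file can be read again.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")
with open(sample_path, "w") as f:
    f.write("This is the first line\nsecond line\nthird line\n")

with open(sample_path, "r") as fileobj:
    first = fileobj.read()    # the whole file as one string
    second = fileobj.read()   # empty string: the position is at end of file
    fileobj.seek(0)           # move the position back to the start
    again = fileobj.read()    # the whole file once more

print(repr(first))
print(repr(second))
print(again == first)
```

The `with` block closes the file automatically when it ends, so no explicit close() call is needed here.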

readline()
>>> fileobj = open("Python_train","r")
>>>
>>> fileobj.readline()   # Read the file one line at a time.
'This is the first line\n'
>>> fileobj.readline()
'second line\n'
>>> fileobj.readline()
'third line\n'
>>> fileobj.readline()
'i think, this is enough and will write further whenever needed\n'
>>> fileobj.readline()
''
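For large files like the ones I described at the start, calling readline() by hand is not practical. A common alternative, sketched here for Python 3, is to loop over the file object itself, which yields one line at a time without loading the whole file into memory. The temporary file and the names `sample_path` and `lines_seen` are illustrative assumptions.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")
with open(sample_path, "w") as f:
    f.write("This is the first line\nsecond line\nthird line\n")

lines_seen = []
with open(sample_path, "r") as fileobj:
    for line in fileobj:                      # one line per iteration
        lines_seen.append(line.rstrip("\n"))  # drop the trailing newline

print(lines_seen)
```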

With the same open command, we can write to the file by opening it in write mode and using fileobj as shown below:

>>> fileobj = open("Python_train","w")
>>> 
>>> fileobj.write('This is first line written with fileobj')
>>> fileobj.write('This is second line written with fileobj')
>>> fileobj.write('This is third line written with fileobj')
>>> fileobj.write('This is final line enough written with filobj')

There is another read method, readlines():

>>> fileobj = open("Python_train","r")
>>> fileobj.readlines()  # The file contains no newlines, so the list holds a single line.
['This is first line written with fileobjThis is second line written with fileobjThis is third line written with fileobjThis is final line enough written with filobj']
>>>

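So what difference do you notice between the read() and readlines() methods of fileobj? read() returns the whole file as a single string, with lines separated by the newline character (“\n”), whereas readlines() returns a list with one string per line. Here the list has only one element because the write() calls above did not add any “\n”, so the whole file is a single line.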
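To make the difference easier to see, here is a small Python 3 sketch in which each write() call ends with an explicit "\n". The file contents and the names `sample_path`, `as_string`, and `as_list` are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")

with open(sample_path, "w") as fileobj:
    fileobj.write("first line\n")    # write() adds no newline on its own
    fileobj.write("second line\n")

with open(sample_path, "r") as fileobj:
    as_string = fileobj.read()       # one string holding the whole file

with open(sample_path, "r") as fileobj:
    as_list = fileobj.readlines()    # a list with one string per line

print(repr(as_string))
print(as_list)
```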

Choose the file object method that fits your requirements when you parse files. Further, if you need to read only a specific number of characters, you can pass that number to read(); each call continues from the current position, as below:

>>> fileobj = open("Python_train","r")
>>> fileobj.read(10)  # Read only the first 10 characters of the file.
'This is fi'
>>> a = fileobj.read(20)  # Read the next 20 characters.
>>> print(a)  # The first 10 characters were already read, so this read starts where the last one stopped.
rst line written wit
>>> len(a)  # Validate the length of the string that was read.
20
>>>
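The same partial reads can be sketched in Python 3 as follows. The temporary file and the variable names are assumptions for illustration; the file content is the first line written earlier in this article.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")
with open(sample_path, "w") as f:
    f.write("This is first line written with fileobj")

with open(sample_path, "r") as fileobj:
    head = fileobj.read(10)        # first 10 characters
    rest = fileobj.read(20)        # next 20 characters, not the first 20
    fileobj.seek(0)                # jump back to the start of the file
    head_again = fileobj.read(10)  # first 10 characters again

print(repr(head))
print(repr(rest))
print(len(rest))
```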

An important thing to remember when using a file object: opening a file in write mode (“w”) always starts a new, empty file, so any existing content is erased. Be careful about when you use it. If you want to keep adding to an existing file, open it in append mode (“a”) as shown below:

>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleaned'
>>> 
>>> 
>>> fileobj = open("Python_train","a")  # Open the file in append mode.
>>> fileobj.write("I want to keep previous contents and wants to add more in the file")
>>> 
>>> fileobj = open("Python_train","r")
>>> 
>>> fileobj.read()  # The appended text is not written on a new line; it continues on the same line.
'Its writing me as new line in the Python_train file and existing data has been cleanedI want to keep previous contents and wants to add more in the file'
>>> fileobj = open("Python_train","a")
>>> fileobj.write("\n Wants to write in second line")  # "\n" starts a new line.
>>> 
>>> fileobj.writelines(["Another way to write in the file"])  # writelines() is useful for writing a list of strings.
>>> 
>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleanedI want to keep previous contents and wants to add more in the file\n Wants to write in second lineAnother way to write in the file'
>>>
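Here is a compact Python 3 sketch that contrasts append mode with write mode. The file contents and the names `sample_path`, `after_append`, and `after_rewrite` are my own illustrative assumptions.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")

with open(sample_path, "w") as fileobj:
    fileobj.write("first line\n")

with open(sample_path, "a") as fileobj:    # "a" keeps the existing content
    fileobj.write("second line\n")
    fileobj.writelines(["third line\n", "fourth line\n"])  # write a list of strings

with open(sample_path, "r") as fileobj:
    after_append = fileobj.read()

with open(sample_path, "w") as fileobj:    # "w" erases the existing content
    fileobj.write("replaced\n")

with open(sample_path, "r") as fileobj:
    after_rewrite = fileobj.read()

print(repr(after_append))
print(repr(after_rewrite))
```

Note that writelines() does not add newlines either, so each string in the list carries its own "\n".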

The file object also has several useful attributes and methods that help when performing file operations:

>>> dir(fileobj)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']

close: Use this method to close the files that your program has opened. If you do not call close(), Python's memory management eventually closes a file object once nothing references it, but calling close() explicitly releases the file right away.

>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleaned\nI want to keep previous contents and wants to add more in the file \nWants to write in second line\nAnother to write in the new line\n'
>>> 
>>> fileobj.close()  # Close the file.
>>> 
>>> fileobj   # Inspect the file object to see its status.
<closed file 'Python_train', mode 'r' at 0x7fa6d995c5d0>

closed: This attribute tells you whether the file is closed, and checking it is good practice in your code. It is a boolean: “True” if the file is closed and “False” if it is still open.

>>> fileobj.closed
True
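One way to avoid forgetting close() is the `with` statement, which closes the file when the block ends, even if an error occurs inside it. The Python 3 sketch below checks the closed attribute inside and outside the block, and also shows the equivalent try/finally form. The temporary file and variable names are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")
with open(sample_path, "w") as f:
    f.write("some content\n")

with open(sample_path, "r") as fileobj:
    inside = fileobj.closed     # False: the file is still open here
outside = fileobj.closed        # True: the with block has closed it

# Equivalent explicit form using try/finally.
fileobj2 = open(sample_path, "r")
try:
    content = fileobj2.read()
finally:
    fileobj2.close()            # runs even if read() raises an error

print(inside, outside, fileobj2.closed)
```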

Other attributes and methods can be used for various checks during file operations, such as:

>>> fileobj = open("Python_train","r")
>>> 
>>> fileobj.mode  # The mode in which the file was opened; 'r' means read mode.
'r'
>>> 
>>> fileobj.name   # The name of the file currently in use.
'Python_train'
>>> fileobj.fileno()  # Returns the file descriptor, the integer the kernel uses to identify the open file.
3
>>> fileobj.isatty()  # Checks whether the file is connected to a terminal (tty) device; a regular file on disk is not.
False
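These checks can be gathered in one place, as in this Python 3 sketch. The temporary file and the names `sample_path` and `info` are illustrative assumptions; the descriptor number you see will depend on your system.

```python
import os
import tempfile

# Hypothetical sample file standing in for "Python_train".
sample_path = os.path.join(tempfile.mkdtemp(), "Python_train")
with open(sample_path, "w") as f:
    f.write("some content\n")

with open(sample_path, "r") as fileobj:
    info = {
        "mode": fileobj.mode,        # mode passed to open()
        "name": fileobj.name,        # path passed to open()
        "fileno": fileobj.fileno(),  # integer file descriptor from the OS
        "isatty": fileobj.isatty(),  # True only for terminal devices
    }

print(info)
```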

In this article, I have covered the basic Python file object attributes and methods as far as possible. Please leave a comment if you have any questions to discuss further.

In upcoming articles, I will cover more Python file operations, such as reading multiple files and reading logs.

 
