Directory operations using Python

As we saw in earlier articles, Python is used extensively to work with files and to parse them using a variety of objects and modules. Python also has modules for performing operations on directories.

Although each operating system has its own directory structure, Python provides the os.path module, which handles the path naming conventions of almost every operating system.

>>> from os import path
>>>
>>> dir(path)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_joinrealpath', '_unicode', '_uvarprog', '_varprog', 'abspath', 'altsep', 'basename', 'commonprefix', 'curdir', 'defpath', 'devnull', 'dirname', 'exists', 'expanduser', 'expandvars', 'extsep', 'genericpath', 'getatime', 'getctime', 'getmtime', 'getsize', 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase', 'normpath', 'os', 'pardir', 'pathsep', 'realpath', 'relpath', 'samefile', 'sameopenfile', 'samestat', 'sep', 'split', 'splitdrive', 'splitext', 'stat', 'supports_unicode_filenames', 'sys', 'walk', 'warnings']
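To illustrate how os.path adapts to each platform, here is a minimal sketch, assuming Python 3. It relies on the fact that os.path is an alias for posixpath on Unix and for ntpath on Windows, so importing those two modules directly lets us compare both conventions on one machine. The paths are simply the ones used as examples in this article.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on Unix

# The same call produces the separator of each platform.
unix_path = posixpath.join('/home/vinoth', 'sample.py')
windows_path = ntpath.join('C:\\Python27amd64', 'README.rst')

print(unix_path)      # /home/vinoth/sample.py
print(windows_path)   # C:\Python27amd64\README.rst

# Splitting works the same way under both conventions.
print(posixpath.basename(unix_path))    # sample.py
print(ntpath.basename(windows_path))    # README.rst
```

In everyday code you would import only os.path and let Python pick the right implementation for the machine the script runs on.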

To perform directory operations, we first import the os module, which provides the functions and submodules for working with directories.

While going through the examples, I will try to point out the differences between the Linux and Windows environments.

The first thing we usually want to know in any operating system is the present working directory.

os.getcwd(): returns the current working directory.

Unix:
>>> import os
>>> os.getcwd()
'/home/vinoth'

Windows:
>>> import os
>>> os.getcwd()
'C:\\Python27amd64'

Next, we would like to know which files and folders are present in the current working directory.

os.listdir(path): lists the files and directories under the given path.

Unix:
>>> os.listdir('/home/vinoth')
['.bash_history', '.bash_logout', '.bashrc', '.cache', '.git', '.gitconfig', '.ipynb_checkpoints', '.ipython', '.jupyter', '.lesshst', '.local', '.pip', '.profile', '.ssh', '.viminfo', '.w3m', 'Envs', 'README.md', 'Untitled.ipynb', 'Untitled1.ipynb', 'sample.py', 'sample.pyc', 'venv']

Windows:
>>> os.listdir('C:\\Python27amd64')
['design', 'design.egg-info', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'PKG-INFO', 'python.exe', 'pythoncom27.dll', 'pythoncomloader27.dll', 'pythonw.exe', 'pywintypes27.dll', 'qt.conf', 'README.rst', 'README.txt', 'Scripts', 'setup.cfg', 'setup.py', 'tcl', 'Tools']
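os.listdir returns bare names and does not say which entries are directories. The sketch below, assuming Python 3, shows one way to separate the two by combining it with os.path.isdir. The helper name split_listing and the demo file names are made up for illustration.

```python
import os
import tempfile


def split_listing(directory):
    """Return (sub-directories, files) found directly under directory."""
    dirs, files = [], []
    for name in sorted(os.listdir(directory)):
        # listdir returns bare names, so join them back to the directory
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, 'Envs'))
    open(os.path.join(demo_dir, 'sample.py'), 'w').close()
    print(split_listing(demo_dir))   # (['Envs'], ['sample.py'])
```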

Next, we check whether we have access to a given file or folder inside the directory.

os.access(path, mode): checks whether the given path can be accessed with the given mode, which tells us what operations we can perform on it.

Unix:
>>> os.access('sample.py',os.R_OK)
True

Windows:
>>> os.access('README.rst',os.R_OK)
True

R_OK is a mode constant defined in the os module. Below is the list of modes that can be passed to os.access.

os.F_OK
Value to pass as the mode parameter of access() to test the existence of a path.
os.R_OK
Value to include in the mode parameter of access() to test the readability of path.
os.W_OK
Value to include in the mode parameter of access() to test the writability of path.
os.X_OK
Value to include in the mode parameter of access() to determine if a path can be executed.
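The four modes can be tried together in a small helper. This is only a sketch, assuming Python 3; describe_access is a hypothetical name and '/no/such/path' is a made-up path that is expected not to exist. Note that the results depend on the user running the script (root, for example, usually passes the read and write checks on everything).

```python
import os
import tempfile


def describe_access(path):
    """Report which os.access checks pass for path."""
    return {
        'exists': os.access(path, os.F_OK),
        'readable': os.access(path, os.R_OK),
        'writable': os.access(path, os.W_OK),
        'executable': os.access(path, os.X_OK),
        # Modes can be OR-ed together to test several permissions at once.
        'read_and_write': os.access(path, os.R_OK | os.W_OK),
    }


# A temporary file is readable and writable by the user who created it.
with tempfile.NamedTemporaryFile() as handle:
    print(describe_access(handle.name))

# Every check fails for a path that does not exist.
print(describe_access('/no/such/path'))
```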

Now I want to move into one of the directories inside my current working directory. os.chdir(path) changes the current working directory.

Unix:
>>> os.chdir('Envs')
>>> os.getcwd()
'/home/vinoth/Envs'

Windows:
>>> os.chdir('Doc')
>>> os.getcwd()
'C:\\Python27amd64\\Doc'
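Because os.chdir affects the whole process, scripts often want to return to the original directory afterwards. One possible pattern, sketched here for Python 3, wraps the change in a context manager. The name working_directory is an assumption for this example, not something provided by the os module.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always go back, even if the body raised an exception.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as demo_dir:
    with working_directory(demo_dir):
        print(os.getcwd())   # now inside the temporary directory
print(os.getcwd() == start)  # True: back where we started
```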

We can also perform the operations below on a directory:

  • os.chroot(path) – Change the root directory of the current process to path.
  • os.chmod(path, mode) – Change the mode (permission bits) of the path from your Python application.
  • os.chown(path, uid, gid) – Change the owner and group of the path.
  • os.link(source, link_name) – Create a hard link named link_name pointing to source.
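Two of these calls can be tried safely on a temporary file. The following is a sketch for Python 3 on a Unix-like system; os.chroot and os.chown are left out because they normally require root privileges. The file names are made up for the demo.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    source = os.path.join(demo_dir, 'sample.py')
    open(source, 'w').close()

    # Owner may read and write; nobody else gets any permission (0600).
    os.chmod(source, stat.S_IRUSR | stat.S_IWUSR)
    mode = stat.S_IMODE(os.stat(source).st_mode)
    print(oct(mode))

    # A hard link is a second name for the same file on disk.
    link_name = os.path.join(demo_dir, 'sample_link.py')
    os.link(source, link_name)
    same = os.path.samefile(source, link_name)
    link_count = os.stat(source).st_nlink
    print(same, link_count)
```

On Windows, os.chmod can only toggle the read-only flag, so the permission bits read back there will differ.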

os.walk(directory, followlinks=False/True):

Directory operations in Python are not limited to the above. With os.walk we can traverse any directory tree in our program, starting from the root or from any level below it.

>>> import os
>>> os.listdir('.')
['.bash_history', '.bash_logout', '.bashrc', '.cache', '.git', '.gitconfig', '.ipynb_checkpoints', '.ipython', '.jupyter', '.lesshst', '.local', '.pip', '.profile', '.ssh', '.viminfo', '.w3m', 'Envs', 'README.md', 'Untitled.ipynb', 'Untitled1.ipynb', 'sample.py', 'sample.pyc', 'venv']
>>>
>>> os.walk('.')  # os.walk returns a generator object, which is iterable
<generator object walk at 0x7f43309e1a50>
>>> next(os.walk('.'))  # each item is one 3-tuple: (directory, sub-directories, files)
('.', ['.cache', '.git', '.ipynb_checkpoints', '.ipython', '.jupyter', '.local', '.pip', '.ssh', '.w3m', 'Envs', 'venv'], ['.bash_history', '.bash_logout', '.bashrc', '.gitconfig', '.lesshst', '.profile', '.viminfo', 'README.md', 'Untitled.ipynb', 'Untitled1.ipynb', 'sample.py', 'sample.pyc'])

A more effective way to use os.walk in our code is to loop over it and work with the directories and files.

>>> for dirpath, dirnames, filenames in os.walk('.'):
...     print dirpath, dirnames, filenames
...
. ['.cache', '.git', '.ipynb_checkpoints', '.ipython', '.jupyter', '.local', '.pip', '.ssh', '.w3m', 'Envs', 'venv'] ['.bash_history', '.bash_logout', '.bashrc', '.gitconfig', '.lesshst', '.profile', '.python_history', '.viminfo', 'README.md', 'Untitled.ipynb', 'Untitled1.ipynb', 'sample.py', 'sample.pyc']
./.cache ['pip'] []
./.cache/pip ['http', 'wheels'] []
./.cache/pip/http ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'] []

As the example above shows, os.walk travels recursively through the directory and its subdirectories. For each directory it yields one 3-tuple: the directory path, the list of its sub-directories, and the list of its file names. By looping over the os.walk generator with for, we can visit every folder and file and do any kind of directory or file processing.

Note: by default os.walk does not descend into directories that are symbolic links. With followlinks=True, it traverses them as well.
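As a sketch of what such processing might look like in Python 3, the helper below collects every file with a given suffix while skipping some directories entirely. The name find_files and the default skip list are assumptions chosen for this example.

```python
import os
import tempfile


def find_files(top, suffix, skip_dirs=('.git', '.cache')):
    """Yield paths under top whose file name ends with suffix."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


# Build a small throwaway tree and search it.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'Envs', 'project'))
    os.makedirs(os.path.join(demo_dir, '.git'))
    for relative in ('sample.py', 'Envs/project/app.py', '.git/hook.py'):
        open(os.path.join(demo_dir, relative), 'w').close()
    for found in sorted(find_files(demo_dir, '.py')):
        print(os.path.relpath(found, demo_dir))
```

Pruning dirnames in place only works with the default top-down traversal, which is what os.walk uses unless topdown=False is passed.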

There is also os.scandir('.'), which lists a directory with better performance and yields directory entry objects for further action. This function is available starting with Python 3.5.
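A minimal sketch of os.scandir, assuming Python 3.6 or later (the with-statement form was added in 3.6). The helper name file_sizes and the demo names are made up for illustration.

```python
import os
import tempfile


def file_sizes(directory):
    """Return {file name: size in bytes} for regular files in directory."""
    sizes = {}
    # The with-statement closes the scandir iterator when we are done.
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() and stat() reuse information cached by scandir where
            # the platform provides it, avoiding extra system calls.
            if entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes


with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, 'README.md'), 'w') as handle:
        handle.write('hello')
    os.mkdir(os.path.join(demo_dir, 'venv'))
    print(file_sizes(demo_dir))   # {'README.md': 5}
```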

os.path:

Another extensive module, which helps us work with path names.

vinoth@LAPTOP-U4G2071G:~$ pwd
/home/vinoth
vinoth@LAPTOP-U4G2071G:~$
vinoth@LAPTOP-U4G2071G:~$
vinoth@LAPTOP-U4G2071G:~$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from os import path
>>> path.abspath('/home/vinoth')
'/home/vinoth'
>>> path.abspath('/home/vinoth/sample.py')  # returns the absolute path of the path given as parameter
'/home/vinoth/sample.py'
>>> path.basename('/home/vinoth/sample.py')  # returns the last component of the path
'sample.py'
>>> path.commonprefix('/home/vinoth/sample.py')  # expects a list of paths; a single string is compared character by character, so the result is empty
''
>>> path.commonprefix(['/home/vinoth/sample.py','/home/vinoth/sample.pyc','/home/vinoth/Untitled.ipynb'])
'/home/vinoth/'
>>> path.dirname('/home/vinoth/sample.py')  # returns only the directory part
'/home/vinoth'

In the example above, path.abspath may look redundant, because the paths passed to it are already absolute. Its purpose becomes clear when you give it a relative path: abspath() joins the path to the current working directory and normalizes the result.
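The effect of abspath() on a relative path can be seen in a short sketch, assuming Python 3. The path 'Envs/../sample.py' is only an illustration and does not have to exist.

```python
import os

relative = os.path.join('Envs', '..', 'sample.py')

# abspath joins the path to the current working directory and normalizes it,
# so the 'Envs/..' detour disappears.
absolute = os.path.abspath(relative)
print(absolute)

# It is equivalent to this explicit combination:
expected = os.path.normpath(os.path.join(os.getcwd(), relative))
print(absolute == expected)

# abspath does not check that the path exists; it only builds the name.
print(os.path.basename(absolute))
```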

With the os.path module we can perform more path-related checks, as many of the functions under os.path return a boolean:

>>> os.path.exists('/home/vinoth')  # checks whether the given path exists
True
>>>
>>> os.path.exists('/home/vinothd')
False

On Unix, we use the find command to look up files by their change, modification, or access time. Python can retrieve the same times, which is very useful when scripting the automation of a repetitive task.

>>> os.path.getatime('/home/vinoth')
1506192316.3023088
>>> os.path.getmtime('/home/vinoth')
1507611152.3911762
>>> os.path.getctime('/home/vinoth')
1507611152.3911762

The functions above return the times in seconds since the epoch. That is not a problem, since the time module can convert them to a human-readable form, as below.

>>> import time  # import the time module to use its functions
>>> time.ctime(os.path.getctime('/home/vinoth'))
'Tue Oct 10 08:52:32 2017'
>>> time.ctime(os.path.getmtime('/home/vinoth'))
'Tue Oct 10 08:52:32 2017'
>>> time.ctime(os.path.getatime('/home/vinoth'))
'Sat Sep 23 22:45:16 2017'
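These timestamps make it possible to imitate a simple find by modification time. The following is a rough sketch for Python 3; modified_within is a hypothetical helper, and the log file names exist only inside the temporary demo directory.

```python
import os
import tempfile
import time
from datetime import datetime


def modified_within(directory, days):
    """Return names in directory modified during the last `days` days."""
    cutoff = time.time() - days * 24 * 60 * 60
    recent = []
    for name in sorted(os.listdir(directory)):
        if os.path.getmtime(os.path.join(directory, name)) >= cutoff:
            recent.append(name)
    return recent


with tempfile.TemporaryDirectory() as demo_dir:
    fresh = os.path.join(demo_dir, 'fresh.log')
    stale = os.path.join(demo_dir, 'stale.log')
    for log_path in (fresh, stale):
        open(log_path, 'w').close()
    # Pretend stale.log was last accessed and modified 30 days ago.
    month_ago = time.time() - 30 * 24 * 60 * 60
    os.utime(stale, (month_ago, month_ago))

    print(modified_within(demo_dir, days=7))   # ['fresh.log']
    # datetime gives finer control over the format than time.ctime.
    stamp = datetime.fromtimestamp(os.path.getmtime(stale))
    print(stamp.strftime('%Y-%m-%d %H:%M:%S'))
```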

The time module is not limited to ctime (convert time). There are more options to explore, which we can discuss in upcoming articles as they are out of scope for this one.

There are more functions that are useful in if conditions. For example, suppose my program should accept only absolute paths, which start with '/' on Unix and with a drive letter such as 'C:\' on Windows.

>>> os.path.isabs('/home/vinoth')  # returns True since the path starts with "/"
True
>>> os.path.isabs('home/vinoth')  # returns False since the path does not start with "/"
False
>>> if os.path.isabs('/home/vinoth'):
...     print 'I have full path to do list directory'
...     listdir('home/vinoth')  # fails: only os was imported, so it must be called as os.listdir
... else:
...     print 'Its not absolute path, so do join path'
...     os.path.join('.','home/vinoth')
...
I have full path to do list directory
Traceback (most recent call last):
 File "<stdin>", line 3, in <module>
NameError: name 'listdir' is not defined
>>> if os.path.isabs('/home/vinoth'):
...     print 'I have full path to do list directory'
...     os.listdir('/home/vinoth')
... else:
...     print 'Its not absolute path, so do join path'
...     os.path.join('.','home/vinoth')
...
I have full path to do list directory
['.bash_history', '.bash_logout', '.bashrc', '.cache', '.git', '.gitconfig', '.ipynb_checkpoints', '.ipython', '.jupyter', '.lesshst', '.local', '.pip', '.profile', '.python_history', '.ssh', '.viminfo', '.w3m', 'Envs', 'README.md', 'Untitled.ipynb', 'Untitled1.ipynb', 'sample.py', 'sample.pyc', 'venv']
>>> if os.path.isabs('home/vinoth'):
...     print 'I have full path to do list directory'
...     os.listdir('/home/vinoth')
... else:
...     print 'Its not absolute path, so do join path'
...     os.path.join('.','home/vinoth')  # os.path.join connects path components with the right separator
...
Its not absolute path, so do join path
'./home/vinoth'
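The same decision can be packaged as a reusable function. This is a sketch for Python 3; to_absolute and its base parameter are names invented for this example.

```python
import os


def to_absolute(path, base=None):
    """Return a normalized absolute path, joining relative paths to base."""
    if os.path.isabs(path):
        return os.path.normpath(path)
    if base is None:
        base = os.getcwd()
    # join inserts the right separator for the current platform
    return os.path.normpath(os.path.join(base, path))


base_dir = os.path.abspath(os.sep)   # the root directory ('/' on Unix)
print(to_absolute('home/vinoth', base=base_dir))
print(to_absolute(os.path.join(base_dir, 'home', 'vinoth')))
```

Both calls print the same path: the first builds it from a relative path and a base, the second receives it already absolute and only normalizes it.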

We also have some more functions that return booleans, such as:

>>> os.path.isfile('README.md')  # returns True if the given path is a regular file
True
>>> os.path.isfile('.ipython')
False
>>> os.path.isdir('.ipython')  # returns True if the given path is a directory
True
>>> os.path.isdir('README.md')
False
>>> os.path.ismount('.')  # returns True if the given path is a mount point
False
>>> os.path.ismount('/home')
True
>>> os.path.islink('/home/vinoth/Envs/')  # returns True if the given path is a symbolic link
False
>>> os.path.samefile('/home/vinoth/sample.py','/home/vinoth/Test/sample.py')  # a different file that happens to have the same name, so it returns False
False
>>> os.path.samefile('/home/vinoth/sample.py','/home/vinoth/sample.py')  # returns True when both parameters refer to the same file
True
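These checks combine naturally into a single classifier. The sketch below assumes Python 3 on a system where symbolic links can be created; classify is a hypothetical helper and the demo names are made up.

```python
import os
import tempfile


def classify(path):
    """Return 'link', 'directory', 'file', 'other' or 'missing' for path."""
    # islink must come first: isdir and isfile follow symbolic links.
    if os.path.islink(path):
        return 'link'
    if os.path.isdir(path):
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    if os.path.exists(path):
        return 'other'   # e.g. a socket or device node
    return 'missing'


with tempfile.TemporaryDirectory() as demo_dir:
    file_path = os.path.join(demo_dir, 'README.md')
    open(file_path, 'w').close()
    link_path = os.path.join(demo_dir, 'readme_link')
    os.symlink(file_path, link_path)   # may need extra privileges on Windows
    for candidate in (demo_dir, file_path, link_path,
                      os.path.join(demo_dir, 'absent')):
        print(os.path.basename(candidate), classify(candidate))
```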

There is still more to say about Python operations on files and directories. These modules are really useful for administrators who automate their day-to-day work, such as automated deployments, OS patch upgrades, log backups, and provisioning folder structures.
