Script to extract fields from CSV

Introduction

  • I was looking for more projects to do for python scripting and that's when i saw video by 'Internet made Coder' on YouTube that explained 5 good scripting projects with explanations. So, today I'm making one out of 5 from there.

Libraries

  • For this we will need to import a few libraries

    • csv

    • sys

    • argv from sys

import csv
import sys
from sys import argv

Code

  • This is basically this program will extract the data in particular fields from a csv.

  • I created a function to do this, it takes the file name/path and the fields to be extracted as the parameters.

if __name__=="__main__":
    FILE_NAME=argv[1]
    arg_len=len(sys.argv)
    FIELDS=[]
    for i in range (2,arg_len):
        FIELDS.append(argv[i])
    lis=exctract_from_csv(FILE_NAME,FIELDS)
  • The CSV file and the fields required are accepted as command line inputs, as shown above using the argv function.

  • I used a random csv i got from kaggle as an example here called onlinefoods.csv.

  • So first step is to create a space for me to put all the fields i want.

  • I do this by calculating the number of fields and creating a nested list(list of lists) with equal number of lists as the number of fields to be extracted.

  • i.e. I made a list for each column and put all those lists into another list.

def exctract_from_csv(file,fields):#take fiL and list of columns required
    list_len=len(fields)#find number of columns
    lis=[]
    for i in range(list_len):
        lis.append([])#create empty lists in list to append value of each coloums one list for one column
  • Now we open the csv file in read mode and create a reader object.

  • We later read the first row into a variable called header. This row contains the name of the columns.

with open (file,mode='r') as csv_file:
        csv_reader=csv.reader(csv_file)
        header=next(csv_reader)#take first row (header)
  • We now create an empty list called array to hold the index of the required fields, create a list out of the csv_reader object and find its length.
field=[]
csv_list=list(csv_reader)
length=len(csv_list)
for i in fields: 
    field.append(header.index(i))#tkae index of required fields from header
group=list(zip(field, range(list_len)))#group together the field and the index of list its values will be entered into
  • We then append the index of the required fields from the header row we extracted earlier.

  • I then use the zip() function to get the tuples that have the field index and the index into which data in the column will be entered.

  • we then loop through each tuple in the group variable.

    • Here, we read the value of the field index in the header into x and the index it is to be inserted into as y.

    • we then iterate through the length of the list created from the csv_reader in a nested loop.

    • Inside this loop I take each value in the xth index into the list at the yth index of lis.

for i in fields: 
    field.append(header.index(i))#tkae index of required fields from header
group=list(zip(field, range(list_len)))#group together the field nad the index of list its values will be enteed into
for k in group:   
    x=k[0]
    y=k[1]
return lis
  • Now we return the list and just print it in the main function.
if __name__=="__main__":
    FILE_NAME=argv[1]
    arg_len=len(sys.argv)
    FIELDS=[]
    for i in range (2,arg_len):
        FIELDS.append(argv[i])
    lis=exctract_from_csv(FILE_NAME,FIELDS)
    for i in range(len(lis[0])):
        for j in range(len(FIELDS)): 
            print(lis[j][i],end="      ")
        print()

Conclusion

  • I do feel like i can add a few more functionality into this code(Like creating a separate csv out of the output data).

  • There are definitely easier ways to code this especially if you use numpy i believe but it you just want to extract from columns from a csv as part of a different project we can import this and use it.

  • Any and all criticism and advice is accepted.

  • github -> https://github.com/P-M-Manmohan/Python

  • original yoututbe video -> https://www.youtube.com/watch?v=vEQ8CXFWLZU&t=406s