Script to extract fields from CSV
Introduction
- I was looking for more projects to do for python scripting and that's when i saw video by 'Internet made Coder' on YouTube that explained 5 good scripting projects with explanations. So, today I'm making one out of 5 from there.
Libraries
For this we will need to import a few libraries
csv
sys
argv from sys
import csv
import sys
from sys import argv
Code
This is basically this program will extract the data in particular fields from a csv.
I created a function to do this, it takes the file name/path and the fields to be extracted as the parameters.
if __name__=="__main__":
FILE_NAME=argv[1]
arg_len=len(sys.argv)
FIELDS=[]
for i in range (2,arg_len):
FIELDS.append(argv[i])
lis=exctract_from_csv(FILE_NAME,FIELDS)
The CSV file and the fields required are accepted as command line inputs, as shown above using the argv function.
I used a random csv i got from kaggle as an example here called onlinefoods.csv.
So first step is to create a space for me to put all the fields i want.
I do this by calculating the number of fields and creating a nested list(list of lists) with equal number of lists as the number of fields to be extracted.
i.e. I made a list for each column and put all those lists into another list.
def exctract_from_csv(file,fields):#take fiL and list of columns required
list_len=len(fields)#find number of columns
lis=[]
for i in range(list_len):
lis.append([])#create empty lists in list to append value of each coloums one list for one column
Now we open the csv file in read mode and create a reader object.
We later read the first row into a variable called header. This row contains the name of the columns.
with open (file,mode='r') as csv_file:
csv_reader=csv.reader(csv_file)
header=next(csv_reader)#take first row (header)
- We now create an empty list called array to hold the index of the required fields, create a list out of the csv_reader object and find its length.
field=[]
csv_list=list(csv_reader)
length=len(csv_list)
for i in fields:
field.append(header.index(i))#tkae index of required fields from header
group=list(zip(field, range(list_len)))#group together the field and the index of list its values will be entered into
We then append the index of the required fields from the header row we extracted earlier.
I then use the
zip()
function to get the tuples that have the field index and the index into which data in the column will be entered.we then loop through each tuple in the group variable.
Here, we read the value of the field index in the header into x and the index it is to be inserted into as y.
we then iterate through the length of the list created from the csv_reader in a nested loop.
Inside this loop I take each value in the xth index into the list at the yth index of lis.
for i in fields:
field.append(header.index(i))#tkae index of required fields from header
group=list(zip(field, range(list_len)))#group together the field nad the index of list its values will be enteed into
for k in group:
x=k[0]
y=k[1]
return lis
- Now we return the list and just print it in the main function.
if __name__=="__main__":
FILE_NAME=argv[1]
arg_len=len(sys.argv)
FIELDS=[]
for i in range (2,arg_len):
FIELDS.append(argv[i])
lis=exctract_from_csv(FILE_NAME,FIELDS)
for i in range(len(lis[0])):
for j in range(len(FIELDS)):
print(lis[j][i],end=" ")
print()
Conclusion
I do feel like i can add a few more functionality into this code(Like creating a separate csv out of the output data).
There are definitely easier ways to code this especially if you use numpy i believe but it you just want to extract from columns from a csv as part of a different project we can import this and use it.
Any and all criticism and advice is accepted.
github -> https://github.com/P-M-Manmohan/Python
original yoututbe video -> https://www.youtube.com/watch?v=vEQ8CXFWLZU&t=406s