Saturday, 27 April 2019

Comparing Two Columns of Excel and write the result to new excel

It is very common in IT world where we have to compare to compare the two columns of excel and then present a report based on the outcome. In order to perform this activity, we rely on the functionality of Excel to perform this function. However, it is very difficult to use excel when the data is huge or if we have data in any other format(.csv).  In this post, I will be using Python's pandas library to compare the data of two different columns of the same excel. Additionally, we can use this to compare the data of two columns in two different excels as well.

At first, I will be having below data in Book1.xlsx

Here in Book1.xlsx, I have two columns with Note1 and Note2. Both the columns have data in character format. Now, I will be comparing the data in Note1 with Note2, if the data in a single row matches with each other then I will be exporting the whole row into a new excel.

Next, I will be importing the pandas library, reading the data from excel and printing the columns in the excel:-


Now, compare the two columns of the excel and appending the result to a list and then into a dictionary.



After comparing the two columns and taking the matched records into one dictionary, now create a dataframe through this dictionary and then export this dataframe to an excel.



Finally, an excel with the name output.xlsx will be created at the location specified and below is the result in the output.xlsx.




Please do share your thoughts about this blog.

Thank you. Happy learning.

Sunday, 21 April 2019

Web Scraping In Python


Web scraping means you can fetch URLs, email addresses, phone numbers, names and other text-like data from a webpage.

Python provides helpful libraries to read and extract the data from webpages. Let us delve deeper into the concept of web scraping using Python.

Libraries used:-


  • urllib --> to call the particular url and extract the data.
  • re(Regular Expression) --> to clean the data
  • pandas --> convert the extracted data into the dataframe
import urllib.request
import re
import pandas as pd

url = "<URL>"

response = urllib.request.urlopen(url)
html = response.read()
htmlStr = html.decode()

#extract all the phone numbers from the webpage
# we are using re.findall function to extract the data. The O/P will be a list.
phdata = re.findall("\(\d{3}\) \d{3}-\d{4}", htmlStr)
print(phdata)



regex = re.compile("\n")
htmlStr1 = regex.sub("",htmlStr)

for name in re.findall("<li>\w{2,20} \w{2,20}<br/>",htmlStr1):
    print(name)


#extractall the names from the webpage. The o/p will be a list
name1 = re.findall("<li>\w{2,20} \w{2,20}<br/>",htmlStr1)



#Cleaning the data
for i in range(len(name1)):
    #print(name1[i])
    name1[i] = name1[i].replace("<li>","")
    name1[i] = name1[i].replace("<br/>","")

name1
 

# adding the extracted data to a dictionary

phDict = {}
x = 0
for name in name1:
    phDict[name] = phdata[x]
    x+=1
print(phDict)


#Creating a dataframe through a dictionary

phone_df = pd.DataFrame(list(phDict.items()), index=range(len(phDict)))

phone_df.info()

phone_df



# Renaming the columns of dataframe
phone_df=phone_df.rename(index=str,columns={0:"Name",1:"Phone"})

phone_df



# extracting a perticular person's phone number
phone_df[phone_df['Name']=='Tamara Howe']



Sending email using Python

Python has provided smtplib library in order to send email. import smtplib #domain name for the smtp server and port number conn = smtplib.S...