How to use pandas read_csv

If you're looking to perform analysis on .csv data with pandas, you will first have to get the information into pandas. The most common way of getting .csv data into a pandas DataFrame is by using the pandas read_csv() function.

Let's look at some data that we'd like to pass into pandas. We have the following .csv file, homes_sorted.csv, and we'd like to do some analysis on it.

homes_sorted.csv

Here is what the original .csv file looks like:

| Address | Price | Bedrooms |
|---|---|---|
| 992 Settled St | 823,049 | 4 |
| 1506 Guido St | 784,049 | 3 |
| 247 Fort St | 299,238 | 3 |
| 132 Walrus Ave | 299,001 | 2 |
| 491 Python St | 293,923 | 4 |
| 4981 Anytown Rd | 199,000 | 4 |
| 938 Zeal Rd | 148,398 | 2 |
| 123 Main St | 99,000 | 1 |

So how exactly can we read in this .csv?

Read .csv using pandas read_csv

# By convention, most people import pandas as pd.
import pandas as pd
# You can pass a filepath or a file-like object into the read_csv function.
# Here, we are passing in a filepath. The read_csv function returns a DataFrame.
df = pd.read_csv('/Users/kennethcassel/homes_sorted.csv')
# Our csv has been loaded into a DataFrame called df. We can perform all types of
# useful operations on it. For now, we will just display the entire DataFrame.
df

Wait a minute. Our data has been changed. Scroll back up to the .csv we passed in and see if you notice the difference.

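| | Address | Price | Bedrooms |
|---|---|---|---|
| 0 | 992 Settled St | 823,049 | 4 |
| 1 | 1506 Guido St | 784,049 | 3 |
| 2 | 247 Fort St | 299,238 | 3 |
| 3 | 132 Walrus Ave | 299,001 | 2 |
| 4 | 491 Python St | 293,923 | 4 |
| 5 | 4981 Anytown Rd | 199,000 | 4 |
| 6 | 938 Zeal Rd | 148,398 | 2 |
| 7 | 123 Main St | 99,000 | 1 |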

Why did pandas add a new number column to our data?

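Pandas DataFrames and Series always have an index. By default, read_csv() labels the rows with integers starting at 0, and pandas displays these labels to the left of the columns. The index is not a column of your data.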
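To see that the index sits apart from the columns, here is a minimal sketch. It does not read the real homes_sorted.csv. Instead it assumes a small inline copy of the first three rows (the name csv_text is made up for illustration, and the prices are written without thousands separators to keep parsing simple) and passes it to read_csv as a file-like object.

```python
import io

import pandas as pd

# Hypothetical stand-in for homes_sorted.csv: first three rows only,
# with prices written without thousands separators (an assumption).
csv_text = """Address,Price,Bedrooms
992 Settled St,823049,4
1506 Guido St,784049,3
247 Fort St,299238,3
"""

# read_csv accepts a file-like object as well as a filepath.
df = pd.read_csv(io.StringIO(csv_text))

# The index labels the rows; it is stored separately from the columns.
index_labels = list(df.index)
column_names = list(df.columns)

print(type(df.index).__name__)
print(index_labels)
print(column_names)
```

The column names contain only Address, Price, and Bedrooms, which suggests the numbers shown on the left are row labels rather than data read from the file.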

When you export a DataFrame back to .csv with to_csv(), you can pass index=False to keep this index out of the output file. Note that the parameter name is lowercase.
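As a rough illustration of the difference, the sketch below builds a small two-row DataFrame by hand (the data is a simplified assumption based on the table above, not the real file) and calls to_csv() without a path so that it returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical two-row DataFrame standing in for the homes data.
df = pd.DataFrame(
    {
        "Address": ["992 Settled St", "123 Main St"],
        "Price": [823049, 99000],
        "Bedrooms": [4, 1],
    }
)

# With no path given, to_csv returns the CSV content as a string.
with_index = df.to_csv()                 # index written as an unnamed first column
without_index = df.to_csv(index=False)   # index left out

# Compare the header rows of the two outputs.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

In the first output the header starts with an empty field for the index, while the second matches the original column layout. To write to disk instead, pass a filepath as the first argument to to_csv().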

Take a look at some of our other .csv file pandas recipes to learn more!
