Pandas is an open-source library for data manipulation, created by Wes McKinney and built on NumPy. The DataFrame is the key data structure in Pandas: it lets you store and manipulate tabular data in rows of observations and columns of variables.

Why should you use Pandas?

Pandas is a powerful library with many features that help data scientists manipulate and analyze data. Some of the key features are listed below:

  • Handles missing data and data slicing efficiently
  • Provides Series for one-dimensional data and DataFrame for two-dimensional (tabular) data
  • Offers flexibility to merge, concatenate, or otherwise manipulate data
  • Works well with time series data
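
As a rough illustration of the features listed above, the sketch below builds a Series with a missing value, fills the gap, and then merges and concatenates two small DataFrames. All names and values here (`temps`, `left`, `right`, the sample readings) are made up for the example.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; NaN marks a missing value.
temps = pd.Series([21.5, np.nan, 19.0], index=["mon", "tue", "wed"])
filled = temps.fillna(temps.mean())  # fill the gap with the mean of the other values

# Two small DataFrames that share a "country" column.
left = pd.DataFrame({"country": ["USA", "India"],
                     "language": ["English", "Hindi"]})
right = pd.DataFrame({"country": ["USA", "India"],
                      "capital": ["Washington, D.C.", "New Delhi"]})

merged = left.merge(right, on="country")              # join on the shared key
stacked = pd.concat([left, left], ignore_index=True)  # append rows, renumber the index

print(filled)
print(merged)
print(stacked)
```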

What is a DataFrame?

A DataFrame is a two-dimensional data structure, meaning the data is arranged in rows and columns. It is the standard way to store tabular data in Pandas. A DataFrame is size-mutable (rows and columns can be added or removed) and potentially heterogeneous (each column can hold a different data type).
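
A minimal sketch of what "heterogeneous" and "size-mutable" mean in practice. The column names and values (`name`, `qty`, `ratio`, `flag`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],     # text column
    "qty": [1, 2, 3],            # integer column
    "ratio": [0.5, 1.5, 2.5],    # float column
})

print(df.dtypes)  # each column carries its own dtype
print(df.shape)   # (number of rows, number of columns)

# Size-mutable: assigning to a new label adds a column to the existing DataFrame.
df["flag"] = [True, False, True]
print(df.shape)
```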

How to create a DataFrame

There are multiple ways to create DataFrames.

Using Dictionaries

import pandas as pd

# Each key becomes a column name; each list holds that column's values.
dict1 = {"country": ["USA", "Mexico", "India", "Australia", "China", "Indonesia"],
         "language": ["English", "Spanish", "Hindi", "English", "Chinese", "Indonesian"]}

df = pd.DataFrame(dict1)
print(df)

This gives the following result:

     country    language
0        USA     English
1     Mexico     Spanish
2      India       Hindi
3  Australia     English
4      China     Chinese
5  Indonesia  Indonesian

Using Lists

import pandas as pd

list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame(list1)  # one column, one row per list element
print(df)
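
When a DataFrame is built from a flat list, its single column is labeled 0 by default. The sketch below shows one way to name that column with the `columns` argument, and how a list of rows becomes a multi-column DataFrame. The column names are assumptions made for this example.

```python
import pandas as pd

# Name the single column instead of accepting the default label 0.
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame(list1, columns=["value"])

# A list of lists is read as a list of rows.
rows = [["USA", "English"], ["Mexico", "Spanish"]]
df_rows = pd.DataFrame(rows, columns=["country", "language"])

print(df.head())
print(df_rows)
```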

Importing from CSV files

You can also create a DataFrame from a CSV file. Suppose you have a file named example.csv; you can load it with pd.read_csv().

import pandas as pd

data = pd.read_csv('example.csv')  # read example.csv into a DataFrame

print(data)
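
To try pd.read_csv() without a file on disk, one option is to feed it an in-memory text buffer. This is only a sketch: the CSV text below is invented sample data, not the contents of example.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "country,language\nUSA,English\nMexico,Spanish\n"

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))
print(data)

# usecols restricts the import to the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["country"])
print(subset)
```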

Importing from Excel files

You can also create a DataFrame from an Excel file. Suppose you have a file named example.xlsx; you can load it with pd.read_excel(). Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

import pandas as pd

data = pd.read_excel('example.xlsx')  # read the first sheet of example.xlsx into a DataFrame

print(data)

Dropping a column

You can drop a column using the drop() method.

import pandas as pd

dict1 = {"country": ["USA", "Mexico", "India", "Australia", "China", "Indonesia"],
         "language": ["English", "Spanish", "Hindi", "English", "Chinese", "Indonesian"]}

df = pd.DataFrame(dict1)

# drop() returns a new DataFrame; df itself is left unchanged.
df_dropped = df.drop("country", axis=1)
print(df_dropped)
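
The drop() method also accepts `columns=` and `index=` keywords, which some readers find clearer than `axis`. A short sketch follows, using a made-up three-column DataFrame (the `code` column is an assumption added for the example).

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["USA", "Mexico"],
    "language": ["English", "Spanish"],
    "code": ["US", "MX"],  # hypothetical extra column
})

without_country = df.drop(columns="country")        # same as df.drop("country", axis=1)
without_two = df.drop(columns=["country", "code"])  # drop several columns at once
without_first_row = df.drop(index=0)                # drop a row by its index label

print(without_country)
print(without_two)
print(without_first_row)
```

Each call returns a new DataFrame, so `df` still has all three columns afterwards.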

Exporting a DataFrame to CSV

You can export a DataFrame to a CSV file using the to_csv() method.

df.to_csv("output.csv")
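
By default, to_csv() also writes the row index as an unnamed first column. The sketch below compares the default with `index=False` and reads the result back. It calls to_csv() without a path, which returns the CSV text as a string instead of writing a file. The sample data is made up.

```python
import io

import pandas as pd

df = pd.DataFrame({"country": ["USA", "Mexico"],
                   "language": ["English", "Spanish"]})

with_index = df.to_csv()                # header row starts with an empty index field
without_index = df.to_csv(index=False)  # only the data columns are written

print(with_index)
print(without_index)

# Reading the index-free text back gives the original rows and columns.
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```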