Pandas is a Python library for reading and writing large tabular datasets. It lets you perform arithmetic operations on numeric data and manipulate text data. Pandas DataFrames are also commonly used for data preparation alongside PyTorch.
Pandas can be installed with Anaconda or inside a Python virtual environment. Use the command that matches your environment:
- For Anaconda
- conda install pandas
- For a Python virtual environment
- pip install pandas
To import pandas in a Python program, use:
import pandas as pd
Note: I assume pandas has been imported as pd in all my examples below.
A pandas Series is one of the most commonly used data types. It is similar to a one-dimensional NumPy array, with one difference: a Series has axis labels, which act as its index. The labels can be numbers, strings, or any other hashable Python object. A Series can be created in several ways:
- Using lists
- To create a Series from lists, first create a data list and an index list, then use both to create your pandas Series. One thing to note here: the data is numeric but the index is made of strings. I like to do it in this reverse way, so don't get confused.
- Using a Python dictionary
- First create a dictionary, then use that dictionary to create the pandas Series. The keys become the index and the values become the data.
- Using NumPy
- A pandas Series can also be created from a NumPy array. If no index is given while creating the Series, a numeric index is created automatically, starting from zero (0) and increasing by one for each row.
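The three approaches above can be sketched as follows. This is a minimal illustration; the values and the labels "a", "b", "c" are made up for the example.

```python
import numpy as np
import pandas as pd

# From lists: numeric data with string labels as the index
data = [10, 20, 30]
labels = ["a", "b", "c"]
s_list = pd.Series(data=data, index=labels)

# From a dictionary: keys become the index, values become the data
s_dict = pd.Series({"a": 10, "b": 20, "c": 30})

# From a NumPy array without an index: labels default to 0, 1, 2
s_numpy = pd.Series(np.array([10, 20, 30]))

print(s_list)
print(s_dict)
print(s_numpy)
```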
When working with tabular data, the pandas DataFrame is the right tool. A DataFrame helps you clean and process your input data, and its row and column indexes make the data easy to retrieve. Each DataFrame consists of multiple pandas Series. When you select a single column from a DataFrame, the output is a pandas Series.
Each row is identified by the row index of the DataFrame. Rows lie along axis=0, whereas columns lie along axis=1.
To create a DataFrame I still use the NumPy library. Please see my NumPy examples webpage in case you need information on NumPy. Each DataFrame needs three kinds of data:
- The data values themselves
- Row IDs or row numbers, also called the “index”
- Column names, also called “headers”
A DataFrame can be created from a dictionary whose keys are the column names. All arrays in the dictionary must have the same length.
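A minimal sketch of the dictionary approach; the column names col1 and col2 and their values are assumptions chosen for illustration.

```python
import pandas as pd

# Each key becomes a column name; every list must have the same length
data = {
    "col1": [1, 2, 3],
    "col2": ["x", "y", "z"],
}
df = pd.DataFrame(data)

print(df)
print(type(df["col1"]))  # a single column comes back as a pandas Series
```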
If the index and columns are not given while creating a DataFrame, the default row and column labels both start from zero (0).
Index names and column names can also be passed explicitly when the DataFrame is created.
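Both cases can be sketched side by side. The labels r1 to r3 and col1, col2 are hypothetical names used only for this example.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(3, 2)  # 3 rows x 2 columns: 0..5

# No index or columns given: both default to 0, 1, 2, ...
df_default = pd.DataFrame(values)

# Explicit row labels (index) and column names
df_named = pd.DataFrame(values, index=["r1", "r2", "r3"], columns=["col1", "col2"])

print(df_default)
print(df_named)
```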
To get the data type of each column in a DataFrame, use the dtypes attribute (a single Series has dtype instead). In most pandas versions, string columns are reported as the object type.
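A short sketch of checking column types; the column names here are made up, and the exact type shown for the string column may depend on your pandas version.

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2], "ratio": [0.5, 1.5], "name": ["a", "b"]})

print(df.dtypes)          # one data type per column
print(df["ratio"].dtype)  # data type of a single column
```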
To view the first or last rows of a DataFrame, use the head() and tail() functions. I am using NumPy to create a random array for this. To view a specific number of rows, pass an integer to the function; the default is 5.
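A minimal sketch, assuming a random DataFrame of 20 rows and 4 columns named col1 to col4 (the shape and names are my assumptions for illustration).

```python
import numpy as np
import pandas as pd

# 20 rows x 4 columns of random numbers
df = pd.DataFrame(np.random.randn(20, 4), columns=["col1", "col2", "col3", "col4"])

print(df.head())   # first 5 rows by default
print(df.tail(3))  # last 3 rows
```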
To get the row names (index) and the column names, use <DataFrame>.index and <DataFrame>.columns.
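For example, with a small DataFrame whose labels are made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["r1", "r2"])

print(df.index)    # row labels
print(df.columns)  # column names
```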
To get summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for your numeric columns, use <DataFrame>.describe().
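A small sketch of summary statistics on made-up numeric data:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [10.0, 20.0, 30.0, 40.0],
})

# One row per statistic, one column per numeric column
stats = df.describe()
print(stats)
```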
To transpose your data, use <DataFrame>.T. My earlier DataFrame had 20 rows, which become 20 columns after transposing.
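A quick way to see the effect is to compare shapes before and after. The 20 x 4 shape and the column names are assumed for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((20, 4)), columns=["col1", "col2", "col3", "col4"])

transposed = df.T  # rows become columns and columns become rows

print(df.shape)
print(transposed.shape)
```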
To sort by row labels (index), use <DataFrame>.sort_index().
To sort by the values of a column, use <DataFrame>.sort_values(). For example, sorting by col2 puts the values of col2 in ascending order. Use ascending=False to sort in descending order.
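Both kinds of sorting can be sketched on a small DataFrame; the labels and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [3, 1, 2], "col2": [0.5, 2.5, 1.5]},
    index=["c", "a", "b"],
)

by_index = df.sort_index()                                  # rows ordered a, b, c
by_value = df.sort_values(by="col2")                        # col2 ascending
by_value_desc = df.sort_values(by="col2", ascending=False)  # col2 descending

print(by_index)
print(by_value)
print(by_value_desc)
```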
To get all the data for a given column, use <DataFrame>["column name"]. To get multiple columns, pass a list of column names.
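A short sketch of both forms, using made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

one_column = df["col2"]               # a single name returns a Series
two_columns = df[["col1", "col3"]]    # a list of names returns a DataFrame

print(one_column)
print(two_columns)
```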
To get specific rows of data, use the row IDs.
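One possible way to do this, assuming label-based selection with .loc and position-based selection with .iloc; the row labels r1 to r3 are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc[["r1", "r3"]]  # rows picked by their index labels
by_position = df.iloc[0:2]       # first two rows picked by position

print(by_label)
print(by_position)
```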
To get specific rows and columns together, use the multi-axis <DataFrame>.loc indexer.
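A minimal sketch with made-up labels, passing the row labels first and the column names second:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

# Rows r1 and r2, columns col1 and col3
subset = df.loc[["r1", "r2"], ["col1", "col3"]]
print(subset)
```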
To get a specific scalar value, use the <DataFrame>.at accessor with one row label and one column name.
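For example, on a small DataFrame with made-up labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

# One row label plus one column name gives back a single value
value = df.at["r2", "col2"]
print(value)
```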
To get all rows where the col4 value is greater than zero, use a condition on that column. Any arithmetic conditional statement can be used.
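A minimal sketch of this kind of filtering; the data is made up, and only the column name col4 comes from the text above.

```python
import pandas as pd

df = pd.DataFrame({
    "col3": [1, 2, 3, 4],
    "col4": [-1.0, 0.5, 0.0, 2.0],
})

# The comparison builds a True/False mask; indexing with it keeps the True rows
positive = df[df["col4"] > 0]
print(positive)
```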
The groupby function groups rows that share the same values in one or more columns, so that an aggregation can be applied to each group. To group by multiple columns, pass them as a list.
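A short sketch of grouping by one column and by two columns. The column names team, year and score are hypothetical, and sum() is just one possible aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [1, 2, 1, 1],
    "score": [10, 20, 30, 40],
})

# Total score per team
by_team = df.groupby("team")["score"].sum()

# Total score per (team, year) pair
by_team_year = df.groupby(["team", "year"])["score"].sum()

print(by_team)
print(by_team_year)
```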
Two DataFrames can be merged with the merge function on a given key column. If a key value is not present in one of the DataFrames and the join keeps that row (for example with how="outer"), the missing values are filled with NaN.
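A minimal sketch, assuming an outer join on a shared column named key (the column names and values are made up). With the default inner join, the unmatched rows would be dropped instead.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k1", "k2"], "a": [1, 2]})
right = pd.DataFrame({"key": ["k2", "k3"], "b": [3, 4]})

# An outer join keeps every key from both sides; gaps become NaN
merged = pd.merge(left, right, on="key", how="outer")
print(merged)
```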
The date_range function takes a frequency (freq) such as “D” for daily or “M” for monthly (newer pandas versions spell month-end as “ME”). The resulting dates can be used as the index for your values.
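A short sketch using a daily frequency; the start date and values are made up for illustration.

```python
import pandas as pd

# Five consecutive days starting on 1 January 2024
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")

# Use the dates as the index of a Series
s = pd.Series([1, 2, 3, 4, 5], index=dates)
print(s)
```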
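To write pandas data to a CSV file, use the to_csv function. If you pass only a file name without a directory, the file is saved in the current working directory, which is usually the location of your notebook or Python file.
To read data from a CSV file, use the read_csv function.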
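Writing and reading back can be sketched as a round trip. The file name example.csv is hypothetical, and I write to a temporary folder here only so the sketch leaves nothing in your working directory.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

# Build a path inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "example.csv")

df.to_csv(path, index=False)  # index=False skips writing the row labels
loaded = pd.read_csv(path)    # read the same file back into a DataFrame

print(loaded)
```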
Pandas is a vast topic, and my objective here was to get you started. More pandas documentation is available at https://pandas.pydata.org/