
Pandas Examples

Pandas is a Python library for reading and writing large tabular datasets, performing arithmetic on numeric data, and manipulating text data. Pandas DataFrames are also widely used in PyTorch workflows.

Pandas Installation

Pandas can be installed with Anaconda or in a Python virtual environment. Use the command that matches your environment:

  • For Anaconda
    • conda install pandas
  • For Python virtual environment
    • pip install pandas

To import pandas in a Python program, use:

import pandas as pd

Note: I assume that pandas has been imported as pd in all of my examples below.

Pandas Series

A pandas Series is one of the most commonly used data types. It is similar to a NumPy array, with one difference: a Series has axis labels, which act as its index. The labels can be numbers, strings, or any other hashable Python object.

Creating Series

  • Using Lists
    • To create a Series from lists, first create a data list and an index list, then use them to create your pandas Series. One thing to note here: the data is numeric but the index is made of strings. I prefer it the other way around, but don't let that confuse you.
  • Using Python Dictionary
    • First create a dictionary, then use it to create the pandas Series. The keys become the index.
  • Using NumPy
    • A pandas Series can be created from a NumPy array. If no index is given while creating the Series, a numeric index is created automatically, starting from zero (0) and incremented for each row.
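
The three approaches above can be sketched as follows. This is a minimal illustration; the values and the variable names (s_list, s_dict, s_numpy) are made up for the example.

```python
import numpy as np
import pandas as pd

# From lists: numeric data with string labels as the index
data = [10, 20, 30]
labels = ["a", "b", "c"]
s_list = pd.Series(data=data, index=labels)

# From a dictionary: keys become the index, values become the data
s_dict = pd.Series({"a": 10, "b": 20, "c": 30})

# From a NumPy array without an index: labels default to 0, 1, 2, ...
s_numpy = pd.Series(np.array([10, 20, 30]))

print(s_list["b"])          # look up a value by its label
print(list(s_numpy.index))  # the automatically created integer index
```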

Pandas Dataframe

When working with tabular data, the pandas DataFrame is the right tool. A DataFrame helps you clean and process your input data, and its row and column indexing makes data easy to retrieve. Each DataFrame consists of multiple pandas Series: when we select a single column from a DataFrame, the result is a pandas Series.

Each row is identified by the DataFrame's row index. Rows lie along axis=0, whereas columns lie along axis=1.

To create a DataFrame we still use the NumPy library. Please see my NumPy examples page if you need more information on NumPy. Each DataFrame needs three kinds of data:

  • Data
  • Row ids or row numbers, also called the “index”
  • Column names, also called “headers”

A DataFrame can be created from a dictionary whose keys are the column names. All arrays must have the same length.

If the index and columns are not specified while creating a DataFrame, the default row and column labels start from zero (0).

A DataFrame can also be created with explicit index names and column names.
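
Here is a small sketch of the three ways of building a DataFrame described above. The row names, column names, and values are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: keys are column names; all arrays have the same length
df_dict = pd.DataFrame({"col1": np.arange(3), "col2": np.ones(3)})

# Without index or columns: both default to 0, 1, 2, ...
df_default = pd.DataFrame(np.zeros((2, 3)))

# With explicit index (row names) and columns (headers)
df_named = pd.DataFrame(
    data=np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["col1", "col2", "col3"],
)

print(df_named)
print(type(df_named["col1"]))  # selecting one column gives a Series
```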

To get the data type of each column in a DataFrame, use the dtypes attribute. String columns are typically reported as the object type (recent pandas versions may report a dedicated string type instead).

To get the first or last rows of a DataFrame, use the head() and tail() functions. To create a random array I am using NumPy. To view a specific number of rows, pass an integer to the function; the default is 5.
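
A minimal sketch of dtypes, head() and tail(), assuming a 20-row DataFrame of random numbers. The column names and the seeded generator are my own choices for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
df = pd.DataFrame(rng.random((20, 4)), columns=["col1", "col2", "col3", "col4"])
df["name"] = "x"  # add a string column to compare its type

print(df.dtypes)          # one data type per column

first_five = df.head()    # default: first 5 rows
last_three = df.tail(3)   # last 3 rows
print(first_five)
print(last_three)
```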

To get the row names (index) and the column names, use <DataFrame>.index and <DataFrame>.columns.

To get summary statistics for your columns, use <DataFrame>.describe().

To transpose your data, use <DataFrame>.T. I had 20 rows earlier, which become 20 columns after transposing.
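
The attributes and methods above can be tried on a small made-up DataFrame with 20 rows and 4 columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(80).reshape(20, 4),
    columns=["col1", "col2", "col3", "col4"],
)

row_labels = df.index        # row names (index)
column_names = df.columns    # column names (headers)
stats = df.describe()        # count, mean, std, min, quartiles, max per column
transposed = df.T            # rows become columns and vice versa

print(stats)
print(transposed.shape)
```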

Sorting

To sort by row labels (index), use <DataFrame>.sort_index().

To sort by the values of a column, use <DataFrame>.sort_values(). For example, sorting by col2 puts the values of col2 in ascending order. Use “ascending=False” for descending order.
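
A short sketch of both kinds of sorting, using an illustrative three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [3, 1, 2], "col2": [0.5, 2.5, 1.5]},
    index=["c", "a", "b"],
)

by_index = df.sort_index()                                 # sort by row labels
by_col2 = df.sort_values(by="col2")                        # ascending by col2
by_col2_desc = df.sort_values(by="col2", ascending=False)  # descending by col2

print(by_index)
print(by_col2_desc)
```

Both methods return a new DataFrame and leave df unchanged.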

Selecting Data

To get all data for a given column, use <DataFrame>[“column name”]. To get multiple columns, provide a list of column names.

List of columns

To get specific rows of data, use their row ids.

To get specific rows and columns, use the multi-axis <DataFrame>.loc indexer.

To get a specific scalar value, use the <DataFrame>.at accessor.
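
The selection styles above are sketched here on a made-up 3×4 DataFrame. Selecting rows with .loc and a list of labels is one reasonable reading of "row ids"; position-based selection with .iloc is an alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["r1", "r2", "r3"],
    columns=["col1", "col2", "col3", "col4"],
)

one_column = df["col1"]                             # a single column (Series)
two_columns = df[["col1", "col3"]]                  # list of columns (DataFrame)
rows = df.loc[["r1", "r3"]]                         # specific rows by label
subset = df.loc[["r1", "r2"], ["col2", "col4"]]     # rows and columns together
value = df.at["r2", "col3"]                         # one scalar value

print(subset)
print(value)
```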

Conditional Selection

To get all rows where the col4 value is greater than zero, index the DataFrame with a Boolean condition. Any arithmetic comparison can be used.
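
A minimal sketch of conditional selection. The column col3 and the combined condition are additions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"col3": [1, 2, 3, 4], "col4": [-1.0, 0.5, 0.0, 2.0]})

mask = df["col4"] > 0   # Boolean Series, one True/False per row
positive = df[mask]     # keeps only rows where the mask is True

# Conditions can be combined with & (and) and | (or); wrap each in parentheses
combined = df[(df["col4"] > 0) & (df["col3"] < 4)]

print(positive)
print(combined)
```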

Groupby

The groupby function groups rows that share the same value in one or more columns so that an aggregation can be applied to each group. Multiple columns can be passed as a list when grouping.
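
As a rough illustration, assuming a small table with hypothetical team, year, and score columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "year": [2020, 2021, 2020, 2020],
        "score": [1, 2, 3, 4],
    }
)

# Group by one column and sum the scores within each group
by_team = df.groupby("team")["score"].sum()

# Group by a list of columns and average the scores within each group
by_team_year = df.groupby(["team", "year"])["score"].mean()

print(by_team)
print(by_team_year)
```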

Merge

Two DataFrames can be merged with the merge function on a given key column. If a key value is not present in one of them (for example, in an outer join), NaN is filled in for the missing entries.
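
A small sketch of merging, assuming two made-up DataFrames that share a column named key:

```python
import pandas as pd

left = pd.DataFrame({"key": ["k1", "k2", "k3"], "a": [1, 2, 3]})
right = pd.DataFrame({"key": ["k1", "k2", "k4"], "b": [10, 20, 40]})

# Default (inner) merge keeps only keys present in both DataFrames
inner = pd.merge(left, right, on="key")

# Outer merge keeps every key and fills the gaps with NaN
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```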

Daterange

The date_range function takes a frequency such as “D” for daily or “M” for monthly (newer pandas versions prefer “ME” for month end). The resulting dates can be used as the index for your values.
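
For instance, a daily range used as a Series index might look like this (the start date and values are arbitrary):

```python
import numpy as np
import pandas as pd

# Five consecutive days starting on 1 January 2024
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")

# Use the dates as the index for some values
ts = pd.Series(np.arange(5), index=dates)

print(ts)
print(ts.loc[pd.Timestamp("2024-01-03")])  # look up a value by date
```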

Exporting Data

To CSV

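To write pandas data to CSV, use the to_csv function. If only a file name is given, without a directory, the file is saved in the current working directory, which is usually the location of your notebook or Python file.

To read data from CSV, use the read_csv function.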
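
A minimal round trip, writing to a temporary directory so the example does not leave files behind. The file name example.csv is just a placeholder.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3.5, 4.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example.csv")

    # index=False leaves the row index out of the file
    df.to_csv(csv_path, index=False)

    # Read the same file back into a new DataFrame
    loaded = pd.read_csv(csv_path)

print(loaded)
```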

Conclusion

Pandas is a vast topic. My objective is to get you started. More pandas documentation can be found at https://pandas.pydata.org/


NumPy Examples

NumPy is a package for scientific computing. It provides a multidimensional array library that can perform many mathematical operations on large datasets. It is also helpful for sorting large datasets and performing I/O operations, and it can be used to generate random data for simulations.

The main NumPy object is the “ndarray”. It can be a single-dimensional or multi-dimensional array whose elements all share the same data type. An “ndarray” can be compared to a Python list, but it works very differently. An “ndarray” has a fixed size: if you want to increase or decrease its size, NumPy creates a new object and deletes the old one.

NumPy Use cases

  • Importing large dataset
  • Performing mathematical computation over large dataset
  • Sorting data efficiently
  • Random data generator for AI workflow

NumPy Installation

You can use Anaconda or a Python environment to install NumPy. Installation is very simple; just enter the following command:

pip install numpy

To check that NumPy installed successfully, try importing the module from the Python CLI:

>>> import numpy

NumPy Basic Array

A single-dimensional array is created as follows. First import numpy as np (it is generally imported as np, but you can use another name), then create your array.
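
A minimal example, with arbitrary values:

```python
import numpy as np

# Build a one-dimensional array from a Python list
arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(type(arr))  # numpy.ndarray
```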

np.zeros() and np.ones()

To create an array of all zeros or all ones, just provide the length of the array. I am using a Jupyter notebook for simplicity, but the same commands can be run from the Python CLI or a Python script.

By default, zeros() and ones() create arrays of float type. To create an integer array instead, pass the argument dtype=np.int64.
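
A short sketch of both functions and the dtype argument:

```python
import numpy as np

zeros = np.zeros(5)                      # five floats, all 0.0
ones = np.ones(5)                        # five floats, all 1.0
int_zeros = np.zeros(5, dtype=np.int64)  # five integers, all 0

print(zeros, zeros.dtype)
print(int_zeros, int_zeros.dtype)
```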

np.arange()

Create a sequential array using the “arange” function. It creates integers starting from 0 (zero) and incremented by 1 (one), up to but not including the given stop value.

The arange function also accepts a step. Suppose we want an array from 1000 (included) up to 10000 (excluded). The first argument is the low number, which is included; the second is the high number, which is excluded; and the third is the step, the amount added between consecutive values. To get even numbers, use a step of 2.
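
Both forms of arange described above, as a quick sketch:

```python
import numpy as np

# 0 up to (not including) 10, in steps of 1
seq = np.arange(10)

# Even numbers from 1000 (included) up to 10000 (excluded)
evens = np.arange(1000, 10000, 2)

print(seq)
print(evens[:5], evens[-1], len(evens))
```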

np.linspace()

Use np.linspace() to create an array of values spaced linearly over a specified interval. The function places evenly spaced values between the first and second arguments. Both the first and the last number are included in the interval.
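
For example, five evenly spaced points between 0 and 10 (the interval and count are arbitrary):

```python
import numpy as np

# Third argument is the number of points; both endpoints are included
points = np.linspace(0, 10, 5)

print(points)  # 0, 2.5, 5, 7.5 and 10
```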

np.random.rand()

Create an array of random floats between zero (0), included, and one (1), excluded.

np.random.randint()

Create an array of random integers. The low number is included and the high number is excluded. Size is the array size; it can describe a single-dimensional or a multi-dimensional array.
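
A sketch of both random helpers, np.random.rand() and np.random.randint(). The sizes and bounds are arbitrary, and the output differs on every run.

```python
import numpy as np

uniform = np.random.rand(5)        # 5 random floats in [0, 1)
uniform_2d = np.random.rand(2, 3)  # 2x3 array of random floats

ints = np.random.randint(1, 10, size=5)          # 5 integers from 1 to 9
ints_2d = np.random.randint(1, 10, size=(3, 4))  # 3x4 array of integers

print(uniform)
print(ints_2d)
```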

NumPy N-dimensional Array

Create a two-dimensional array using NumPy.

Create an N-dimensional integer array:
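
One possible way to build both kinds of array; the shapes and values are chosen only for illustration.

```python
import numpy as np

# Two-dimensional array from a nested list: 2 rows, 3 columns
two_d = np.array([[1, 2, 3], [4, 5, 6]])

# Three-dimensional integer array of zeros with shape (2, 3, 4)
three_d = np.zeros((2, 3, 4), dtype=np.int64)

print(two_d)
print(three_d.shape)
```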

np.reshape()

The reshape command changes an array's shape. The array must have exactly as many elements as the target shape. For example, to create a 4×3 array the source array must have exactly 12 elements; for a 4×4 target array the source needs 16 elements.
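
The 12-element case above, plus what happens when the element count does not match:

```python
import numpy as np

source = np.arange(12)       # 12 elements: 0 .. 11
grid = source.reshape(4, 3)  # 4 rows x 3 columns = 12 elements
print(grid)

# A 4x4 target needs 16 elements, so this raises a ValueError
try:
    source.reshape(4, 4)
except ValueError as err:
    print("reshape failed:", err)
```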

Getting shape and size of Array

The following are attributes of the array, not functions.

ndarray.size

To get the total number of elements in the array:

ndarray.shape

To get the current shape of an array. This attribute comes in very handy when performing array or matrix multiplication.

ndarray.ndim

To get the number of dimensions of the array.
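
All three attributes on a sample 4×3 array:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

# Attributes are read without parentheses
print(arr.size)   # total number of elements
print(arr.shape)  # length along each dimension
print(arr.ndim)   # number of dimensions
```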

Sorting and Joining Arrays

np.sort()

np.sort() is a simple way to sort any NumPy array.
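
A quick sketch with arbitrary values. Note that np.sort() returns a sorted copy and leaves the input unchanged.

```python
import numpy as np

unsorted = np.array([3, 1, 4, 1, 5, 9, 2, 6])
ascending = np.sort(unsorted)  # sorted copy, ascending order

print(ascending)
print(unsorted)  # original order is preserved
```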

np.concatenate()

np.concatenate is used to join arrays. The arrays must have the same number of dimensions, and their shapes must match on every axis except the one being joined:

2-D concatenation
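
A sketch of 1-D and 2-D concatenation, using small made-up arrays:

```python
import numpy as np

# 1-D: join two arrays end to end
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate((a, b))

# 2-D: add a row (axis=0) or a column (axis=1)
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])    # one extra row, 2 columns like x
z = np.array([[7], [8]])  # one extra column, 2 rows like x

stacked_rows = np.concatenate((x, y), axis=0)  # shape (3, 2)
stacked_cols = np.concatenate((x, z), axis=1)  # shape (2, 3)

print(joined)
print(stacked_rows)
print(stacked_cols)
```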

np.append()

To add elements at the end of an array:

np.delete()

To delete an element from an array:

np.flip()

To reverse an array, use np.flip().
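
The three helpers above, np.append(), np.delete() and np.flip(), each return a new array rather than changing the original. A brief sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

appended = np.append(arr, [5, 6])  # add elements at the end
deleted = np.delete(arr, 1)        # remove the element at index 1
flipped = np.flip(arr)             # reverse the order

print(appended)
print(deleted)
print(flipped)
```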

Indexing and slicing

Indexing and slicing a NumPy array works much like a Python list. Indexes start from zero.

Store the sliced array in a new variable to use it:

To update the value of an element, assign to its index:
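
A small sketch of indexing, slicing, and updating. The .copy() call is my addition: a plain slice is a view that shares data with the original array, so copying keeps the stored slice independent.

```python
import numpy as np

arr = np.arange(10, 20)  # 10, 11, ..., 19

first = arr[0]           # first element
last = arr[-1]           # last element

# Store a slice (indexes 2, 3, 4) in a new variable
middle = arr[2:5].copy()

# Update one element in place
arr[0] = 99

print(first, last)
print(middle)
print(arr)
```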

N-D Indexing

Indexing a multidimensional array works the same way as 1-D: use one index per dimension. For example, for a 2-D array, arr_name[3, 4] returns the 5th element of the 4th row.

Normally, selecting a single column returns a one-dimensional array. Use the reshape command to get the output back in row/column form.
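
The same ideas on a made-up 5×6 array:

```python
import numpy as np

grid = np.arange(30).reshape(5, 6)

value = grid[3, 4]                 # 4th row, 5th element
column = grid[:, 2]                # third column as a 1-D array
column_2d = column.reshape(-1, 1)  # same values as a 5x1 column

print(value)
print(column.shape, column_2d.shape)
```

Passing -1 to reshape lets NumPy work out that dimension from the number of elements.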

Arithmetic Manipulation of array Elements

The main use case for NumPy is arithmetic manipulation, and NumPy makes it very easy.

For example, adding 5 to an array adds 5 to every element. This does not change the existing array; to keep the result, assign it to a variable. Again, as I mentioned earlier, this allocates new memory.
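
A minimal sketch of element-wise addition:

```python
import numpy as np

original = np.array([1, 2, 3])

# The addition is applied to every element and returns a new array
shifted = original + 5

print(original)  # unchanged
print(shifted)
```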

Boolean Operation

NumPy makes Boolean operations easy. A Boolean condition on an array produces a mask, and indexing with that mask returns all elements that satisfy the condition. For example, b[b>5] returns all elements of b that are greater than 5.
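
A short sketch with an arbitrary array b:

```python
import numpy as np

b = np.array([2, 7, 4, 9, 5, 6])

mask = b > 5         # Boolean array, True where the condition holds
selected = b[b > 5]  # only the elements greater than 5

print(mask)
print(selected)
```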

np.nonzero()

To get the indices of all elements where a condition is satisfied, use np.nonzero().
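
For example, with an arbitrary array b:

```python
import numpy as np

b = np.array([2, 7, 4, 9, 5, 6])

# np.nonzero returns a tuple with one index array per dimension
indices = np.nonzero(b > 5)

print(indices[0])     # positions where b > 5
print(b[indices])     # the matching values
```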

Arithmetic Manipulation of Array

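Adding and subtracting arrays works element by element on arrays of the same shape. If the shapes differ and cannot be broadcast together, the operation raises an error.

As long as the number of columns matches, a single-row array is added to every row (this is called broadcasting).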
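
The three cases above, sketched with small made-up arrays:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# Same shape: element-by-element addition
same = m + m

# One row with a matching number of columns is added to every row
row = np.array([10, 20, 30])
broadcast = m + row

# Two elements cannot be matched against three columns: ValueError
try:
    m + np.array([1, 2])
except ValueError as err:
    print("shapes do not match:", err)

print(same)
print(broadcast)
```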

Matrix axis definition

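Each array has one axis per dimension. In a two-dimensional array:

  • axis=0 runs down the rows, so an aggregation returns one result per column
  • axis=1 runs across the columns, so an aggregation returns one result per row

For example, max() with axis=0 returns the maximum value of each column, and with axis=1 it returns the maximum value of each row.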
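
A quick check of the axis behavior on a made-up 2×3 array:

```python
import numpy as np

m = np.array([[1, 8, 3], [7, 2, 9]])

col_max = m.max(axis=0)  # one maximum per column
row_max = m.max(axis=1)  # one maximum per row
overall = m.max()        # no axis: maximum of the whole array

print(col_max)
print(row_max)
print(overall)
```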

Conclusion

NumPy is a very good library for working with large datasets and a useful tool for data manipulation with minimal code. We can perform the same tasks in plain Python, but NumPy makes them easier.

Enjoy !! Keep Learning !!!