In my continued series on Pandas, today I’m going to be writing about how to deal with NaN values. NaN values are entries in a DataFrame that lack data: perhaps the column is not applicable to that row, the data is messy, or the data is simply incomplete. Many models do not handle NaN data well, so dealing with these rows is a critical part of the data science process. Below are four examples of how to drop rows containing NaN values from a Pandas DataFrame.
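Before dropping anything, it can help to see where the NaN values actually sit. The following is a minimal sketch on a small made-up DataFrame. The column names `a`, `b` and `c` are hypothetical. Note that `None` in a float column is stored as NaN, and `isna().sum(axis=1)` counts the missing entries in each row:

```python
import numpy as np
import pandas as pd

# A small illustrative DataFrame; None in these float columns is stored as NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, np.nan, None, 5.0],
    "c": [7.0, np.nan, 9.0, np.nan],
})

# Count the missing (NaN) entries in each row.
missing_per_row = df.isna().sum(axis=1)
print(missing_per_row.tolist())  # [0, 3, 1, 2]
```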
```python
# Each call assumes df is an existing pandas DataFrame and returns a new DataFrame
df.dropna()  # Drops every row that has any NaN values
df.dropna(how='all')  # Drops a row only if ALL of its columns are NaN
df.dropna(thresh=2)  # Drops a row if it does not have at least two non-NaN values
df.dropna(subset=['column_name'])  # Drops a row only if NaN is in the listed column(s); 'column_name' is a placeholder
```
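To see how the four variants differ, here is a minimal sketch that applies each one to the same made-up four-row DataFrame and prints which row labels survive. The data and the column names `a`, `b` and `c` are hypothetical, chosen so every row behaves differently: row 0 is complete, row 1 is entirely NaN, row 2 is missing only `b`, and row 3 is missing `a` and `c`.

```python
import numpy as np
import pandas as pd

# Illustrative data only: each row has a different pattern of NaN values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, np.nan, np.nan, 5.0],
    "c": [7.0, np.nan, 9.0, np.nan],
})

any_nan = df.dropna()               # drop rows with any NaN -> keeps row 0
all_nan = df.dropna(how="all")      # drop rows that are all NaN -> drops row 1
at_least_two = df.dropna(thresh=2)  # keep rows with >= 2 non-NaN values
only_b = df.dropna(subset=["b"])    # drop rows where column "b" is NaN

for name, result in [("any", any_nan), ("all", all_nan),
                     ("thresh=2", at_least_two), ("subset=['b']", only_b)]:
    print(name, result.index.tolist())
# any [0]
# all [0, 2, 3]
# thresh=2 [0, 2]
# subset=['b'] [0, 3]
```

Because `dropna` returns a new DataFrame, the original `df` still has all four rows afterwards. To keep the result, assign it back, e.g. `df = df.dropna()`, or pass `inplace=True`.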
Depending on how exactly you want to deal with NaN values in Pandas, each of these results could be the right answer. Sometimes it’s worth dropping a row if it has any NaN value, sometimes it’s worth dropping the row only if all columns are NaN, and so on. Being able to slice and dice data in whatever way you’d like is truly a superpower of Python, and of Pandas specifically.