Data Analysis Notes

Cursory Data Analysis with Pandas

How can you get a quick overview of what a dataset contains?

Method	Description
describe()	returns summary statistics for each numeric column by default (count, mean, std, min, quartiles, max)
head()	returns the first n rows of the DataFrame (five by default)
info()	prints a concise summary of the DataFrame: column data types, non-null counts and memory usage
count()	returns a Series with the number of non-NA/null values in each column
df['column'].value_counts()	returns the count of each unique value in a column
df['column'].nunique()	returns the number of unique values in a column
pandas.plotting.scatter_matrix()	draws a matrix of pairwise scatter plots for the columns of a DataFrame
df['column'].hist()	draws a histogram of the column's values using matplotlib
scipy.stats.probplot(array, plot=plt)	draws a probability plot to check whether the data follow a normal distribution
statsmodels.graphics.gofplots.qqplot(array, line='s')	draws a Q-Q plot (line='s' adds a standardized reference line)
scipy.stats.shapiro(array)[1]	returns the p-value of the Shapiro-Wilk test for normality
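A minimal sketch of the tabular summary methods from the table, run on a small made-up DataFrame (the column names "city" and "temp" and their values are hypothetical, chosen only for illustration). Note that describe() skips the text column by default and that count(), value_counts() and nunique() all ignore missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value in each column
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", None, "Pune"],
    "temp": [3.5, 19.0, 4.5, 12.0, np.nan],
})

print(df.head())        # first 5 rows (here: all of them)
df.info()               # prints dtypes, non-null counts, memory usage; returns None

stats = df.describe()   # numeric columns only by default, so just "temp"
non_null = df.count()   # non-NA values per column

city_counts = df["city"].value_counts()  # frequency of each city, NA excluded
n_cities = df["city"].nunique()          # number of distinct cities, NA excluded

print(stats)
print(non_null)
print(city_counts)
print(n_cities)
```

Pass include="all" to describe() if you also want counts, unique values and the most frequent value for non-numeric columns.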
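The normality checks can also be used without drawing anything. This sketch assumes two synthetic samples (a normal one and a clearly skewed exponential one) and compares their Shapiro-Wilk p-values; a small p-value is evidence against normality. When probplot() is called without plot=, it returns the quantile pairs and the fitted line instead of drawing, and r close to 1 means the points lie near a straight line. statsmodels' qqplot() is the plotting alternative and is not used here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

# Shapiro-Wilk: element [1] of the result is the p-value
p_normal = stats.shapiro(normal_sample)[1]
p_skewed = stats.shapiro(skewed_sample)[1]

# Without plot=, probplot returns
# ((theoretical quantiles, ordered data), (slope, intercept, r))
(osm, osr), (slope, intercept, r) = stats.probplot(normal_sample, dist="norm")

print(f"normal sample p={p_normal:.3f}, skewed sample p={p_skewed:.2e}")
print(f"probability-plot fit: slope={slope:.2f}, r={r:.3f}")
```

With large samples, Shapiro-Wilk flags even tiny departures from normality, so it is worth reading the p-value alongside a probability or Q-Q plot.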
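A sketch of the plotting helpers from the table, assuming a hypothetical DataFrame of random columns "x", "y" and "z". The non-interactive Agg backend is selected so the code runs without a display. In an interactive session you would drop that line and call plt.show() instead. Passing a matplotlib Axes as plot= to scipy.stats.probplot draws onto that Axes rather than the global pyplot state.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x": rng.normal(size=100),
    "y": rng.normal(size=100),
    "z": rng.uniform(size=100),
})

# Pairwise scatter plots; the diagonal shows each column's histogram
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))
plt.close("all")

# Histogram of one column, drawn on a fresh figure
ax = df["x"].hist(bins=20)
plt.close("all")

# Normal probability plot: data points plus a least-squares reference line
fig, ax2 = plt.subplots()
stats.probplot(df["x"], dist="norm", plot=ax2)
plt.close(fig)
```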