Cursory Data Analysis with Pandas
How can you get a quick overview of what a data set contains?
Method | Description |
---|---|
describe() | returns summary statistics for each numeric column (count, mean, std, min, quartiles, max) |
head() | returns the first rows of the DataFrame (five by default) |
info() | prints a summary of the DataFrame: column data types, non-null counts, memory usage |
count() | returns a Series with the number of non-NA/null values in each column |
df['column'].value_counts() | returns counts of unique values in a column |
df['column'].nunique() | returns number of unique values in a column |
pandas.plotting.scatter_matrix() | draws a matrix of pairwise scatter plots for the given DataFrame |
df['column'].hist() | draws a histogram of the column values using matplotlib |
scipy.stats.probplot(array, plot=plt) | draws a probability plot to check whether the data follow a normal distribution |
statsmodels.graphics.gofplots.qqplot(array, line='s') | draws a Q-Q plot of the data against a normal distribution |
scipy.stats.shapiro(array)[1] | returns the p-value of the Shapiro-Wilk test for normality |
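
A minimal sketch of how the non-plotting calls from the table fit together. The DataFrame, its column names (`height`, `group`) and the random seed are made up for illustration and are not from any real data set; the plotting calls are left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data, generated only for this illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(loc=170.0, scale=10.0, size=200),
    "group": rng.choice(["a", "b", "c"], size=200),
})
df.loc[:4, "height"] = np.nan  # inject missing values into rows 0 to 4

print(df.head())       # first five rows
df.info()              # prints dtypes, non-null counts, memory usage
print(df.describe())   # summary statistics for the numeric column

non_null = df.count()                       # Series: non-NA values per column
group_counts = df["group"].value_counts()   # Series: frequency of each value
n_groups = df["group"].nunique()            # int: number of distinct values

# Drop missing values before running the Shapiro-Wilk test.
heights = df["height"].dropna()
p_value = stats.shapiro(heights)[1]

print(non_null, group_counts, n_groups, p_value, sep="\n")
```

A small p-value (for example below 0.05, a conventional but arbitrary threshold) suggests the sample is unlikely to come from a normal distribution; a large one only means the test found no evidence against normality.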