Pandas attributes and methods:
df[col].unique()
-> return an array of the unique values in the series
df[col].nunique()
-> return the number of unique values in the series (missing values are not counted by default)
df.isnull().sum()
-> return the number of null values in each column of the dataframe
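As a rough illustration of the three calls above, here is a minimal sketch on a toy dataframe. The data and the column names `make` and `price` are made up for this example and do not come from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", None],
    "price": [30000.0, 25000.0, np.nan, 18000.0],
})

# Array of distinct values; a missing value shows up as its own entry.
unique_makes = df["make"].unique()

# Count of distinct values; missing values are skipped by default.
n_unique_makes = df["make"].nunique()

# Series with one null count per column.
null_counts = df.isnull().sum()

print(unique_makes)
print(n_unique_makes)
print(null_counts)
```

Note that `unique()` and `nunique()` can disagree by one when the column contains missing values, because only `unique()` reports the missing entry.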
Matplotlib and seaborn methods:
%matplotlib inline
-> ensure that plots are displayed in the Jupyter notebook's cells
sns.histplot()
-> show the histogram of a series
Numpy methods:
np.log1p()
-> apply log transformation to a variable, after adding one to each input value.
Long-tailed distributions often hurt the performance of ML models, so the recommendation is to transform the target variable so that its distribution is closer to normal whenever possible.
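The sketch below shows what `np.log1p()` does to a long-tailed variable. The `prices` array is invented for illustration, and `np.expm1()` is not mentioned in these notes; it is included here only as the standard inverse for mapping values back to the original scale.

```python
import numpy as np

# Hypothetical long-tailed target values, including a zero.
prices = np.array([0.0, 1_000.0, 20_000.0, 35_000.0, 2_000_000.0])

# log1p computes log(1 + x), so it is defined for x = 0.
log_prices = np.log1p(prices)

# expm1 computes exp(y) - 1, which undoes log1p.
restored = np.expm1(log_prices)

print(log_prices)
print(restored)
```

Adding one before taking the logarithm is what makes zeros safe: `np.log(0)` is negative infinity, while `np.log1p(0)` is 0.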
The entire code of this project is available in this jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.