Gabriel Cypriano

1261 days ago

If we perform EDA on the whole dataset (including the test set) and use what we learn to drive feature engineering, then we could also suffer from data leakage.

There are many ways to impute missing values, for example with the mean or the median. How you do it is up to you, but make sure to calculate the imputation statistics only on the training data to avoid leaking information from your test set.

Common mistakes when carrying out machine learning and data science

kdnuggets.com
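
To make the quoted advice concrete, here is a minimal sketch in plain Python. The helper names (`fit_median`, `impute`) and the toy values are my own assumptions for illustration, not something from the linked article. The point is only that the fill value is computed from the training split and then reused unchanged on the test split.

```python
from statistics import median

def fit_median(train_values):
    """Return the median of the observed (non-None) training values."""
    observed = [v for v in train_values if v is not None]
    return median(observed)

def impute(values, fill):
    """Replace missing entries (None) with a precomputed fill value."""
    return [fill if v is None else v for v in values]

# Toy data (hypothetical): None marks a missing value.
train = [1.0, None, 3.0, 5.0]
test = [None, 100.0]

# Fit the statistic on the training split only.
fill = fit_median(train)

# Apply the same training-derived statistic to both splits.
train_imputed = impute(train, fill)
test_imputed = impute(test, fill)

# The leaky alternative would be fit_median(train + test), which lets
# the test value 100.0 influence the fill value used during training.
```

The same split-first discipline applies to anything learned from the data during EDA, such as scaling parameters or category encodings.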