Correcting Common Errors in Probabilistic Evaluations: Efficacy of Debiasing