Ben Chuanlong Du's Blog

It is never too late to learn.

Effect of Duplicating Observations in Linear Models

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The estimated coefficients don't change, but their estimated variances become smaller. This can be shown with a formula ...
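
A quick numerical check of the claim above, as a minimal sketch rather than a definitive treatment. It fits a simple one-predictor regression in plain Python, once on the original data and once on the data stacked `k` times. The data values, the helper name `simple_ols`, and the choice `k = 2` are all made up for illustration.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, se_b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # residual variance, n - 2 degrees of freedom
    se_b = math.sqrt(sigma2_hat / sxx)  # standard error of the slope
    return a, b, se_b

# Made-up illustrative data (not from any real study)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
k = 2  # number of copies of the whole data set
n = len(x)

a1, b1, se1 = simple_ols(x, y)
a2, b2, se2 = simple_ols(x * k, y * k)  # list repetition stacks k copies

print("slopes:", b1, b2)            # same up to rounding
print("standard errors:", se1, se2)  # the second is smaller
# For this one-predictor model the ratio should equal sqrt((n - 2) / (k*n - 2))
print("ratio:", se2 / se1, math.sqrt((n - 2) / (k * n - 2)))
```

The shrinking standard error is an artifact of pretending there are `k * n` independent observations; no new information has been added.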

Complete Duplication of All Data Points
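
A sketch of the derivation for this case, under standard OLS assumptions. The notation is introduced here and is not from the original notes: $X$ is the $n \times p$ design matrix with full column rank, $y$ is the response, and every observation appears $k$ times in the stacked data $(X_k, y_k)$.

```latex
% Stacked data: k copies of (X, y)
X_k^\top X_k = k\, X^\top X, \qquad X_k^\top y_k = k\, X^\top y

% Coefficient estimates are unchanged
\hat\beta_k = (X_k^\top X_k)^{-1} X_k^\top y_k
            = (k\, X^\top X)^{-1}\, k\, X^\top y
            = (X^\top X)^{-1} X^\top y
            = \hat\beta

% Each residual is repeated k times
\mathrm{RSS}_k = k\, \mathrm{RSS}, \qquad
\hat\sigma_k^2 = \frac{k\, \mathrm{RSS}}{kn - p}

% Estimated covariance of the coefficients shrinks
\widehat{\mathrm{Var}}(\hat\beta_k)
  = \hat\sigma_k^2\, (X_k^\top X_k)^{-1}
  = \frac{\mathrm{RSS}}{kn - p}\, (X^\top X)^{-1}
  = \frac{n - p}{kn - p}\; \widehat{\mathrm{Var}}(\hat\beta)
```

Since $(n - p)/(kn - p) < 1/k$ for $k \ge 2$ and $p \ge 1$, the reported variances shrink by slightly more than a factor of $k$, even though the duplicated rows carry no new information.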

Complete Duplication of Some Data Points
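
When only some points are duplicated, the coefficients generally do change. For the point estimates, repeating observation $i$ a total of $w_i$ times gives the same fit as weighted least squares with weights $w_i$. The sketch below checks this on made-up data; the helper `weighted_fit` and the `counts` vector are hypothetical names used only for illustration.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
counts = [1, 1, 1, 1, 1, 3]  # the last point appears three times

# Physically duplicate the rows according to counts
x_dup = [xi for xi, c in zip(x, counts) for _ in range(c)]
y_dup = [yi for yi, c in zip(y, counts) for _ in range(c)]

a_orig, b_orig = weighted_fit(x, y, [1] * len(x))            # plain OLS
a_dup, b_dup = weighted_fit(x_dup, y_dup, [1] * len(x_dup))  # OLS on duplicated rows
a_w, b_w = weighted_fit(x, y, counts)                        # WLS with counts as weights

print("original slope:  ", b_orig)
print("duplicated slope:", b_dup)
print("weighted slope:  ", b_w)
```

The duplicated-row fit and the weighted fit agree, while both differ from the original fit because the repeated point pulls the line toward itself.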

Duplication with Noise

Duplicating data points with added noise is common in computer vision …

Spark Issue: Duplicated Partitions

There seems to be an issue in Spark where a write might fail to overwrite existing files even if the mode of the DataFrame writer (DataFrame.write) is set to "overwrite".
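
One way to look for leftover files after such a write is to check whether the output directory holds part files from more than one write job. The sketch below is only a rough diagnostic in plain Python, not a Spark API. It assumes part files are named `part-<partition index>-<job UUID>...`, which is a commonly seen pattern but may differ across Spark versions and output formats. The helper `group_part_files_by_job` and the file names in `listing` are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: part files look like part-00000-<uuid>-c000.<ext>
PART_FILE = re.compile(
    r"^part-(\d+)-"
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)

def group_part_files_by_job(filenames):
    """Group part-file names by the job UUID embedded in the name."""
    jobs = defaultdict(list)
    for name in filenames:
        match = PART_FILE.match(name)
        if match:
            jobs[match.group(2)].append(name)
    return dict(jobs)

# Hypothetical directory listing, for illustration only
listing = [
    "_SUCCESS",
    "part-00000-11111111-2222-3333-4444-555555555555-c000.snappy.parquet",
    "part-00001-11111111-2222-3333-4444-555555555555-c000.snappy.parquet",
    "part-00000-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee-c000.snappy.parquet",
]

jobs = group_part_files_by_job(listing)
stale_suspected = len(jobs) > 1  # files from more than one write job

for job_id, names in jobs.items():
    print(job_id, len(names), "file(s)")
print("possible leftover files:", stale_suspected)
```

For a local output directory, `listing` could come from `os.listdir(path)`. More than one job UUID is expected after an append, so this is only a hint of a failed overwrite, not proof.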