Curse of Dimensionality
1. Description
As the number of dimensions (features) increases, the volume of the space grows exponentially, which means that data becomes increasingly sparse. This sparsity makes it difficult to obtain reliable estimates and requires exponentially more data to achieve the same level of accuracy as in lower dimensions.
This is especially evident in algorithms such as nearest-neighbour averaging, which depend on estimates computed from a local neighbourhood. As the number of dimensions grows, the volume of space that must be covered to capture the same percentage of the data increases exponentially. In high dimensions the neighbourhood is therefore no longer local: each estimate must draw on a large fraction of the data, which fails to capture local variability.
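A minimal sketch of this effect, using the standard uniform-data illustration: to capture a fraction p of points spread uniformly in the unit hypercube [0, 1]^d, a sub-cube needs edge length p^(1/d). The function name `edge_length` is illustrative, not from the original text.

```python
# To capture a fraction p of data uniformly distributed in the unit
# hypercube [0, 1]^d, a sub-cube must have edge length p**(1/d).
# As d grows, this edge approaches 1: the "local" neighbourhood ends
# up spanning nearly the whole range of every feature.

def edge_length(fraction, d):
    """Edge of the sub-cube containing `fraction` of uniform data in d dimensions."""
    return fraction ** (1.0 / d)

for d in (1, 2, 10, 100):
    print(f"d={d:>3}: edge needed for 10% of data = {edge_length(0.10, d):.3f}")
# d=  1: 0.100
# d=  2: 0.316
# d= 10: 0.794
# d=100: 0.977
```

So in 100 dimensions, a "neighbourhood" holding just 10% of the data must span about 98% of each feature's range, which is no longer local in any meaningful sense.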
2. How to avoid it?
This can be mitigated by introducing structure into the model. The structure ensures that the fit does not depend only on a local neighbourhood. An example is linear regression, which imposes a global linear form and so borrows strength from all observations rather than just nearby ones.
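A hedged illustration of the contrast, on simulated data (all names and parameters here are assumptions for the demo, not from the original text): when the true relationship happens to be linear, a global least-squares fit stays accurate in 20 dimensions, while 1-nearest-neighbour averaging degrades because no training point is truly close to a query point.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 500, 200

# Simulated linear ground truth: y = X @ w, with light noise on the training labels.
w = rng.normal(size=d)
X_train = rng.uniform(-1, 1, (n_train, d))
X_test = rng.uniform(-1, 1, (n_test, d))
y_train = X_train @ w + rng.normal(scale=0.1, size=n_train)
y_test = X_test @ w

# Structured model: ordinary least squares uses every observation globally.
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
mse_linear = np.mean((X_test @ w_hat - y_test) ** 2)

# Local model: 1-nearest-neighbour prediction from the single closest point.
dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
nn_pred = y_train[np.argmin(dists, axis=1)]
mse_nn = np.mean((nn_pred - y_test) ** 2)

print(f"linear regression test MSE: {mse_linear:.4f}")
print(f"1-NN test MSE:              {mse_nn:.4f}")
```

The flip side of the bargain is bias: the linear model only wins here because the assumed structure matches the data. If the true relationship were highly non-linear, the imposed structure would itself be a source of error.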