The Gaussian error linear unit (GELU) is defined as:

`gelu(x) = x * P(X <= x)`

where `X ~ N(0, 1)` is a standard normal random variable, i.e. `gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))`.

GELU weights each input by its percentile under the standard normal distribution, `P(X <= x)`, rather than gating inputs by their sign as in ReLU.
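As a rough illustration of the definition, the following sketch evaluates the closed form with Python's standard `math.erf` and compares it with ReLU on a few scalar inputs. The function names `gelu` and `relu` are chosen here for illustration only; this is not the implementation of any particular library.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU for comparison: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


if __name__ == "__main__":
    # Compare the two activations on a few sample points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

For negative inputs ReLU returns exactly zero, while GELU returns a small negative value that shrinks toward zero as the input becomes more negative.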

