The Gaussian error linear unit (GELU) is defined as:
gelu(x) = x * P(X <= x)
where P(X) ~ N(0, 1)
,
i.e. gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
.
GELU weights inputs by their value, rather than gating inputs by their sign as in ReLU.
See also
Other activations: activation_elu()
activation_exponential()
activation_hard_sigmoid()
activation_leaky_relu()
activation_linear()
activation_log_softmax()
activation_mish()
activation_relu()
activation_relu6()
activation_selu()
activation_sigmoid()
activation_silu()
activation_softmax()
activation_softplus()
activation_softsign()
activation_tanh()