The Gaussian error linear unit (GELU) is defined as:
gelu(x) = x * P(X <= x) where X ~ N(0, 1),
i.e. gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
GELU weights inputs by their value, rather than gating inputs by their sign as in ReLU.
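A minimal sketch in base R of the exact (non-approximate) form above; the names gelu and gelu_erf are illustrative, not part of the package API. pnorm() is the standard normal CDF, so x * pnorm(x) is exactly x * P(X <= x), and erf(x / sqrt(2)) can be written as 2 * pnorm(x) - 1:

  # Exact GELU via the standard normal CDF: gelu(x) = x * P(X <= x)
  gelu <- function(x) x * pnorm(x)

  # Same value written through erf: 0.5 * x * (1 + erf(x / sqrt(2)))
  gelu_erf <- function(x) 0.5 * x * (1 + (2 * pnorm(x) - 1))

  gelu(c(-2, -1, 0, 1, 2))
  #> [1] -0.04550026 -0.15865525  0.00000000  0.84134475  1.95449974

Note how negative inputs are shrunk toward zero in proportion to how unlikely they are under N(0, 1), rather than being cut off at zero as in ReLU.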
See also
Other activations: activation_celu() activation_elu() activation_exponential() activation_glu() activation_hard_shrink() activation_hard_sigmoid() activation_hard_tanh() activation_leaky_relu() activation_linear() activation_log_sigmoid() activation_log_softmax() activation_mish() activation_relu() activation_relu6() activation_selu() activation_sigmoid() activation_silu() activation_soft_shrink() activation_softmax() activation_softplus() activation_softsign() activation_sparse_plus() activation_sparsemax() activation_squareplus() activation_tanh() activation_tanh_shrink() activation_threshold()