For each batch i and class j,
the sparsemax activation function is defined as:
sparsemax(x)[i, j] = max(x[i, j] - τ(x[i, :]), 0),
where τ(x[i, :]) is the threshold for row i, chosen so that the outputs in that row sum to 1.
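To make the definition concrete, here is a minimal pure-Python sketch of the computation. It is for illustration only and is not the implementation behind activation_sparsemax(). It assumes the usual sort-based way of finding the quantity subtracted from each score (commonly written τ): sort the row in descending order, find the largest k for which 1 + k * z_(k) exceeds the sum of the top k scores, and set τ = (sum of top k scores - 1) / k. The names sparsemax_row and sparsemax are hypothetical helpers, not part of any library.

```python
def sparsemax_row(z):
    """Sparsemax of one row of scores, given as a list of floats."""
    # Sort the scores in descending order.
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The k-th largest score is in the support while this holds.
        # It always holds for k = 1 and fails for every k past the support,
        # so the last update leaves tau at the threshold for the full support.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    # Shift every score by the threshold and clip at zero.
    return [max(v - tau, 0.0) for v in z]


def sparsemax(x):
    """Apply sparsemax independently to each row (batch element) of x."""
    return [sparsemax_row(row) for row in x]
```

For example, sparsemax_row([1.0, 2.0, 3.0]) returns [0.0, 0.0, 1.0]: the threshold is 2, so only the largest score keeps a nonzero weight. Unlike softmax, the output can contain exact zeros while each row still sums to 1.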
See also
Other activations: activation_celu() activation_elu() activation_exponential() activation_gelu() activation_glu() activation_hard_shrink() activation_hard_sigmoid() activation_hard_tanh() activation_leaky_relu() activation_linear() activation_log_sigmoid() activation_log_softmax() activation_mish() activation_relu() activation_relu6() activation_selu() activation_sigmoid() activation_silu() activation_soft_shrink() activation_softmax() activation_softplus() activation_softsign() activation_sparse_plus() activation_squareplus() activation_tanh() activation_tanh_shrink() activation_threshold()