Group Normalization divides the channels into groups and computes within each group the mean and variance for normalization. Empirically, its accuracy is more stable than batch norm in a wide range of small batch sizes, if learning rate is adjusted linearly with batch sizes.
Relation to Layer Normalization: If the number of groups is set to 1, then this operation becomes nearly identical to Layer Normalization (see Layer Normalization docs for details).
Relation to Instance Normalization:
If the number of groups is set to the input dimension (number of groups is
equal to number of channels), then this operation becomes identical to
Instance Normalization. You can achieve this via groups=-1
.
Usage
layer_group_normalization(
object,
groups = 32L,
axis = -1L,
epsilon = 0.001,
center = TRUE,
scale = TRUE,
beta_initializer = "zeros",
gamma_initializer = "ones",
beta_regularizer = NULL,
gamma_regularizer = NULL,
beta_constraint = NULL,
gamma_constraint = NULL,
...
)
Arguments
- object
Object to compose the layer with. A tensor, array, or sequential model.
- groups
Integer, the number of groups for Group Normalization. Can be in the range
[1, N]
where N is the input dimension. The input dimension must be divisible by the number of groups. Defaults to 32.- axis
Integer or List/Tuple. The axis or axes to normalize across. Typically, this is the features axis/axes. The left-out axes are typically the batch axis/axes. -1 is the last dimension in the input. Defaults to
-1
.- epsilon
Small float added to variance to avoid dividing by zero. Defaults to 1e-3.
- center
If
TRUE
, add offset ofbeta
to normalized tensor. IfFALSE
,beta
is ignored. Defaults toTRUE
.- scale
If
TRUE
, multiply bygamma
. IfFALSE
,gamma
is not used. When the next layer is linear (also e.g.relu
), this can be disabled since the scaling will be done by the next layer. Defaults toTRUE
.- beta_initializer
Initializer for the beta weight. Defaults to zeros.
- gamma_initializer
Initializer for the gamma weight. Defaults to ones.
- beta_regularizer
Optional regularizer for the beta weight.
NULL
by default.- gamma_regularizer
Optional regularizer for the gamma weight.
NULL
by default.- beta_constraint
Optional constraint for the beta weight.
NULL
by default.- gamma_constraint
Optional constraint for the gamma weight.
NULL
by default.- ...
For forward/backward compatability.
Value
The return value depends on the value provided for the first argument.
If object
is:
a
keras_model_sequential()
, then the layer is added to the sequential model (which is modified in place). To enable piping, the sequential model is also returned, invisibly.a
keras_input()
, then the output tensor from callinglayer(input)
is returned.NULL
or missing, then aLayer
instance is returned.
Input Shape
Arbitrary. Use the keyword argument
input_shape
(tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
See also
Other normalization layers: layer_batch_normalization()
layer_layer_normalization()
layer_spectral_normalization()
layer_unit_normalization()
Other layers: Layer()
layer_activation()
layer_activation_elu()
layer_activation_leaky_relu()
layer_activation_parametric_relu()
layer_activation_relu()
layer_activation_softmax()
layer_activity_regularization()
layer_add()
layer_additive_attention()
layer_alpha_dropout()
layer_attention()
layer_average()
layer_average_pooling_1d()
layer_average_pooling_2d()
layer_average_pooling_3d()
layer_batch_normalization()
layer_bidirectional()
layer_category_encoding()
layer_center_crop()
layer_concatenate()
layer_conv_1d()
layer_conv_1d_transpose()
layer_conv_2d()
layer_conv_2d_transpose()
layer_conv_3d()
layer_conv_3d_transpose()
layer_conv_lstm_1d()
layer_conv_lstm_2d()
layer_conv_lstm_3d()
layer_cropping_1d()
layer_cropping_2d()
layer_cropping_3d()
layer_dense()
layer_depthwise_conv_1d()
layer_depthwise_conv_2d()
layer_discretization()
layer_dot()
layer_dropout()
layer_einsum_dense()
layer_embedding()
layer_feature_space()
layer_flatten()
layer_flax_module_wrapper()
layer_gaussian_dropout()
layer_gaussian_noise()
layer_global_average_pooling_1d()
layer_global_average_pooling_2d()
layer_global_average_pooling_3d()
layer_global_max_pooling_1d()
layer_global_max_pooling_2d()
layer_global_max_pooling_3d()
layer_group_query_attention()
layer_gru()
layer_hashed_crossing()
layer_hashing()
layer_identity()
layer_integer_lookup()
layer_jax_model_wrapper()
layer_lambda()
layer_layer_normalization()
layer_lstm()
layer_masking()
layer_max_pooling_1d()
layer_max_pooling_2d()
layer_max_pooling_3d()
layer_maximum()
layer_mel_spectrogram()
layer_minimum()
layer_multi_head_attention()
layer_multiply()
layer_normalization()
layer_permute()
layer_random_brightness()
layer_random_contrast()
layer_random_crop()
layer_random_flip()
layer_random_rotation()
layer_random_translation()
layer_random_zoom()
layer_repeat_vector()
layer_rescaling()
layer_reshape()
layer_resizing()
layer_rnn()
layer_separable_conv_1d()
layer_separable_conv_2d()
layer_simple_rnn()
layer_spatial_dropout_1d()
layer_spatial_dropout_2d()
layer_spatial_dropout_3d()
layer_spectral_normalization()
layer_string_lookup()
layer_subtract()
layer_text_vectorization()
layer_tfsm()
layer_time_distributed()
layer_torch_module_wrapper()
layer_unit_normalization()
layer_upsampling_1d()
layer_upsampling_2d()
layer_upsampling_3d()
layer_zero_padding_1d()
layer_zero_padding_2d()
layer_zero_padding_3d()
rnn_cell_gru()
rnn_cell_lstm()
rnn_cell_simple()
rnn_cells_stack()