Flash attention offers performance optimization for attention layers, making it especially useful for large language models (LLMs) that benefit from faster and more memory-efficient attention computations.
Once disabled, supported layers such as MultiHeadAttention will no longer use flash attention for their computations.
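A minimal usage sketch, assuming these helpers are called from the keras3 R package; the exact value reported by config_is_flash_attention_enabled() may differ across backends and versions:

library(keras3)

# Turn flash attention off globally; supported attention layers
# fall back to the standard attention computation.
config_disable_flash_attention()

# Query the current setting (expected to report it as disabled).
config_is_flash_attention_enabled()

# Flash attention can be turned back on later if desired.
config_enable_flash_attention()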
See also
config_is_flash_attention_enabled(), config_enable_flash_attention()
Other config: config_backend(), config_disable_interactive_logging(), config_disable_traceback_filtering(), config_dtype_policy(), config_enable_flash_attention(), config_enable_interactive_logging(), config_enable_traceback_filtering(), config_enable_unsafe_deserialization(), config_epsilon(), config_floatx(), config_image_data_format(), config_is_interactive_logging_enabled(), config_is_traceback_filtering_enabled(), config_set_backend(), config_set_dtype_policy(), config_set_epsilon(), config_set_floatx(), config_set_image_data_format()