Flash attention is a performance optimization for attention layers, making it especially useful for large language models (LLMs), which benefit from faster and more memory-efficient attention computation. Once disabled, supported layers such as MultiHeadAttention will not use flash attention for faster computations.
See also
config_is_flash_attention_enabled()
config_enable_flash_attention()
Other config:
config_backend()
config_disable_interactive_logging()
config_disable_traceback_filtering()
config_dtype_policy()
config_enable_flash_attention()
config_enable_interactive_logging()
config_enable_traceback_filtering()
config_enable_unsafe_deserialization()
config_epsilon()
config_floatx()
config_image_data_format()
config_is_interactive_logging_enabled()
config_is_traceback_filtering_enabled()
config_set_backend()
config_set_dtype_policy()
config_set_epsilon()
config_set_floatx()
config_set_image_data_format()
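
Examples

A minimal sketch, assuming these helpers come from the keras3 package; it simply chains the functions referenced above to toggle and inspect the setting:

library(keras3)

# Disable flash attention globally; supported layers like
# MultiHeadAttention fall back to the standard attention computation.
config_disable_flash_attention()

# Query whether flash attention is currently enabled.
config_is_flash_attention_enabled()

# Re-enable flash attention when the optimization is wanted again.
config_enable_flash_attention()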