Flash attention offers a performance optimization for attention layers, making it especially useful for large language models (LLMs) that benefit from faster and more memory-efficient attention computations.

Calling this function disables that optimization. Once disabled, supported layers such as MultiHeadAttention will no longer use flash attention for their computations.

Usage

config_disable_flash_attention()
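
Examples

A minimal sketch of toggling flash attention off and back on. It assumes the companion functions config_enable_flash_attention() and config_is_flash_attention_enabled() from the same configuration family are available in your version of the package; treat those names as assumptions if your installation differs.

library(keras3)

# Check whether flash attention is currently enabled.
config_is_flash_attention_enabled()

# Disable flash attention; supported layers such as
# layer_multi_head_attention() will no longer use it.
config_disable_flash_attention()
config_is_flash_attention_enabled()  # FALSE

# Re-enable it when the faster attention path is wanted again
# (assumes config_enable_flash_attention() is available).
config_enable_flash_attention()

Because this is a process-wide configuration flag, it is typically set once near the start of a script, before any models are built.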