Commit Graph

5 Commits

Laurent Mazare 30cdd769f9
Update the flash attn kernels. (#2333)
2024-07-15 20:37:36 +02:00
OlivierDehaene 8d1a57c9a0
chore: update flash attention kernels (#1518)
* chore: update flash attention kernels

* fmt

* remove unused kernels

* force f32

* correct stride
2024-01-05 18:28:55 +01:00
Laurent Mazare d0cdea95a5
Add back the bf16 flash-attn kernels. (#730)
2023-09-04 07:50:52 +01:00
Laurent Mazare 0ace420e66
Flash attention without padding (varlen). (#281)
* Expose the seqlen variable for flash-attn without padding.

* Fix the batched call.

* Adapt for the varlen variant.

* No need to set the batch strides when in varlen mode.

* Add a test (disabled at the moment).

* Get the test to work properly.
2023-07-31 09:45:39 +01:00
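The varlen commit above removes padding by packing all sequences back to back and indexing them with cumulative sequence-length offsets (the cu_seqlens convention used by flash-attention kernels). A minimal sketch of that bookkeeping, assuming the standard convention; the helper name is illustrative and not part of the candle API:

```rust
/// Build the cumulative-sequence-length offsets used by varlen
/// flash-attention: sequences are packed back to back in one buffer,
/// and rows cu_seqlens[i]..cu_seqlens[i + 1] belong to the i-th sequence.
/// (Illustrative helper, not a candle function.)
fn cu_seqlens(seq_lens: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(seq_lens.len() + 1);
    let mut total = 0;
    offsets.push(0);
    for &len in seq_lens {
        total += len;
        offsets.push(total);
    }
    offsets
}

fn main() {
    // Three sequences of lengths 3, 5, and 2 packed into one 10-row buffer:
    // no padding rows, so no wasted compute on pad tokens.
    let offsets = cu_seqlens(&[3, 5, 2]);
    println!("{:?}", offsets); // [0, 3, 8, 10]
}
```

With this layout the kernel no longer needs per-batch strides, which is why the commit drops them in varlen mode.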
Laurent Mazare 2ce5f12513
Again set a few extra params in flash-attn. (#245)
* Again set a few extra params.

* Use the appropriate kernel sizes.

* Add all the kernel sizes.

* Parallel compiling.

* Reduce the amount of parallelism.

* Add the missing kernel.

* Fix a typo.

* Remove bf16 support for now.
2023-07-26 14:16:37 +01:00