candle

Author	SHA1	Message	Date
Laurent Mazare	30cdd769f9	Update the flash attn kernels. (#2333)	2024-07-15 20:37:36 +02:00
OlivierDehaene	8d1a57c9a0	chore: update flash attention kernels (#1518) * chore: update flash attention kernels * fmt * remove unused kernels * force f32 * correct stride	2024-01-05 18:28:55 +01:00
Laurent Mazare	d0cdea95a5	Add back the bf16 flash-attn kernels. (#730)	2023-09-04 07:50:52 +01:00
Laurent Mazare	0ace420e66	Flash attention without padding (varlen). (#281) * Expose the seqlen variable for flash-attn without padding. * Fix the batched call. * Adapt for the varlen variant. * No need to set the batch strides when in varlen mode. * Add a test (disabled at the moment). * Get the test to work properly.	2023-07-31 09:45:39 +01:00
Laurent Mazare	2ce5f12513	Again set a few extra params in flash-attn. (#245) * Again set a few extra params. * Use the appropriate kernel sizes. * Add all the kernel sizes. * Parallel compiling. * Reduce the amount of parallelism. * Add the missing kernel. * Fix a typo. * Remove bf16 support for now.	2023-07-26 14:16:37 +01:00