Commit Graph

2351 Commits

Author SHA1 Message Date
Laurent Mazare a4c56a958e
Add the const-set op. (#2910)
* Add the const-set op.

* Cuda implementation.

* Bugfix.

* Metal cleanup.

* Add the metal kernels.

* Add some testing.

* Finish the metal implementation.

* Bump the version.
2025-04-19 10:07:02 +02:00
Kyle Birnbaum b2904a830b
implemented quantized-gemma3 (#2902)
* implemented quantized-gemma, inference not working

* Fixed a few modeling bugs: outputing the correct tokens for a few iterations then garbage

* lint

* clippy

* quantized-gemma3 example working

* added readme

* clippy
2025-04-19 07:46:41 +02:00
A2va 21055b5697
Add PRelu operation (#2904)
* Add PRelu operation

* Apply rustfmt.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-04-19 07:24:10 +02:00
Laurent Mazare 9dbaf958dc
Add an enum for scalar values. (#2909)
* Add a scalar enum type.

* Add a bit more to the scalar type.

* Small tweak.

* More scalar usage.
2025-04-18 22:13:38 +02:00
Laurent Mazare ce5f8dd129
Check the bounds in the cuda indexing kernels. (#2908)
* Check the bounds in the cuda indexing kernels.

* Another check.
2025-04-18 20:08:17 +02:00
Laurent Mazare 9954981327
Allow from_vec/from_slice to use a ShapeWithOneHole as shape. (#2905) 2025-04-17 08:59:18 +02:00
Laurent Mazare 7f0f83a7c1
Rotating kv cache positions (#2901)
* Retrieve the current positions for rotating KV caches.

* Add the function to the kv cache too.

* More testing.
2025-04-15 23:09:26 +02:00
Kyle Birnbaum 76e565c4ab
Updated candle-book: Introduction, Installation, MNIST guide, and added CONTRIBUTING.md (#2897)
* added CONTRIBUTING.md to candle-book

* added description to candle-book introduction

* Updated formatting and added different features to candle-book installation

* mnist guide first draft candle-book

* updated mnist guide syntax and grammar for candle-book

* changed HelloWorld - Mnist to Tutorial - Mnist in SUMMARY.md

* updated intro to mnist guide in candle-book
2025-04-15 21:41:10 +02:00
Laurent Mazare e4e7b0b2da
Use cudarc 0.16. (#2900)
* Use cudarc 0.16.

* Allow for disabling event tracking.

* Tweaks.

* Bump the ug version.

* And bump the candle version too.
2025-04-15 21:40:18 +02:00
Laurent Mazare b01ebbad8a
Use cudarc 0.15.2. (#2896) 2025-04-14 20:47:52 +02:00
Laurent Mazare 1d1d6d4fe6
Bump the crate version. (#2895) 2025-04-14 15:52:11 +02:00
Laurent Mazare 2653002f29
Gumbel-Softmax sampling. (#2894)
* Gumbel-Softmax sampling.

* Add a sampling test.

* Share the gumbel-softmax bits.
2025-04-14 15:42:42 +02:00
Laurent Mazare a52b76ae82
Expose the cudnn algo in the conv ops. (#2892)
* Set the algo.

* Expose the cudnn preferred algo for conv ops.
2025-04-14 08:25:32 +02:00
Laurent Mazare fb660b8d43
Add a cudnn feature to candle-nn/candle-transformers. (#2890) 2025-04-13 17:43:41 +02:00
Laurent Mazare 2f9606b187
Exclude candle-book to avoid some CI failures. (#2889)
* Exclude candle-book to avoid some CI failures.

* Remove the book CIs.
2025-04-13 17:11:41 +02:00
Laurent Mazare f3a73f80d1
Support for cudnn conv1d. (#2888)
* Support for cudnn conv1d.

* More conv1d work.

* Get the conv1d to work with cudnn.

* Cleanup.
2025-04-13 16:47:37 +02:00
Laurent Mazare b44d38de0e
Add the Orpheus TTS. (#2886)
* Add the Orpheus TTS.

* Add a small readme.

* Token fix.

* Support more voices.

* Clippy fixes.
2025-04-13 12:02:17 +02:00
Laurent Mazare d9198deb37
Im2col cuda optimization. (#2885) 2025-04-13 10:07:53 +02:00
Laurent Mazare 15ed0b11ce
Optimize the batched matmul for the cpu backend. (#2884) 2025-04-12 21:40:40 +02:00
Laurent Mazare 34505fdf3a
Avoid using batched-matmul in nn::Linear. (#2883)
* Avoid using batched-matmul in nn::Linear.

* Also avoid batched matmul in conv1d.

* Also tweak the conv2d.

* Batched tests.

* Also cover conv2d.
2025-04-12 19:53:58 +02:00
Laurent Mazare d7b7ce16e4
Upgrade ug. (#2882) 2025-04-12 13:19:32 +02:00
Laurent Mazare 19fb6dac1f
Bump the crate version. (#2881) 2025-04-11 22:28:21 +02:00
Laurent Mazare acc5bd335f
Cuda cleanup. (#2880)
* Cuda cleanup.

* More fixes.
2025-04-11 21:43:35 +02:00
Kyle Birnbaum eb478ece92
Implementing DistilBertForMaskedLM. (#2866)
* Initial commit: model weights working, prediciton incorrect

* moved distilbertformaskedlm into distilbert modeling file

* made maskedLM like bert example, still incorrect predictions

* finally not getting NaNs, fixed attention mask

* getting correct output sentences

* get top k predictions

* fixed output formatting slightly

* added default arg for model_id

* lint

* moved masked token example code from distilbertformaskedlm example to distilbert example

* lint

* removed distilbertformaskedlm example

* cleanup

* clippy

* removed embedding normalization from example

* made output and model dependent on args instead of prompt

* lint

* replaced or_ok anyhow error with anyhow context

* changed error message for mask token not found
2025-04-11 13:25:39 +02:00
Manpreet Singh d339b01726
Fix hardcoded f32 dtype for attention_mask. Use the model dtype for compatibility. (#2872) 2025-04-08 06:12:14 +02:00
Laurent Mazare 2f3bf42bcb
Support more snac variants. (#2871) 2025-04-07 08:23:47 +02:00
Laurent Mazare e3370c6316
Add the SNAC audio tokenizer. (#2869)
* Add the SNAC audio tokenizer.

* More snac.

* Again more snac.

* Add some example code for snac.

* Get the weights to load.

* Add to the snac model.

* Fixes.

* Get round-tripping to work.

* Save/load code files.

* Clippy fix.

* Fmt fix.
2025-04-06 22:15:36 +02:00
Laurent Mazare 338f6a102e
Clippy 1.86 fixes for cuda. (#2868) 2025-04-05 15:45:35 +02:00
Laurent Mazare bc33df77e1
Add the missing voices for CSM. (#2867) 2025-04-05 06:52:36 +02:00
Laurent Mazare cf9d7bf24c
Add the CSM model. (#2862)
* Add the CSM model.

* Add some code to load the model.

* Load the text tokenizer.

* Add frame generation.

* Get the sampling to work.

* Rope fix.

* Autoregressive generation.

* Generate some audio file.

* Use the actual prompt.

* Support multiple turns.

* Add a very barebone readme.

* Move some of the shared bits to the model.
2025-04-04 06:48:03 +02:00
Laurent Mazare 9d31361c4f
Fix for clippy 1.86. (#2864)
* Fix for clippy 1.86.

* More clippy fixes.

* More fixes.
2025-04-03 19:38:27 +02:00
Kyle Birnbaum 648596c073
Added readmes to examples (#2835)
* added chatGLM readme

* changed wording in readme

* added readme for chinese-clip

* added readme for convmixer

* added readme for custom ops

* added readme for efficientnet

* added readme for llama

* added readme to mnist-training

* added readme to musicgen

* added readme to quantized-phi

* added readme to starcoder2

* added readme to whisper-microphone

* added readme to yi

* added readme to yolo-v3

* added readme to whisper-microphone

* added space to example in glm4 readme

* fixed mamba example readme to run mamba instead of mamba-minimal

* removed slash escape character

* changed moondream image to yolo-v8 example image

* added procedure for making the reinforcement-learning example work with a virtual environment on my machine

* added simple one line summaries to the example readmes without

* changed non-existant image to yolo example's bike.jpg

* added backslash to sam command

* removed trailing - from siglip

* added SoX to silero-vad example readme

* replaced procedure for uv on mac with warning that uv isn't currently compatible with pyo3

* added example to falcon readme

* added --which arg to stella-en-v5 readme

* fixed image path in vgg readme

* fixed the image path in the vit readme

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
2025-04-03 09:18:29 +02:00
Laurent Mazare d9904a3baf
Update to cudarc 0.14 (breaking change). (#2858)
* Start updating to cudarc 0.14.

* Adapt a couple more things.

* And a couple more fixes.

* More tweaks.

* And a couple more fixes.

* Bump the major version number.

* Proper module system for the cuda kernels.

* Proper ptx loading.

* Launch the sort kernel.

* Custom op.

* Start using the builder pattern.

* More builder.

* More builder.

* Get candle-core to compile.

* Get the tests to pass.

* Get candle-nn to work too.

* Support for custom cuda functions.

* cudnn fixes.

* Get flash attn to run.

* Switch the crate versions to be alpha.

* Bump the ug dependency.
2025-04-03 09:12:19 +02:00
Kyle Birnbaum d6db305829
Added new language pairs to marian-mt example. (#2860)
* added new language pairs to marian-mt

* lint

* seperated python code for converting tokenizers into its own file and and added a reqirements.txt for dependencies, updated instructions in readme and included python version

* Cleanup.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-04-02 23:50:14 +02:00
Zack Angelo b4daa03e59
add as_cuda_slice_mut to CudaStorage and CudaDType (#2859) 2025-04-01 19:34:52 +02:00
Bryan Lee 9541467d6b
Add `flip` to `tensor` (#2855)
* Add `flip` to `tensor`

* Move the tests to the proper places.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-04-01 09:07:16 +02:00
Kyle Birnbaum 6429609090
Added Deepseekr1 Llama8b variant to quantized example (#2842)
* added deepseekr1 llama8b variant to quantized example

* lint
2025-03-30 10:55:21 +02:00
Kyle Birnbaum ba473290da
Added DeepseekR1 Qwen7B variant to quantized-qwen2-instruct example (#2843)
* quantized deepseek qwen generating tokens

* removed is_deepseek from Args and replaced prompt if statement with pattern matching
2025-03-30 10:54:22 +02:00
Bryan Lee 59c26195db
Fix CIFAR10 dataset types and dimension ordering (#2845) 2025-03-30 10:53:25 +02:00
LongYinan cb02b389d5
Fix reinforcement learning example (#2837) 2025-03-26 16:27:45 +01:00
Kyle Birnbaum 0d4097031c
fixed rand import for mnist-training (#2833) 2025-03-26 08:10:03 +01:00
Kyle Birnbaum 10853b803c
fixed rand imports for whisper-microphone example (#2834) 2025-03-26 08:09:27 +01:00
xkeyC f3d472952f
fix: `candle-flash-attn` linux and `msvc` build (#2829)
* fix: candle-flash-attn linux and msvc build

* Missing newline at eof.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-03-25 08:45:12 +01:00
Christian Balcom 67b85f79f1
Pickle decoder fix and Long1 opcode addition. (#2824)
* Pickle decoder changes: added Long1 opcode, fixed tensor offset calculation

* Apply rustfmt.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-03-23 08:10:08 +01:00
Benjamin Beurdouche 0b24f7f0a4
Fix for whisper example. rand::distribution is now rand::distr (#2811) 2025-03-16 19:14:55 +01:00
Laurent Mazare 3afb04925a
Allow for growing the default KV cache when needed. (#2810) 2025-03-16 17:30:25 +01:00
André Cipriani Bandarra cbf5fc80c2
Add Gemma 3 1b IT toe Gemma examples (#2809)
- Updates the Gemma example to include Gemma 3 1b instruction tuned.
2025-03-16 17:00:48 +01:00
Laurent Mazare 468d1d525f
Bump the crate version to 0.8.4. (#2808) 2025-03-15 07:42:24 +01:00
Mike Seddon c930ab7e1a
upgrade half library to fix rand (#2806)
fix lints
2025-03-14 09:01:54 +01:00
Laurent Mazare 111edbc4ea
Gemma 3 initial setup (text only). (#2802)
* Gemma 3 initial setup (text only).

* Use the rotating kv cache for the sliding window.
2025-03-14 07:56:02 +01:00