Documentation Pass for Models (#2617)

* links in chinese_clip
* links for clip model
* add mod docs for flux and llava
* module doc for MMDIT and MIMI
* add docs for a few more models
* mod docs for bert naser and beit
* add module docs for convmixer colpali codegeex and chatglm
* add another series of moddocs
* add fastvit-llama2_c
* module docs mamba -> mobileone
* module docs from moondream-phi3
* mod docs for quantized and qwen
* update to yi
* fix long names
* Update llama2_c.rs
* Update llama2_c_weights.rs
* Fix the link for mimi + tweaks

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>

This commit is contained in:
parent 0ed24b9852
commit f689ce5d39
@@ -1,10 +1,9 @@
//! Based from the Stanford Hazy Research group.
//!
//! See "Simple linear attention language models balance the recall-throughput tradeoff", Arora et al. 2024
//! <https://arxiv.org/abs/2402.18668>
//! Original code:
//! https://github.com/HazyResearch/based
//! - [Arxiv](https://arxiv.org/abs/2402.18668)
//! - [Github](https://github.com/HazyResearch/based)
//!
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::{

@@ -1,3 +1,10 @@
//! Based on the BEIT vision-language model.
//!
//! See "BEIT: BERT Pre-Training of Image Transformers", Bao et al. 2021
//! - [Arxiv](https://arxiv.org/abs/2106.08254)
//! - [Github](https://github.com/microsoft/unilm/tree/master/beit)
//!
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};

@@ -1,3 +1,9 @@
//! BERT (Bidirectional Encoder Representations from Transformers)
//!
//! See "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", Devlin et al. 2018
//! - [Arxiv](https://arxiv.org/abs/1810.04805)
//! - [Github](https://github.com/google-research/bert)
//!
use super::with_tracing::{layer_norm, linear, LayerNorm, Linear};
use candle::{DType, Device, Result, Tensor};
use candle_nn::{embedding, Embedding, Module, VarBuilder};

@@ -1,3 +1,10 @@
//! BigCode implementation in Rust based on the GPT-BigCode model.
//!
//! See "StarCoder: A State-of-the-Art LLM for Code", Mukherjee et al. 2023
//! - [Arxiv](https://arxiv.org/abs/2305.06161)
//! - [Github](https://github.com/bigcode-project/starcoder)
//!
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{embedding, linear_b as linear, Embedding, LayerNorm, Linear, Module, VarBuilder};

@@ -1,3 +1,10 @@
//! Based on the BLIP paper from Salesforce Research.
//!
//! See "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation"
//! - [Arxiv](https://arxiv.org/abs/2201.12086)
//! - [Github](https://github.com/salesforce/BLIP)
//!
use super::blip_text;
use super::with_tracing::{conv2d, linear, Conv2d, Linear};
use candle::{Module, Result, Tensor, D};

@@ -1,3 +1,9 @@
//! Implementation of BLIP text encoder/decoder.
//!
//! See "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation"
//! https://arxiv.org/abs/2201.12086
//!
use super::with_tracing::{linear, Embedding, Linear};
use candle::{Module, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, VarBuilder};

@@ -1,3 +1,10 @@
//! Implementation of the ChatGLM2/3 models from THUDM.
//!
//! See:
//! - ChatGLM3: ["ChatGLM3: Advancing Multilingual Conversational Language Models with High-Quality Data"](https://github.com/THUDM/ChatGLM3)
//! - ChatGLM2: ["ChatGLM2: An Open Bilingual Chat LLM"](https://github.com/THUDM/ChatGLM2-6B)
//!
use crate::models::with_tracing::{linear_b as linear, Linear};
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::VarBuilder;

@@ -3,8 +3,9 @@
//! Chinese contrastive Language-Image Pre-Training (CLIP) is an architecture trained on
//! pairs of images with related texts.
//!
//! https://github.com/OFA-Sys/Chinese-CLIP
//! https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py
//! - [GH Link](https://github.com/OFA-Sys/Chinese-CLIP)
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py)
//!
use candle::{Module, Result, Tensor, D};
use candle_nn as nn;
@@ -3,8 +3,9 @@
//! Contrastive Language-Image Pre-Training (CLIP) is an architecture trained on
//! pairs of images with related texts.
//!
//! https://github.com/openai/CLIP
//! https://github.com/huggingface/transformers/tree/f6fa0f0bf0796ac66f201f23bdb8585de1609add/src/transformers/models/clip
//! - [GH Link](https://github.com/openai/CLIP)
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/tree/f6fa0f0bf0796ac66f201f23bdb8585de1609add/src/transformers/models/clip)
use self::{
text_model::{Activation, ClipTextTransformer},
vision_model::ClipVisionTransformer,
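The CLIP and Chinese-CLIP docs above describe training on pairs of images with related texts. As an illustrative plain-Rust sketch (not the candle API), the image/text score that such models produce is simply the cosine similarity of the two L2-normalized embedding vectors; CLIP-style models additionally scale the scores by a learned logit scale before the softmax.

```rust
/// L2-normalize a vector; the zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        v.to_vec()
    } else {
        v.iter().map(|x| x / norm).collect()
    }
}

/// CLIP-style similarity: dot product of L2-normalized embeddings.
fn clip_similarity(image_emb: &[f32], text_emb: &[f32]) -> f32 {
    let (i, t) = (l2_normalize(image_emb), l2_normalize(text_emb));
    i.iter().zip(t.iter()).map(|(a, b)| a * b).sum()
}

fn main() {
    // Toy embeddings standing in for the vision/text tower outputs.
    let image = [0.3_f32, 0.9, 0.1, 0.4];
    let caption_a = [0.35_f32, 0.8, 0.05, 0.5]; // related caption
    let caption_b = [-0.7_f32, 0.1, 0.9, -0.2]; // unrelated caption
    println!("sim(image, a) = {:.3}", clip_similarity(&image, &caption_a));
    println!("sim(image, b) = {:.3}", clip_similarity(&image, &caption_b));
}
```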
@@ -1,3 +1,10 @@
//! CodeGeeX4 - A multi-language code generation model
//!
//! See "CodeGeeX: A Pre-Trained Model For Code Generation with Multilingual Evaluations on HumanEval-X", Qian et al. 2023
//! - [Arxiv](https://arxiv.org/abs/2303.17568)
//! - [Github](https://github.com/THUDM/CodeGeeX)
//!
use crate::models::with_tracing::{linear_b as linear, Linear};
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::VarBuilder;

@@ -1,3 +1,8 @@
//! Colpali Model for text/image similarity scoring.
//!
//! Colpali combines a vision encoder with an efficient LM for retrieving content.
//!
use candle::{Module, Result, Tensor};
use candle_nn::VarBuilder;

@@ -1,3 +1,10 @@
//! ConvMixer implementation.
//!
//! See "Patches Are All You Need?" by Trockman et al. 2022
//! - [Arxiv](https://arxiv.org/abs/2201.09792)
//! - [Github](https://github.com/locuslab/convmixer)
//!
use candle::Result;
use candle_nn::{batch_norm, Conv2dConfig, Module, VarBuilder};

@@ -1,15 +1,13 @@
//! ConvNeXt implementation.
//!
//! See "A ConvNet for the 2020s" Liu et al. 2022
//! <https://arxiv.org/abs/2201.03545>
//! See ["A ConvNet for the 2020s" Liu et al. 2022](https://arxiv.org/abs/2201.03545)
//! and
//! "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" Woo et al. 2023
//! <https://arxiv.org/abs/2301.00808>
//! ["ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" Woo et al. 2023](https://arxiv.org/abs/2301.00808)
//!
//! Original code:
//! https://github.com/facebookresearch/ConvNeXt/
//! https://github.com/facebookresearch/ConvNeXt-V2/
//! timm: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/convnext.py
//! - [ConvNeXt](https://github.com/facebookresearch/ConvNeXt/)
//! - [ConvNeXt-V2](https://github.com/facebookresearch/ConvNeXt-V2/)
//! - [timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/convnext.py)
use candle::shape::ShapeWithOneHole;
use candle::{Result, D};
@ -1,4 +1,9 @@
|
|||
/// Adapted from https://github.com/descriptinc/descript-audio-codec
|
||||
//! Implementation of the Descript Audio Codec (DAC) model
|
||||
//!
|
||||
//! See: [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)
|
||||
//!
|
||||
/// An efficient neural codec for compressing/decompressing audio
|
||||
///
|
||||
use crate::models::encodec;
|
||||
use candle::{IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{Conv1d, Conv1dConfig, ConvTranspose1d, ConvTranspose1dConfig, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Implementation of the Depth Anything model from FAIR.
|
||||
//!
|
||||
//! See:
|
||||
//! - ["Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data"](https://github.com/LiheYoung/Depth-Anything)
|
||||
//!
|
||||
|
||||
use candle::D::Minus1;
|
||||
use candle::{Module, Result, Tensor};
|
||||
use candle_nn::ops::Identity;
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
//! Implementation of the DINOv2 models from Meta Research.
|
||||
//!
|
||||
//! See:
|
||||
//! - DINOv2: ["DINOv2: Learning Robust Visual Features without Supervision"](https://github.com/facebookresearch/dinov2)
|
||||
//!
|
||||
use candle::{IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};
|
||||
|
||||
|
|
|
@ -1,3 +1,10 @@
|
|||
//! Implementation of the DINOv2 revision (4 regularization)
|
||||
//!
|
||||
//! See:
|
||||
//! - DINOv2: ["DINOv2: Learning Robust Visual Features without Supervision"](https://github.com/facebookresearch/dinov2)
|
||||
//!
|
||||
//! This code implements the regularization tokens version with 4 regularization tokens.
|
||||
//!
|
||||
use candle::{IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};
|
||||
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
//! Implementation of DistilBert, a distilled version of BERT.
|
||||
//!
|
||||
//! See:
|
||||
//! - ["DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"](https://arxiv.org/abs/1910.01108)
|
||||
//!
|
||||
use super::with_tracing::{layer_norm, linear, LayerNorm, Linear};
|
||||
use candle::{DType, Device, Result, Tensor};
|
||||
use candle_nn::{Embedding, Module, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
//! Implementation of EfficientBert, an efficient variant of BERT for computer vision tasks.
|
||||
//!
|
||||
//! See:
|
||||
//! - ["EfficientBERT: Progressively Searching Multilayer Perceptron Architectures for BERT"](https://arxiv.org/abs/2201.00462)
|
||||
//!
|
||||
use candle::{Result, Tensor, D};
|
||||
use candle_nn as nn;
|
||||
use nn::{Module, VarBuilder};
|
||||
|
|
|
@ -1,9 +1,8 @@
|
|||
//! EfficientViT (MSRA) inference implementation based on timm.
|
||||
//!
|
||||
//! See "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"
|
||||
//! https://arxiv.org/abs/2305.07027
|
||||
|
||||
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py
|
||||
//! See ["EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"](https://arxiv.org/abs/2305.07027)
|
||||
//!
|
||||
//! Based on implementation from [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py)
|
||||
|
||||
use candle::{Result, Tensor, D};
|
||||
use candle_nn::{
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! EnCodec neural audio codec based on the Encodec implementation.
|
||||
//!
|
||||
//! See ["High Fidelity Neural Audio Compression"](https://arxiv.org/abs/2210.13438)
|
||||
//!
|
||||
//! Based on implementation from [huggingface/transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/encodec/modeling_encodec.py)
|
||||
|
||||
#![allow(unused)]
|
||||
use candle::{DType, IndexOp, Layout, Module, Result, Shape, Tensor, D};
|
||||
use candle_nn::{conv1d, Conv1d, Conv1dConfig, ConvTranspose1d, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! EVA-2 inference implementation.
|
||||
//!
|
||||
//! See ["EVA-02: A Visual Representation for Neon Genesis"](https://arxiv.org/abs/2303.11331)
|
||||
//!
|
||||
//! Based on implementation from [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/eva2.py)
|
||||
|
||||
use candle::{IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};
|
||||
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Falcon language model inference implementation
|
||||
//!
|
||||
//! See ["Falcon: a new approach to large language models"](https://huggingface.co/blog/falcon)
|
||||
//!
|
||||
//! Based on implementation from [Huggingface Transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon)
|
||||
|
||||
use candle::{DType, Device, Result, Tensor, D};
|
||||
use candle_nn::{embedding, linear_b as linear, Embedding, LayerNorm, Linear, Module, VarBuilder};
|
||||
use serde::Deserialize;
|
||||
|
|
|
@ -1,9 +1,9 @@
|
|||
//! FastViT inference implementation based on timm
|
||||
//! # FastViT inference implementation based on timm
|
||||
//!
|
||||
//! See "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"
|
||||
//! https://arxiv.org/pdf/2303.14189
|
||||
//! ## Description
|
||||
//! See ["FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"](https://arxiv.org/pdf/2303.14189)
|
||||
//!
|
||||
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/fastvit.py
|
||||
//! Implementation based on [timm model](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/fastvit.py)
|
||||
|
||||
use candle::{DType, Result, Tensor, D};
|
||||
use candle_nn::{
|
||||
|
|
|
@ -1,3 +1,10 @@
|
|||
//! Flux Model
|
||||
//!
|
||||
//! Flux is a series of text-to-image generation models based on diffusion transformers.
|
||||
//!
|
||||
//! - [GH Link](https://github.com/black-forest-labs/flux)
|
||||
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py)
|
||||
//!
|
||||
use candle::{Result, Tensor};
|
||||
|
||||
pub trait WithForward {
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Gemma inference implementation.
|
||||
//!
|
||||
//! See ["Gemma: Open Models Based on Gemini Technology"](https://blog.google/technology/developers/gemma-open-ai-model/)
|
||||
//!
|
||||
//! Based on implementation from Google and PyTorch
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Gemma LLM architecture (Google) inference implementation.
|
||||
//!
|
||||
//! See ["Gemma: Open Models Based on Gemini Technology"](https://blog.google/technology/developers/gemma-open-models/)
|
||||
//!
|
||||
//! Based on implementations from Google and OpenLLM
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! GLM-4 inference implementation.
|
||||
//!
|
||||
//! An open bilingual language model with 130B parameters.
|
||||
//!
|
||||
//! Based on implementation from [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)
|
||||
|
||||
use crate::models::with_tracing::{linear_b as linear, Linear};
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
use candle_nn::VarBuilder;
|
||||
|
|
|
@ -1,3 +1,10 @@
|
|||
//! Granite is a Long Context Transformer Language Model.
|
||||
//!
|
||||
//! A high performance transformer model optimized for efficient processing
|
||||
//! of very long context sequences
|
||||
//!
|
||||
//! Based on implementation from [Nod.ai](https://github.com/nod-ai/granite)
|
||||
|
||||
use super::with_tracing::{linear_no_bias as linear, Linear, RmsNorm};
|
||||
use candle::{DType, Device, IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{embedding, Embedding, Module, VarBuilder};
|
||||
|
|
|
@ -1,9 +1,9 @@
|
|||
//! Hiera inference implementation based on timm.
|
||||
//! [Hiera] inference implementation based on timm.
|
||||
//!
|
||||
//! See "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles"
|
||||
//! https://arxiv.org/abs/2306.00989
|
||||
//! See "[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles]"
|
||||
//! [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles]: https://arxiv.org/abs/2306.00989
|
||||
//!
|
||||
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/hiera.py
|
||||
//! [Hiera]: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/hiera.py
|
||||
|
||||
use candle::{Result, D};
|
||||
use candle_nn::{conv2d, layer_norm, linear, ops::softmax, Conv2dConfig, Func, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! # JinaBERT inference implementation
|
||||
//!
|
||||
//! Based on implementation from huggingface for Jina BERT and its variants
|
||||
//!
|
||||
//! See: [Jina Embeddings on HuggingFace](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)
|
||||
|
||||
use super::with_tracing::{linear, linear_no_bias, Embedding, Linear};
|
||||
use candle::{DType, Device, IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{layer_norm, LayerNorm, Module, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Llama inference implementation.
|
||||
//!
|
||||
//! See ["LLaMA: Open and Efficient Foundation Language Models"](https://arxiv.org/abs/2302.13971)
|
||||
//!
|
||||
//! Implementation based on Hugging Face's [transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py)
|
||||
|
||||
use super::with_tracing::{linear_no_bias as linear, Linear, RmsNorm};
|
||||
use candle::{DType, Device, IndexOp, Result, Tensor, D};
|
||||
use candle_nn::{embedding, Embedding, Module, VarBuilder};
|
||||
|
|
|
@@ -1,3 +1,9 @@
//! Llama2 inference implementation.
//!
//! See ["LLaMA 2: Open Foundation and Fine-Tuned Chat Models"](https://arxiv.org/abs/2307.09288)
//!
//! Based on the [llama2.c](https://github.com/karpathy/llama2.c) implementation
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::linear_no_bias as linear;
use candle_nn::{embedding, rms_norm, Embedding, Linear, Module, RmsNorm, VarBuilder};

@@ -1,3 +1,9 @@
//! Llama2 inference implementation.
//!
//! See ["LLaMA 2: Open Foundation and Fine-Tuned Chat Models"](https://arxiv.org/abs/2307.09288)
//!
//! Based on the [llama2.c](https://github.com/karpathy/llama2.c) implementation
use byteorder::{LittleEndian, ReadBytesExt};
use candle::{DType, Device, IndexOp, Result, Shape, Tensor};
use candle_nn::VarBuilder;
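The llama2_c_weights hunk pulls in `byteorder` because llama2.c-style checkpoints store a small integer config header followed by raw little-endian f32 weight data. The sketch below (requiring only the `byteorder` crate that the hunk already imports) illustrates that reading pattern; the field names and counts are hypothetical, not the actual llama2.c header layout.

```rust
use byteorder::{LittleEndian, ReadBytesExt};
use std::io::Cursor;

fn main() -> std::io::Result<()> {
    // A pretend checkpoint: two little-endian i32 header fields, then three f32 weights.
    let mut bytes = Vec::new();
    for v in [4i32, 3i32] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    for v in [0.1f32, -0.5, 2.0] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }

    let mut r = Cursor::new(bytes);
    // Hypothetical header fields, read one little-endian scalar at a time.
    let dim = r.read_i32::<LittleEndian>()?;
    let n_layers = r.read_i32::<LittleEndian>()?;
    // Tensor data then follows as a flat run of little-endian f32 values.
    let mut weights = Vec::with_capacity(3);
    for _ in 0..3 {
        weights.push(r.read_f32::<LittleEndian>()?);
    }
    println!("dim={dim} n_layers={n_layers} weights={weights:?}");
    Ok(())
}
```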
@ -1,3 +1,13 @@
|
|||
//! The LLaVA (Large Language and Vision Assistant) model.
|
||||
//!
|
||||
//! This provides the main model implementation combining a vision tower (CLIP) with
|
||||
//! language model (Llama) for multimodal capabilities.
|
||||
//!
|
||||
//! The architecture implements the training-free projection technique from the paper:
|
||||
//! [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485).
|
||||
//!
|
||||
//! - [GH Link](https://github.com/haotian-liu/LLaVA/tree/main)
|
||||
//!
|
||||
pub mod config;
|
||||
pub mod utils;
|
||||
|
||||
|
|
|
@ -1,5 +1,10 @@
|
|||
/// A fast implementation of mamba for inference only.
|
||||
/// This is based on: https://github.com/LaurentMazare/mamba.rs
|
||||
//! Mamba inference implementation.
|
||||
//!
|
||||
//! See ["Mamba: Linear-Time Sequence Modeling with Selective State Spaces"](https://arxiv.org/abs/2312.00752)
|
||||
//!
|
||||
//! Based on reference implementation from the AlbertMamba project
|
||||
//! A fast implementation of mamba for inference only.
|
||||
//! Based on Laurent Mazare's rust implementation: [mamba.rs](https://github.com/LaurentMazare/mamba.rs)
|
||||
use crate::models::with_tracing::{linear, linear_no_bias, Linear};
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
use candle_nn::{RmsNorm, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! Marian Neural Machine Translation
|
||||
//!
|
||||
//! See "Marian: Fast Neural Machine Translation in C++" Junczys-Dowmunt et al. 2018
|
||||
//! - [ACL Anthology](https://aclanthology.org/P18-4020/)
|
||||
//! - [Github](https://github.com/marian-nmt/marian)
|
||||
//!
|
||||
use super::with_tracing::{linear, Embedding, Linear};
|
||||
use candle::{Result, Tensor};
|
||||
use candle_nn::{layer_norm, LayerNorm, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
//! MetaVoice Studio ML Models
|
||||
//!
|
||||
//! See MetaVoice's TTS and voice cloning models:
|
||||
//! - [Github](https://github.com/metavoiceio/metavoice-src)
|
||||
//! - [Website](https://studio.metavoice.ai/)
|
||||
|
||||
use candle::{DType, Device, Error as E, IndexOp, Module, Result, Tensor, D};
|
||||
use candle_nn::{embedding, linear_b, rms_norm, Embedding, Linear, RmsNorm, VarBuilder};
|
||||
|
||||
|
|
|
@ -1,9 +1,14 @@
|
|||
// Adapted from the reference implementation at:
|
||||
// https://github.com/kyutai-labs/moshi
|
||||
//! mimi model
|
||||
//!
|
||||
//! Mimi is a state-of-the-art audio neural codec.
|
||||
//!
|
||||
//! - [HuggingFace Model Card](https://huggingface.co/kyutai/mimi)
|
||||
//! - [GitHub](https://github.com/kyutai-labs/moshi)
|
||||
//!
|
||||
|
||||
// Copyright (c) Kyutai, all rights reserved.
|
||||
// This source code is licensed under the license found in the
|
||||
// LICENSE file in the root directory of this source tree.
|
||||
|
||||
pub use candle;
|
||||
pub use candle_nn;
|
||||
|
||||
|
|
|
@ -1,3 +1,10 @@
|
|||
//! Mixtral Model, based on the Mistral architecture
|
||||
//!
|
||||
//! See Mistral and Mixtral at:
|
||||
//! - [Hugging Face](https://huggingface.co/docs/transformers/model_doc/mixtral)
|
||||
//! - [Github](https://github.com/mistralai/mistral-src)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::{linear_no_bias, Linear, RmsNorm};
|
||||
/// Mistral LLM, https://github.com/mistralai/mistral-src
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,3 +1,10 @@
|
|||
//! MixFormer (Microsoft's Phi Architecture)
|
||||
//!
|
||||
//! See "Textbooks Are All You Need II: phi-1.5 technical report", Lin et al. 2023
|
||||
//! - [Arxiv](https://arxiv.org/abs/2309.05463)
|
||||
//! - [Github](https://huggingface.co/microsoft/phi-1_5)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::{linear, Embedding as E, Linear};
|
||||
/// MixFormer model.
|
||||
/// https://huggingface.co/microsoft/phi-1_5
|
||||
|
|
|
@@ -1,3 +1,20 @@
//! Mixtral Model, a sparse mixture of expert model based on the Mistral architecture
//!
//! See Mixtral model details at:
//! - [Hugging Face](https://huggingface.co/docs/transformers/model_doc/mixtral)
//! - [Mixtral-8x7B Blog Post](https://mistral.ai/news/mixtral-of-experts/)
//!
//! The model uses a mixture of experts architecture with:
//! - 8 experts per layer
//! - Top 2 expert routing
//! - Sliding window attention
//! - RoPE embeddings
//!
//! References:
//! - [Hugging Face Implementation](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/modeling_mixtral.py)
//! - [Mixtral Blog Post](https://mistral.ai/news/mixtral-of-experts/)
//!
use crate::models::with_tracing::{linear_no_bias, Linear, RmsNorm};
/// Mixtral Model
/// https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/modeling_mixtral.py
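The Mixtral doc above lists 8 experts per layer with top-2 routing. The sketch below shows the gating idea in plain Rust, with scalar weights standing in for the expert MLPs; it is an illustration of top-k routing, not the candle implementation.

```rust
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Router logits for one token over 8 experts (in the model these come from a linear layer).
    let router_logits = [0.1_f32, 2.3, -0.7, 1.9, 0.0, -1.2, 0.4, 0.8];
    let probs = softmax(&router_logits);

    // Keep the two most probable experts and renormalize their gate weights.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let top2 = &idx[..2];
    let denom: f32 = top2.iter().map(|&i| probs[i]).sum();

    // Toy "experts": each expert is just a scalar weight applied to the hidden state.
    let expert_weights = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let hidden = 0.5_f32;
    let output: f32 = top2
        .iter()
        .map(|&i| probs[i] / denom * expert_weights[i] * hidden)
        .sum();
    println!("selected experts {:?}, combined output {:.3}", top2, output);
}
```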
@ -1,3 +1,12 @@
|
|||
//! Mix of Multi-scale Dilated and Traditional Convolutions
|
||||
//!
|
||||
//! Mix of Multi-scale Dilated and Traditional Convolutions (MMDiT) is an architecture
|
||||
//! introduced for Stable Diffusion 3, with the MMDiT-X variant used in Stable Diffusion 3.5.
|
||||
//!
|
||||
//! - [Research Paper](https://arxiv.org/abs/2403.03206)
|
||||
//! - ComfyUI [reference implementation](https://github.com/comfyanonymous/ComfyUI/blob/78e133d0415784924cd2674e2ee48f3eeca8a2aa/comfy/ldm/modules/diffusionmodules/mmdit.py)
|
||||
//! - Stability-AI [MMDiT-X implementation](https://github.com/Stability-AI/sd3.5/blob/4e484e05308d83fb77ae6f680028e6c313f9da54/mmditx.py)
|
||||
|
||||
pub mod blocks;
|
||||
pub mod embedding;
|
||||
pub mod model;
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! Mobile CLIP model, combining a lightweight vision encoder with a text encoder
|
||||
//!
|
||||
//! A mobile-optimized CLIP implementation that uses:
|
||||
//! - FastViT as the vision encoder
|
||||
//! - OpenCLIP text encoder
|
||||
//! - Projection layers to align the feature spaces
|
||||
//!
|
||||
//! See model details at:
|
||||
//! - [FastViT](https://arxiv.org/abs/2303.14189)
|
||||
//! - [OpenCLIP](https://github.com/mlfoundations/open_clip)
|
||||
//!
|
||||
//! References:
|
||||
//! - [MobileVLM](https://huggingface.co/mobileVLM)
|
||||
//! - [MetaCLIP](https://arxiv.org/abs/2309.16671)
|
||||
//!
|
||||
|
||||
use super::fastvit;
|
||||
use super::openclip::text_model;
|
||||
use candle::{Result, Tensor, D};
|
||||
|
|
|
@ -1,9 +1,14 @@
|
|||
//! # MobileNet-v4
|
||||
//!
|
||||
//! MobileNet-v4 inference implementation based on timm.
|
||||
//!
|
||||
//! See "MobileNetV4 - Universal Models for the Mobile Ecosystem"
|
||||
//! https://arxiv.org/abs/2404.10518
|
||||
//! ## Paper
|
||||
//!
|
||||
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/mobilenetv3.py
|
||||
//! ["MobileNetV4 - Universal Models for the Mobile Ecosystem"](https://arxiv.org/abs/2404.10518)
|
||||
//!
|
||||
//! ## References
|
||||
//!
|
||||
//! - [PyTorch Implementation](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/mobilenetv3.py)
|
||||
|
||||
use candle::{Result, Tensor, D};
|
||||
use candle_nn::{
|
||||
|
|
|
@ -1,7 +1,8 @@
|
|||
//! # MobileOne
|
||||
//!
|
||||
//! MobileOne inference implementation based on timm and candle-repvgg
|
||||
//!
|
||||
//! See "MobileOne: An Improved One millisecond Mobile Backbone"
|
||||
//! https://arxiv.org/abs/2206.04040
|
||||
//! See ["MobileOne: An Improved One millisecond Mobile Backbone"](https://arxiv.org/abs/2206.04040)
|
||||
|
||||
use candle::{DType, Result, Tensor, D};
|
||||
use candle_nn::{
|
||||
|
|
|
@ -1,3 +1,14 @@
|
|||
//! MoonDream Model vision-to-text
|
||||
//!
|
||||
//! The model consists of:
|
||||
//! - Vision encoder using a ViT-style architecture
|
||||
//! - Text decoder based on Microsoft's Phi model
|
||||
//! - Vision projection module to align vision and text embeddings
|
||||
//!
|
||||
//! References:
|
||||
//! - [MoonDream Original Implementation](https://github.com/vikhyat/moondream)
|
||||
//!
|
||||
|
||||
use crate::models::mixformer::{Config as PhiConfig, MixFormerSequentialForCausalLM as PhiModel};
|
||||
use crate::models::with_tracing::{layer_norm, linear_b, LayerNorm, Linear};
|
||||
use candle::{IndexOp, Module, Result, Tensor, D};
|
||||
|
|
|
@@ -1,3 +1,11 @@
//! Module implementing the MPT (Multi-Purpose Transformer) model
//!
//! References:
//! - [MPT Model used by replit-code-v1_5-3b](https://huggingface.co/replit/replit-code-v1_5-3b/blob/main/modeling_mpt.py)
//! - [Configuration](https://huggingface.co/replit/replit-code-v1_5-3b/blob/main/configuration_mpt.py)
//!
//! The model uses grouped query attention and alibi positional embeddings.
use crate::models::with_tracing::{linear_no_bias, Embedding, Linear};
/// MPT model used by replit-code-v1_5-3b
/// https://huggingface.co/replit/replit-code-v1_5-3b/blob/main/modeling_mpt.py
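The MPT doc mentions ALiBi positional embeddings (and the quantized MPT hunk later repeats "pre-computed ALiBi attention biases"). A commonly used formulation, sketched here in plain Rust for a power-of-two head count, adds `slope_h * -(query_pos - key_pos)` to each attention logit; this is an illustration of the idea, not the exact bias layout of the replit checkpoint.

```rust
/// ALiBi head slopes for a power-of-two number of heads: 2^(-8 * (h + 1) / n_heads).
fn alibi_slopes(n_heads: usize) -> Vec<f32> {
    (0..n_heads)
        .map(|h| 2f32.powf(-8.0 * (h as f32 + 1.0) / n_heads as f32))
        .collect()
}

fn main() {
    let (n_heads, seq_len) = (8, 5);
    let slopes = alibi_slopes(n_heads);

    // Bias added to the attention logits of head 0: slope * -(query_pos - key_pos).
    let head = 0;
    let mut bias = vec![vec![0f32; seq_len]; seq_len];
    for q in 0..seq_len {
        for k in 0..=q {
            bias[q][k] = -slopes[head] * (q - k) as f32;
        }
    }
    println!("slopes = {slopes:?}");
    println!("head-0 bias row for the last query: {:?}", bias[seq_len - 1]);
}
```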
@ -1,3 +1,19 @@
|
|||
//! OLMo (Open Language Model) implementation
|
||||
//!
|
||||
//! See OLMo model details at:
|
||||
//! - [Hugging Face](https://huggingface.co/allenai/OLMo)
|
||||
//! - [OLMo Paper](https://allenai.org/olmo)
|
||||
//!
|
||||
//! The model uses:
|
||||
//! - RoPE embeddings
|
||||
//! - Sliding window attention
|
||||
//! - Transformer architecture
|
||||
//!
|
||||
//! References:
|
||||
//! - [Hugging Face Implementation](https://huggingface.co/allenai/OLMo)
|
||||
//! - [OLMo Paper](https://allenai.org/olmo)
|
||||
//!
|
||||
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
use candle_nn::{linear_b, linear_no_bias, Activation, LayerNorm, Linear, VarBuilder};
|
||||
use std::sync::Arc;
|
||||
|
|
|
@ -1 +1,9 @@
|
|||
//! Open Contrastive Language-Image Pre-Training
|
||||
//!
|
||||
//! Open Contrastive Language-Image Pre-Training (OpenCLIP) is an architecture trained on
|
||||
//! pairs of images with related texts.
|
||||
//!
|
||||
//! - [GH Link](https://github.com/mlfoundations/open_clip)
|
||||
//!
|
||||
|
||||
pub mod text_model;
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! Multimodal multi-purpose model combining Gemma-based language model with SigLIP image understanding
|
||||
//!
|
||||
//! See PaLiGemma details at:
|
||||
//! - [Paper](https://arxiv.org/abs/2402.05257)
|
||||
//! - [Google Blog Post](https://blog.research.google/2024/02/paligemma-scaling-language-image.html)
|
||||
//!
|
||||
//! The model is a multimodal combination of:
|
||||
//! - SigLIP vision encoder
|
||||
//! - Gemma language model
|
||||
//! - Cross-projection layers
|
||||
//!
|
||||
//! References:
|
||||
//! - [HuggingFace Implementation](https://huggingface.co/google/paligemma-3b)
|
||||
//! - [Paper: PaLI-3 and Beyond: Scaling Language-Image Learning](https://arxiv.org/abs/2402.05257)
|
||||
//!
|
||||
|
||||
use crate::models::{gemma, siglip};
|
||||
use candle::{Module, Result, Tensor};
|
||||
use candle_nn::{linear, Linear, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,20 @@
|
|||
//! Parler Model implementation for parler_tts text-to-speech synthesis
|
||||
//!
|
||||
//! Implements a transformer-based decoder architecture for generating audio tokens
|
||||
//! from text using discrete tokens. The model converts text into audio segments
|
||||
//! using multiple codebooks of quantized audio tokens.
|
||||
//!
|
||||
//! The model architecture includes:
|
||||
//! - Multi-head attention layers for text and audio processing
|
||||
//! - Feed-forward networks
|
||||
//! - Layer normalization
|
||||
//! - Positional embeddings
|
||||
//! - Multiple codebook prediction heads
|
||||
//!
|
||||
//! The implementation follows the original parler_tts architecture while focusing
|
||||
//! on audio token generation for text-to-speech synthesis.
|
||||
//!
|
||||
|
||||
use crate::generation::LogitsProcessor;
|
||||
use crate::models::t5;
|
||||
use candle::{IndexOp, Result, Tensor};
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! Persimmon Model
|
||||
//!
|
||||
//! A transformer language model for efficient inference and general-purpose tasks. See Persimmon model details at:
|
||||
//! - [Hugging Face](https://huggingface.co/adept/persimmon-8b-base)
|
||||
//!
|
||||
//! The model uses a standard transformer architecture with:
|
||||
//! - Layer normalization for Q/K attention
|
||||
//! - RoPE embeddings with partial rotary factor
|
||||
//! - ReLU activation
|
||||
//! - Separate number of attention heads and KV heads
|
||||
//!
|
||||
//! References:
|
||||
//! - [Hugging Face Implementation](https://github.com/huggingface/transformers/blob/main/src/transformers/models/persimmon/modeling_persimmon.py)
|
||||
//! - [Persimmon Config](https://github.com/huggingface/transformers/blob/main/src/transformers/models/persimmon/configuration_persimmon.py)
|
||||
//!
|
||||
|
||||
use candle::DType;
|
||||
use serde::Deserialize;
|
||||
|
||||
|
|
|
@@ -1,3 +1,20 @@
//! Microsoft Phi model implementation
//!
//! See Phi model details at:
//! - [Phi-2 Model](https://huggingface.co/microsoft/phi-2)
//!
//! The Phi series are decoder-only transformers designed for code and language tasks.
//! Key characteristics:
//! - Decoder-only transformer architecture
//! - RoPE embeddings
//! - Layer normalization
//! - QK normalization
//!
//! References:
//! - [Hugging Face Implementation](https://huggingface.co/microsoft/phi-2)
//! - [Alternative Implementation](https://huggingface.co/microsoft/phi-2/tree/main)
//!
use crate::models::with_tracing::{layer_norm, linear, Embedding, LayerNorm, Linear};
/// Phi model.
/// https://huggingface.co/microsoft/phi-2
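RoPE embeddings show up in most of these model docs (Phi here, Mixtral, Mistral, Qwen2, and others): each pair of query/key channels is rotated by an angle that depends on the token position and the pair index. The plain-Rust sketch below assumes the interleaved-pair convention; implementations differ on whether pairs are interleaved or taken from the two halves of the vector, so this is illustrative rather than a drop-in for any particular model.

```rust
/// Rotate a query/key vector in place with rotary position embeddings.
/// `pos` is the token position and `base` is typically 10_000.0.
fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    let dim = x.len();
    for i in 0..dim / 2 {
        let theta = pos as f32 / base.powf(2.0 * i as f32 / dim as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}

fn main() {
    let mut q = vec![1.0_f32, 0.0, 1.0, 0.0];
    apply_rope(&mut q, 3, 10_000.0);
    println!("rotated query at position 3: {q:?}");
}
```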
@ -1,3 +1,22 @@
|
|||
//! Microsoft Phi-3 model implementation
|
||||
//!
|
||||
//! See Phi model details at:
|
||||
//! - [Phi-3 Model](https://huggingface.co/microsoft/phi-3)
|
||||
//!
|
||||
//! The Phi series are decoder-only transformers designed for code and language tasks.
|
||||
//! Key characteristics:
|
||||
//! - Decoder-only transformer architecture
|
||||
//! - RoPE embeddings
|
||||
//! - Layer normalization
|
||||
//! - QK normalization
|
||||
//! - Mixed activation functions
|
||||
//! - Improved context window handling
|
||||
//!
|
||||
//! References:
|
||||
//! - [Hugging Face Implementation](https://huggingface.co/microsoft/phi-3)
|
||||
//! - [Alternative Implementation](https://huggingface.co/microsoft/phi-3/tree/main)
|
||||
//!
|
||||
|
||||
// This implementation is based on:
|
||||
// https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/modeling_phi3.py
|
||||
use crate::models::with_tracing::{linear_no_bias as linear, Linear, RmsNorm};
|
||||
|
|
|
@ -1,3 +1,11 @@
|
|||
//! Pixtral Language-Image Pre-Training
|
||||
//!
|
||||
//! Pixtral is an architecture trained for multimodal learning
|
||||
//! using images paired with text descriptions.
|
||||
//!
|
||||
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/tree/main/src/transformers/models/pixtral)
|
||||
//!
|
||||
|
||||
pub mod llava;
|
||||
pub mod vision_model;
|
||||
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! BLIP model implementation with quantization support.
|
||||
//!
|
||||
//! BLIP is a vision-language model for image understanding and generation tasks.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Vision encoder using ViT architecture
|
||||
//! - Text decoder using BERT-style transformer
|
||||
//! - Cross-attention between vision and text features
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [BLIP Paper](https://arxiv.org/abs/2201.12086)
|
||||
//! - [Hugging Face Implementation](https://huggingface.co/docs/transformers/model_doc/blip)
|
||||
//!
|
||||
|
||||
use super::quantized_blip_text as blip_text;
|
||||
use crate::quantized_nn::{layer_norm, linear, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
|
|
|
@ -1,3 +1,20 @@
|
|||
//! Quantized BLIP text module implementation.
|
||||
//!
|
||||
//! Provides the text decoder portion of the BLIP model with 8-bit quantization.
|
||||
//! Uses a BERT-style transformer architecture for text processing.
|
||||
//!
|
||||
//! Key components:
|
||||
//! - Text embeddings layer with position embeddings
|
||||
//! - Multi-head self attention layers
|
||||
//! - Cross-attention for vision-text fusion
|
||||
//! - Layer normalization and feed-forward layers
|
||||
//! - Quantized linear transformations
|
||||
//!
|
||||
//! References:
|
||||
//! - [BLIP Paper](https://arxiv.org/abs/2201.12086)
|
||||
//! - [Hugging Face Implementation](https://huggingface.co/docs/transformers/model_doc/blip)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::QMatMul;
|
||||
use crate::quantized_nn::{layer_norm, linear, Embedding, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
|
|
|
@@ -1,3 +1,20 @@
//! Quantized llama model implementation.
//!
//! This provides a quantized implementation of the llama language model architecture.
//! The model implements parameter efficient quantization for reduced memory usage
//! while maintaining model quality.
//!
//! Key characteristics:
//! - Transformer decoder architecture
//! - Support for 2/3/4/8-bit quantization
//! - Optimized memory usage through quantization
//! - Configurable model sizes and parameter counts
//!
//! References:
//! - [LLaMA Paper](https://arxiv.org/abs/2302.13971)
//! - [LLaMA Model](https://github.com/facebookresearch/llama)
//!
use std::collections::HashMap;
use crate::quantized_nn::RmsNorm;
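The quantized model docs in this pass repeatedly mention 2/3/4/8-bit quantization. candle's GGUF-style formats quantize weights block-wise with per-block scales; the sketch below only illustrates the basic idea with a single symmetric 8-bit scale for the whole tensor, so treat it as a simplification rather than the actual k-quant layout.

```rust
/// Symmetric per-tensor int8 quantization: q = round(x / scale), scale = max|x| / 127.
fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs.iter().map(|x| (x / scale).round() as i8).collect();
    (q, scale)
}

/// Dequantize back to f32 by multiplying with the stored scale.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.02_f32, -1.3, 0.7, 0.0, 2.4];
    let (q, scale) = quantize_i8(&weights);
    let back = dequantize_i8(&q, scale);
    println!("quantized: {q:?} (scale {scale:.4})");
    println!("dequantized: {back:?}");
}
```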
@ -1,3 +1,19 @@
|
|||
//! Quantized Llama2 model implementation.
|
||||
//!
|
||||
//! This provides an 8-bit quantized implementation of Meta's LLaMA2 language model
|
||||
//! for reduced memory usage and faster inference.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Decoder-only transformer architecture
|
||||
//! - RoPE position embeddings
|
||||
//! - Grouped Query Attention
|
||||
//! - 8-bit quantization of weights
|
||||
//!
|
||||
//! References:
|
||||
//! - [LLaMA2 Paper](https://arxiv.org/abs/2307.09288)
|
||||
//! - [LLaMA2 Technical Report](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
|
||||
//!
|
||||
|
||||
use super::llama2_c::{Cache, Config};
|
||||
use crate::quantized_nn::{linear_no_bias as linear, Embedding, Linear, RmsNorm};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! Quantized MetaVoice model implementation.
|
||||
//!
|
||||
//! MetaVoice is a conditional text-to-speech model based on a transformer architecture.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Transformer-based autoregressive decoder
|
||||
//! - Speaker conditioning
|
||||
//! - Support for 8-bit quantization
|
||||
//! - Key-value caching for efficient inference
|
||||
//! - RMS normalization layers
|
||||
//!
|
||||
//! References:
|
||||
//! - [MetaVoice Code](https://github.com/metavoiceio/metavoice)
|
||||
//!
|
||||
|
||||
use crate::quantized_nn::{linear_b, Embedding, Linear, RmsNorm};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
|
||||
|
|
|
@@ -1,3 +1,20 @@
//! Mistral model implementation with quantization support.
//!
//! Mistral is a large language model optimized for efficiency.
//! This implementation provides quantization for reduced memory and compute.
//!
//! Key characteristics:
//! - Sliding window attention mechanism
//! - Grouped query attention (GQA)
//! - RMSNorm for layer normalization
//! - Rotary positional embeddings (RoPE)
//! - Support for 8-bit quantization
//!
//! References:
//! - [Mistral Paper](https://arxiv.org/abs/2310.06825)
//! - [Model Card](https://huggingface.co/mistralai/Mistral-7B-v0.1)
//!
use crate::quantized_nn::{linear_no_bias, Embedding, Linear, RmsNorm};
pub use crate::quantized_var_builder::VarBuilder;
use candle::{DType, Device, Module, Result, Tensor, D};
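The Mistral doc lists sliding-window attention: each position still attends causally, but only to the previous `window` tokens instead of the full prefix. The plain-Rust sketch below builds the corresponding boolean mask; in practice this is usually realized as an additive mask of large negative values applied to the attention logits.

```rust
/// Boolean sliding-window causal mask: `mask[q][k]` is true when query `q` may attend to key `k`.
fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|q| (0..seq_len).map(|k| k <= q && q - k < window).collect())
        .collect()
}

fn main() {
    // With a window of 3, every row allows at most the three most recent positions.
    for row in sliding_window_mask(6, 3) {
        let line: String = row.iter().map(|&a| if a { '1' } else { '.' }).collect();
        println!("{line}");
    }
}
```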
@ -1,3 +1,16 @@
|
|||
//! Module containing quantized MixFormer model implementation.
|
||||
//!
|
||||
//! MixFormer is an efficient transformer variant for text generation that uses
|
||||
//! mixture-of-experts and parallel attention/feed-forward blocks.
|
||||
//! This implementation provides quantization for reduced memory usage.
|
||||
//!
|
||||
//! Key features:
|
||||
//! - Parallel attention and feed-forward computation
|
||||
//! - Rotary positional embeddings
|
||||
//! - Optional key-value caching
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
|
||||
use crate::quantized_nn::{layer_norm, linear, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,3 +1,18 @@
|
|||
//! Implementation of a quantized Moondream vision language model.
|
||||
//!
|
||||
//! Moondream is a lightweight vision-language model for image understanding and generation.
|
||||
//! This module provides a quantized version for reduced memory usage and faster inference.
|
||||
//!
|
||||
//! Key features:
|
||||
//! - ViT-based vision encoder
|
||||
//! - Phi-2 text decoder model
|
||||
//! - Memory efficient 8-bit quantization
|
||||
//! - Optimized for efficient deployment
|
||||
//!
|
||||
//! References:
|
||||
//! - [Moondream Model](https://github.com/vikhyat/moondream)
|
||||
//!
|
||||
|
||||
use crate::models::moondream::{Config, VisionConfig};
|
||||
use crate::models::quantized_mixformer::MixFormerSequentialForCausalLM as PhiModel;
|
||||
use crate::quantized_nn::{layer_norm, linear_b, Linear};
|
||||
|
|
|
@ -1,3 +1,21 @@
|
|||
//! Quantized MPT model implementation.
|
||||
//!
|
||||
//! MPT (MPT-7B) is a causal transformer model series optimized for code generation.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Multi-Query Grouped Attention (MQA)
|
||||
//! - Support for KV-caching
|
||||
//! - Pre-computed ALiBi attention biases
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [Replit Code Models](https://huggingface.co/replit/replit-code-v1_5-3b)
|
||||
//! - [MPT-7B Implementation](https://github.com/mosaicml/llm-foundry)
|
||||
//!
|
||||
/// MPT model used by replit-code-v1_5-3b
|
||||
/// https://huggingface.co/replit/replit-code-v1_5-3b/blob/main/modeling_mpt.py
|
||||
///
|
||||
use crate::quantized_nn::{layer_norm_no_bias, linear_no_bias, Embedding, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
/// MPT model used by replit-code-v1_5-3b
|
||||
|
|
|
@ -1,3 +1,20 @@
|
|||
//! Phi2 model implementation with quantization support.
|
||||
//!
|
||||
//! Phi2 is a 2.7B parameter language model using scaled-up Transformer decoder architecture.
|
||||
//! This implementation provides quantization for reduced memory and compute usage.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Partial attention with learned mixing to reduce quadratic costs
|
||||
//! - Layer reuse for improved inference efficiency
|
||||
//! - Linear transformations with scalar mixing
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [Phi2 Paper](https://arxiv.org/abs/2309.05463)
|
||||
//! - [Model Card](https://huggingface.co/microsoft/phi-2)
|
||||
//!
|
||||
|
||||
use std::collections::HashMap;
|
||||
|
||||
use candle::quantized::gguf_file;
|
||||
|
|
|
@@ -1,3 +1,18 @@
//! Phi3 model implementation with quantization support.
//!
//! Phi3 is a language model intended for research purposes.
//! This implementation provides quantization for reduced memory usage.
//!
//! Key characteristics:
//! - Multi-head attention
//! - RMSNorm for layer normalization
//! - Rotary positional embeddings (RoPE)
//! - Support for quantization
//!
//! References:
//! - [Model Card](https://huggingface.co/microsoft/phi-3)
//!
use std::collections::HashMap;
use candle::quantized::gguf_file;
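RMSNorm appears in most of these docs (Phi3 here, Mistral, Qwen2, the RWKV ports, and others). Unlike LayerNorm it skips the mean subtraction and bias: y_i = w_i * x_i / sqrt(mean(x^2) + eps). A plain-Rust sketch of that formula, separate from the `RmsNorm` modules the crate imports:

```rust
/// RMSNorm: scale each element by the reciprocal root-mean-square of the vector.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

fn main() {
    let x = [1.0_f32, -2.0, 3.0, 0.5];
    let w = [1.0_f32; 4];
    println!("{:?}", rms_norm(&x, &w, 1e-6));
}
```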
@@ -1,3 +1,18 @@
//! Qwen2 model implementation with quantization support.
//!
//! Qwen2 is a chat-optimized language model that supports 8-bit quantization
//! for reduced memory usage and faster inference.
//!
//! Key characteristics:
//! - Group Query Attention (GQA)
//! - RMSNorm for layer normalization
//! - Rotary positional embeddings (RoPE)
//! - Support for 8-bit quantization
//!
//! References:
//! - [Model Card](https://huggingface.co/Qwen/Qwen2)
//!
use crate::{quantized_nn::RmsNorm, utils::repeat_kv};
use candle::{
quantized::{gguf_file, QMatMul},
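This hunk imports `utils::repeat_kv` alongside the grouped-query attention bullet: the model keeps fewer key/value heads than query heads, and each KV head is shared by a group of query heads. A minimal index-level sketch of that sharing in plain Rust (the crate's `repeat_kv` helper operates on candle tensors rather than indices):

```rust
fn main() {
    let (n_heads, n_kv_heads) = (8, 2);
    let group = n_heads / n_kv_heads;

    // In GQA, query head `h` reads keys/values from KV head `h / group`.
    for h in 0..n_heads {
        println!("query head {h} -> kv head {}", h / group);
    }

    // Repeating KV materializes the same sharing by tiling each KV head `group` times.
    let kv_heads = vec!["kv0", "kv1"];
    let repeated: Vec<&str> = kv_heads
        .iter()
        .flat_map(|&kv| std::iter::repeat(kv).take(group))
        .collect();
    println!("repeated kv heads: {repeated:?}");
}
```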
@ -1,3 +1,20 @@
|
|||
//! Recurrent Gemma model implementation with quantization support.
|
||||
//!
|
||||
//! Gemma is a large language model optimized for efficiency.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Recurrent blocks with gated recurrent units
|
||||
//! - Convolution and attention blocks
|
||||
//! - RMSNorm for layer normalization
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [Gemma Paper](https://arxiv.org/abs/2401.06751)
|
||||
//! - [Model Card](https://ai.google.dev/gemma)
|
||||
//!
|
||||
|
||||
use crate::quantized_nn::{linear_b as linear, Embedding, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,3 +1,20 @@
|
|||
//! RWKV v5 model implementation with quantization support.
|
||||
//!
|
||||
//! RWKV v5 is an attention-free language model optimized for efficiency.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Linear attention mechanism
|
||||
//! - GroupNorm layer normalization
|
||||
//! - Time-mixing layers
|
||||
//! - State-based sequential processing
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [RWKV Model](https://github.com/BlinkDL/RWKV-LM)
|
||||
//! - [RWKV v5 Architecture](https://www.rwkv.com/v5)
|
||||
//!
|
||||
|
||||
use crate::{
|
||||
quantized_nn::{layer_norm, linear_no_bias as linear, Embedding, Linear},
|
||||
quantized_var_builder::VarBuilder,
|
||||
|
|
|
@ -1,3 +1,21 @@
|
|||
//! RWKV v6 model implementation with quantization support.
|
||||
//!
|
||||
//! RWKV is a linear attention model that combines the efficiency of RNNs
|
||||
//! with the parallelizable training of Transformers. Version 6 builds on previous
|
||||
//! versions with further optimizations.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Linear attention mechanism
|
||||
//! - Time mixing layers
|
||||
//! - Channel mixing layers
|
||||
//! - RMSNorm for normalization
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [RWKV Architecture](https://github.com/BlinkDL/RWKV-LM)
|
||||
//! - [RWKV v6 Release](https://huggingface.co/BlinkDL/rwkv-6)
|
||||
//!
|
||||
|
||||
use crate::{
|
||||
quantized_nn::{layer_norm, linear_no_bias as linear, Embedding, Linear},
|
||||
quantized_var_builder::VarBuilder,
|
||||
|
|
|
@ -1,3 +1,18 @@
|
|||
//! Module for quantized StableLM implementation.
|
||||
//!
|
||||
//! StableLM is a series of open-source large language models
|
||||
//! optimized for performance and stability. This implementation
|
||||
//! provides quantization support for efficient model deployment.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - RMSNorm for layer normalization
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [StableLM](https://github.com/Stability-AI/StableLM)
|
||||
//!
|
||||
|
||||
use crate::quantized_nn::{layer_norm, linear, linear_no_bias, Embedding, Linear};
|
||||
pub use crate::quantized_var_builder::VarBuilder;
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
|
|
|
@ -1,5 +1,19 @@
|
|||
// T5 Text Model, quantized version
|
||||
// https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py
|
||||
//! T5 model implementation with quantization support.
|
||||
//!
|
||||
//! T5 is an encoder-decoder model pre-trained on a multi-task mixture of supervised
|
||||
//! and unsupervised tasks. This implementation provides quantization for reduced
|
||||
//! memory and compute requirements.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Encoder-decoder architecture
|
||||
//! - Layer normalization
|
||||
//! - Relative positional encodings
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [T5 Paper](https://arxiv.org/abs/1910.10683)
|
||||
//! - [Model Card](https://huggingface.co/t5-base)
|
||||
//! - Original model from [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py)
|
||||
|
||||
use crate::models::t5::{deserialize_feed_forward_proj_activation, ActivationWithOptionalGating};
|
||||
use crate::models::with_tracing::QMatMul;
|
||||
|
|
|
@ -1,3 +1,20 @@
|
|||
//! Qwen2 model implementation with quantization support.
|
||||
//!
|
||||
//! Qwen2 is a large language model from Alibaba optimized for efficiency.
|
||||
//! This implementation provides quantization for reduced memory and compute.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Streaming decode support
|
||||
//! - Grouped query attention (GQA)
|
||||
//! - RMSNorm for layer normalization
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//! - Support for 8-bit quantization
|
||||
//!
|
||||
//! References:
|
||||
//! - [Qwen2 Model](https://huggingface.co/Qwen/Qwen2-7B)
|
||||
//! - [Model Card](https://huggingface.co/Qwen/Qwen2-7B)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::{linear, linear_no_bias, Linear, RmsNorm};
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
use candle_nn::{Activation, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,21 @@
|
|||
//! Qwen2 model implementation with Mixture of Experts support.
|
||||
//!
|
||||
//! Qwen2 is a large language model using sparse Mixture of Experts (MoE).
|
||||
//! This implementation provides support for sparsely activated MoE layers.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Mixture of Experts architecture
|
||||
//! - Sparse expert activation
|
||||
//! - Shared expert routing mechanism
|
||||
//! - Grouped query attention (GQA)
|
||||
//! - RMSNorm for layer normalization
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//!
|
||||
//! References:
|
||||
//! - [Qwen2 Paper](https://arxiv.org/abs/2401.08985)
|
||||
//! - [Model Card](https://huggingface.co/Qwen/Qwen2-7B-beta)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::{linear, linear_no_bias, Linear, RmsNorm};
|
||||
use candle::{DType, Device, Module, Result, Tensor, D};
|
||||
use candle_nn::{Activation, VarBuilder};
|
||||
|
|
|
@ -1,5 +1,22 @@
|
|||
// This implementation is based on the python version from huggingface/transformers.
|
||||
// https://github.com/huggingface/transformers/blob/b109257f4fb8b1166e7c53cc5418632014ed53a5/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L2
|
||||
//! Recurrent Gemma model implementation
|
||||
//!
|
||||
//! Recurrent Gemma is a version of the Gemma language model that incorporates recurrent memory.
|
||||
//! This allows the model to maintain state between predictions and have longer-range memory.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Real-gated linear recurrent units (RGLRU)
|
||||
//! - 1D convolution for local context
|
||||
//! - RMSNorm for layer normalization
|
||||
//! - Rotary positional embeddings (RoPE)
|
||||
//! - Grouped query attention
|
||||
//!
|
||||
//! References:
|
||||
//! - [Gemma: Open Models Based on Gemini Technology](https://blog.google/technology/developers/gemma-open-models/)
|
||||
//! - [Recurrent Memory model architecture](https://arxiv.org/abs/2402.00441)
|
||||
//!
|
||||
//! This implementation is based on the python version from huggingface/transformers.
|
||||
//! https://github.com/huggingface/transformers/blob/b109257f4fb8b1166e7c53cc5418632014ed53a5/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L2
|
||||
//!
|
||||
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
|
||||
use candle_nn::{linear_b as linear, Linear, VarBuilder};
|
||||
use std::sync::Arc;
|
||||
|
|
|
@@ -2,6 +2,17 @@
//!
//! See "RepVGG: Making VGG-style ConvNets Great Again" Ding et al. 2021
//! https://arxiv.org/abs/2101.03697
//!
//! Key characteristics:
//! - Efficient inference architecture through structural reparameterization
//! - Single 3x3 conv layer after fusing 3x3 branch, 1x1 branch and identity branch
//! - Different configurations including a0-a2, b0-b3 and variants with group convolutions
//! - High accuracy with VGG-like plain architecture and training
//!
//! References:
//! - [RepVGG Paper](https://arxiv.org/abs/2101.03697)
//! - [Official Implementation](https://github.com/DingXiaoH/RepVGG)
//!
use candle::{Result, Tensor, D};
use candle_nn::{
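The RepVGG doc above describes fusing the 3x3 branch, the 1x1 branch and the identity branch into a single 3x3 convolution for inference. The sketch below shows the kernel arithmetic for a single input/output channel and deliberately ignores the batch-norm folding that the full reparameterization also performs, so it is an illustration of the idea rather than the crate's code.

```rust
type K3 = [[f32; 3]; 3];

/// Embed a 1x1 kernel into a 3x3 kernel: the weight sits at the center, zeros elsewhere.
fn pad_1x1_to_3x3(w: f32) -> K3 {
    let mut k = [[0.0; 3]; 3];
    k[1][1] = w;
    k
}

/// The identity branch for one channel is equivalent to a centered 1x1 kernel of weight 1.
fn identity_3x3() -> K3 {
    pad_1x1_to_3x3(1.0)
}

/// Fuse the three branches by summing the aligned 3x3 kernels.
fn fuse(k3: K3, k1: f32, with_identity: bool) -> K3 {
    let mut out = k3;
    let extra = [
        pad_1x1_to_3x3(k1),
        if with_identity { identity_3x3() } else { [[0.0; 3]; 3] },
    ];
    for b in extra {
        for i in 0..3 {
            for j in 0..3 {
                out[i][j] += b[i][j];
            }
        }
    }
    out
}

fn main() {
    let k3 = [[0.1, 0.2, 0.1], [0.0, 0.5, 0.0], [-0.1, 0.2, -0.1]];
    println!("{:?}", fuse(k3, 0.3, true));
}
```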
@@ -1,7 +1,15 @@
//! ResNet implementation.
//! # ResNet Implementation
//!
//! See "Deep Residual Learning for Image Recognition" He et al. 2015
//! <https://arxiv.org/abs/1512.03385>
//! Implementation of ResNet architectures as described in the paper:
//!
//! ## Reference
//!
//! [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
//! He et al. (2015)
//!
//! This paper introduced ResNet, a deep neural network architecture that utilizes
//! skip connections ("residual connections") to enable training of very deep networks.
use candle::{Result, D};
use candle_nn::{batch_norm, Conv2d, Func, VarBuilder};
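The new ResNet doc explains that skip connections add the block input back onto the block output. In plain Rust that is just an element-wise sum, sketched below with a toy closure standing in for the conv/batch-norm transformation of a real block.

```rust
/// A residual block computes y = f(x) + x, where `f` is the learned transformation.
fn residual_block(x: &[f32], f: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    let fx = f(x);
    fx.iter().zip(x).map(|(a, b)| a + b).collect()
}

fn main() {
    // Toy transformation standing in for conv -> batch norm -> ReLU.
    let f = |x: &[f32]| x.iter().map(|v| (v * 0.5).max(0.0)).collect::<Vec<_>>();
    let x = [1.0_f32, -2.0, 3.0];
    println!("{:?}", residual_block(&x, f));
}
```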
@ -1,3 +1,20 @@
|
|||
//! RWKV v5 model implementation.
|
||||
//!
|
||||
//! RWKV is an RNN with transformer-level performance that can be implemented
|
||||
//! as either a transformer or RNN.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Time-mix attention mechanism
|
||||
//! - Channel-mix feed-forward network
|
||||
//! - Linear attention
|
||||
//! - Group normalization
|
||||
//! - Token shift mechanism
|
||||
//!
|
||||
//! References:
|
||||
//! - [RWKV Language Model](https://github.com/BlinkDL/RWKV-LM)
|
||||
//! - [RWKV v5 Release](https://github.com/BlinkDL/ChatRWKV/tree/main)
|
||||
//!
|
||||
|
||||
use super::with_tracing::{layer_norm, linear_no_bias as linear, LayerNorm, Linear};
|
||||
use candle::{DType, Device, IndexOp, Result, Tensor};
|
||||
use candle_nn::{embedding, Embedding, Module, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! RWKV v6 model implementation.
|
||||
//!
|
||||
//! RWKV is an RNN with transformer-like performance.
|
||||
//! Version 6 introduces refinements to the architecture.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Linear attention mechanism
|
||||
//! - Time-mixing for temporal dependencies
|
||||
//! - Group normalization
|
||||
//! - Feed forward gating
|
||||
//! - State recycling for efficient inference
|
||||
//!
|
||||
//! References:
|
||||
//! - [RWKV Model](https://github.com/BlinkDL/RWKV-LM)
|
||||
//!
|
||||
|
||||
use super::with_tracing::{layer_norm, linear_no_bias as linear, LayerNorm, Linear};
|
||||
use candle::{IndexOp, Result, Tensor};
|
||||
use candle_nn::{embedding, Embedding, Module, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,19 @@
|
|||
//! Segformer model implementation for semantic segmentation and image classification.
|
||||
//!
|
||||
//! Segformer is a transformer-based model designed for vision tasks. It uses a hierarchical
|
||||
//! structure that progressively generates features at different scales.
|
||||
//!
|
||||
//! Key characteristics:
|
||||
//! - Efficient self-attention with sequence reduction
|
||||
//! - Hierarchical feature generation
|
||||
//! - Mix-FFN for local and global feature interaction
|
||||
//! - Lightweight all-MLP decode head
|
||||
//!
|
||||
//! References:
|
||||
//! - [SegFormer Paper](https://arxiv.org/abs/2105.15203)
|
||||
//! - [Model Card](https://huggingface.co/nvidia/mit-b0)
|
||||
//!
|
||||
|
||||
use crate::models::with_tracing::{conv2d, linear, Conv2d, Linear};
|
||||
use candle::{Module, ModuleT, Result, Tensor, D};
|
||||
use candle_nn::{conv2d_no_bias, layer_norm, Activation, Conv2dConfig, VarBuilder};
|
||||
|
|
|
@ -1,3 +1,11 @@
//! Segment Anything Model (SAM)
//!
//! SAM is an architecture for image segmentation, capable of segmenting any object
//! in an image based on prompts like points or boxes.
//!
//! - [GH Link](https://github.com/facebookresearch/segment-anything)
//! - [Paper](https://arxiv.org/abs/2304.02643)
//!
pub use crate::models::with_tracing::Linear;
use candle::{Result, Tensor};
use candle_nn::{Module, VarBuilder};
@ -1,3 +1,11 @@
//! Siglip model implementation.
//!
//! SigLIP combines a vision and a text encoder for zero-shot classification and
//! retrieval; unlike CLIP it is trained with a pairwise sigmoid loss rather than
//! a softmax over the batch.
//!
//! References:
//! - [Model Card](https://huggingface.co/google/siglip-base-patch16-224)
//!
use crate::models::clip::div_l2_norm;
use candle::{IndexOp, Module, Result, Tensor, D};
use candle_nn::{layer_norm, linear, LayerNorm, Linear, VarBuilder};
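Zero-shot scoring then amounts to l2-normalizing both embeddings, taking a scaled dot product, and applying a sigmoid per image/text pair instead of a softmax over candidates. A sketch; `scale` and `bias` stand in for the checkpoint's learned logit scale and bias:

```rust
use candle::{Result, Tensor, D};
use candle_nn::ops::sigmoid;

// Pairwise image/text scores: l2-normalize, dot product, then a per-pair sigmoid.
fn pairwise_scores(img: &Tensor, txt: &Tensor, scale: f64, bias: f64) -> Result<Tensor> {
    // img: (n_images, d), txt: (n_texts, d)
    let norm = |e: &Tensor| -> Result<Tensor> {
        e.broadcast_div(&e.sqr()?.sum_keepdim(D::Minus1)?.sqrt()?)
    };
    let logits = norm(img)?.matmul(&norm(txt)?.t()?)?; // (n_images, n_texts)
    sigmoid(&logits.affine(scale, bias)?)
}
```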
@ -1,3 +1,12 @@
//! Stable Diffusion
//!
//! Stable Diffusion is a latent text-to-image diffusion model capable of
//! generating photo-realistic images given any text input.
//!
//! - [Original Repository](https://github.com/CompVis/stable-diffusion)
//! - [Hugging Face](https://huggingface.co/runwayml/stable-diffusion-v1-5)
//!
pub mod attention;
pub mod clip;
pub mod ddim;
@ -1,3 +1,18 @@
//! StableLM model implementation.
//!
//! StableLM is a family of language models trained by Stability AI.
//! This implementation supports the StableLM architecture.
//!
//! Key characteristics:
//! - Grouped query attention (GQA)
//! - Layer normalization
//! - Rotary positional embeddings (RoPE)
//! - Support for different model sizes (3B, 7B)
//!
//! References:
//! - [Model Card](https://huggingface.co/stabilityai/stablelm-3b-4e1t)
//!
use crate::models::with_tracing::{linear, linear_no_bias, Linear};
use candle::{DType, Device, Module, Result, Tensor, D};
use candle_nn::{Activation, LayerNorm, VarBuilder};
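Rotary positional embeddings encode position by rotating pairs of query/key channels through position-dependent angles before the attention dot product. A standalone sketch of the split-half convention, not the helper the model file actually uses; `theta` is the usual base frequency (commonly 10000):

```rust
use candle::{DType, Result, Tensor, D};

// Rotary embeddings, split-half convention: rotate the two halves of each head
// dimension by position-dependent angles derived from `theta`.
fn apply_rope(x: &Tensor, theta: f32) -> Result<Tensor> {
    // x: (batch, heads, seq_len, head_dim) with an even head_dim.
    let (_b, _h, seq_len, dim) = x.dims4()?;
    let dev = x.device();
    let inv_freq: Vec<f32> = (0..dim / 2)
        .map(|i| 1f32 / theta.powf(2.0 * i as f32 / dim as f32))
        .collect();
    let inv_freq = Tensor::from_vec(inv_freq, (1, dim / 2), dev)?;
    let positions = Tensor::arange(0u32, seq_len as u32, dev)?
        .to_dtype(DType::F32)?
        .reshape((seq_len, 1))?;
    let freqs = positions.matmul(&inv_freq)?; // (seq_len, dim / 2)
    let cos = Tensor::cat(&[&freqs.cos()?, &freqs.cos()?], D::Minus1)?; // (seq_len, dim)
    let sin = Tensor::cat(&[&freqs.sin()?, &freqs.sin()?], D::Minus1)?;
    // rotate_half: (x1, x2) -> (-x2, x1)
    let x1 = x.narrow(D::Minus1, 0, dim / 2)?;
    let x2 = x.narrow(D::Minus1, dim / 2, dim / 2)?;
    let rotated = Tensor::cat(&[&x2.neg()?, &x1], D::Minus1)?;
    x.broadcast_mul(&cos)? + rotated.broadcast_mul(&sin)?
}
```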
@ -1,3 +1,20 @@
//! StarCoder model implementation with quantization support.
//!
//! StarCoder is a large language model optimized for code generation.
//! This implementation provides quantization for reduced memory and compute.
//!
//! Key characteristics:
//! - Causal self-attention mechanism
//! - Multi-query attention (MQA)
//! - LayerNorm for normalization
//! - Absolute positional embeddings
//! - Support for 8-bit quantization
//!
//! References:
//! - [StarCoder Paper](https://arxiv.org/abs/2305.06161)
//! - [Model Card](https://huggingface.co/bigcode/starcoder)
//!
#![allow(unused)]
use candle::{DType, Device, Module, Result, Tensor, D};
use candle_nn::{layer_norm, linear_b, LayerNorm, Linear, VarBuilder};
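The quantization support mentioned above replaces f32 weight matrices with low-bit integers plus a scale, dequantizing (or multiplying in the quantized domain) at matmul time. A "fake quantization" sketch of the symmetric 8-bit case, which only shows the rounding and scaling; real quantized weights live in packed integer storage:

```rust
use candle::{Result, Tensor};

// Symmetric 8-bit fake quantization: snap each weight onto an integer grid
// scaled so the largest magnitude maps to +/-127, then map back to f32. The
// result approximates the input with roughly 8 bits of precision.
fn quantize_dequantize(w: &Tensor) -> Result<Tensor> {
    let max_abs = w.abs()?.flatten_all()?.max(0)?.to_scalar::<f32>()? as f64;
    let scale = (max_abs / 127.0).max(1e-12);
    let q = w.affine(1.0 / scale, 0.0)?.round()?.clamp(-127f64, 127f64)?;
    q.affine(scale, 0.0) // dequantized approximation of w
}
```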
@ -1,3 +1,20 @@
//! Stella v5 model implementation.
//!
//! Stella is a dense text embedding model optimized for retrieval and similarity tasks.
//! This implementation provides support for multiple embedding dimensions.
//!
//! Key characteristics:
//! - Dense text embeddings optimized for similarity search
//! - Multiple output dimension support (256 to 8192)
//! - Grouped query attention (GQA)
//! - RMSNorm for layer normalization
//! - Rotary positional embeddings (RoPE)
//!
//! References:
//! - [MRL Framework](https://arxiv.org/abs/2205.13147)
//! - [Model Card](https://huggingface.co/dunzhang/stella_en_1.5B_v5)
//!
use crate::models::with_tracing::{linear, linear_no_bias, Linear, RmsNorm};
use candle::{DType, Device, IndexOp, Module, Result, Tensor};
use candle_nn::{Activation, VarBuilder};
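The multiple output dimensions follow the Matryoshka-style recipe referenced above: a smaller embedding is just a prefix of the full vector, re-normalized. A sketch of that post-processing step (hypothetical helper, not part of the model file):

```rust
use candle::{Result, Tensor, D};

// Matryoshka-style dimension reduction: keep the first `dim` components of the
// full embedding and re-normalize to unit length.
fn truncate_embedding(full: &Tensor, dim: usize) -> Result<Tensor> {
    // full: (batch, full_dim), dim <= full_dim
    let e = full.narrow(D::Minus1, 0, dim)?;
    let norm = e.sqr()?.sum_keepdim(D::Minus1)?.sqrt()?;
    e.broadcast_div(&norm)
}
```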
@ -1,5 +1,19 @@
// T5 Text Model
// https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py
//! T5 model implementation.
//!
//! T5 (Text-to-Text Transfer Transformer) is a unified text-to-text transformer model.
//! This implementation follows the original model architecture.
//!
//! Key characteristics:
//! - Text-to-text framework
//! - Relative positional embeddings
//! - T5-specific layer normalization
//! - Encoder-decoder architecture
//! - Support for sequence-to-sequence tasks
//!
//! References:
//! - [T5 Paper](https://arxiv.org/abs/1910.10683)
//! - [HuggingFace T5](https://huggingface.co/docs/transformers/model_doc/t5)
//! - [GH Model](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py)
use crate::models::with_tracing::Embedding;
use candle::{DType, Device, Module, Result, Tensor, D};
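The T5-specific layer normalization is an RMS-style norm: scale by the root mean square of the activations with a learned weight, no mean subtraction and no bias. A standalone sketch (function name hypothetical):

```rust
use candle::{Result, Tensor, D};

// T5-style layer norm: x / rms(x) * weight, with no mean subtraction or bias.
fn t5_layer_norm(x: &Tensor, weight: &Tensor, eps: f64) -> Result<Tensor> {
    // x: (..., hidden), weight: (hidden,)
    let variance = x.sqr()?.mean_keepdim(D::Minus1)?;
    let x = x.broadcast_div(&variance.affine(1.0, eps)?.sqrt()?)?;
    x.broadcast_mul(weight)
}
```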
@ -1,3 +1,19 @@
//! TrOCR model implementation.
//!
//! TrOCR is a Transformer-based OCR model that uses a Vision Transformer encoder
//! and a BART-like decoder for optical character recognition.
//!
//! Key characteristics:
//! - Vision Transformer encoder for image processing
//! - BART-style decoder for text generation
//! - Learned positional embeddings
//! - Layer normalization and self-attention
//!
//! References:
//! - [Paper](https://arxiv.org/abs/2109.10282)
//! - [Model Card](https://huggingface.co/microsoft/trocr-base-handwritten)
//!
use crate::models::vit::{Config, Embeddings, Encoder};
use candle::{DType, Result, Tensor};
use candle_nn::{
@ -1,7 +1,18 @@
//! VGG-16 model implementation.
//!
//! See Very Deep Convolutional Networks for Large-Scale Image Recognition
//! <https://arxiv.org/abs/1409.1556>
//! VGG-16 is a convolutional neural network architecture. It consists of 13
//! convolutional layers followed by 3 fully connected layers.
//!
//! Key characteristics:
//! - Conv layers with 3x3 filters
//! - Max pooling after every 2-3 conv layers
//! - Three fully connected layers of 4096, 4096, 1000 units
//! - ReLU activation and dropout
//!
//! References:
//! - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
//!
use candle::{ModuleT, Result, Tensor};
use candle_nn::{FuncT, VarBuilder};
@ -1,3 +1,20 @@
//! Vision Transformer (ViT) implementation.
//!
//! Vision Transformer applies the transformer architecture to image classification
//! by splitting images into patches and processing them as a sequence.
//!
//! Key characteristics:
//! - Image patches as sequence tokens
//! - Self-attention between patches
//! - Position embeddings
//! - CLS token for classification
//! - Layer normalization
//!
//! References:
//! - [ViT Paper](https://arxiv.org/abs/2010.11929)
//! - [Model Card](https://huggingface.co/google/vit-base-patch16-224)
//!
use crate::models::with_tracing::{conv2d, linear, linear_no_bias, Conv2d, Linear};
use candle::{IndexOp, Module, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, VarBuilder};
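The "image patches as sequence tokens" step flattens non-overlapping patches into vectors before the linear projection and position embeddings. Real implementations usually do this with a strided conv; the reshape/transpose version below only illustrates the token layout:

```rust
use candle::{DType, Device, Result, Tensor};

// Split (b, c, h, w) images into non-overlapping `patch` x `patch` squares and
// flatten each one into a token of length c * patch * patch.
fn to_patches(images: &Tensor, patch: usize) -> Result<Tensor> {
    let (b, c, h, w) = images.dims4()?;
    let (nh, nw) = (h / patch, w / patch);
    images
        .reshape((b * c * nh, patch, nw, patch))?
        .transpose(1, 2)? // group the two patch axes together
        .contiguous()?
        .reshape((b, c, nh * nw, patch * patch))?
        .transpose(1, 2)? // move channels inside each token
        .contiguous()?
        .reshape((b, nh * nw, c * patch * patch))
}

fn main() -> Result<()> {
    let imgs = Tensor::zeros((1, 3, 224, 224), DType::F32, &Device::Cpu)?;
    let tokens = to_patches(&imgs, 16)?;
    assert_eq!(tokens.dims3()?, (1, 196, 768)); // 14 * 14 patches of 3 * 16 * 16 values
    Ok(())
}
```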
@ -1,3 +1,11 @@
//! Whisper Model Implementation
//!
//! Whisper is an automatic speech recognition (ASR) system trained on large amounts
//! of multilingual and multitask supervised data collected from the web.
//!
//! - [GH Link](https://github.com/openai/whisper)
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py)
//!
pub mod audio;
pub mod model;
pub mod quantized_model;
@ -1,3 +1,12 @@
//! Würstchen Efficient Diffusion Model
//!
//! Würstchen is an efficient diffusion model architecture for generating images using
//! a two-stage approach with a small decoder and prior network.
//!
//! - [Paper Link](https://openreview.net/pdf?id=gU58AyJlYz)
//! - [GH Link](https://github.com/dome272/Wuerstchen)
//! - [Reference Implementation](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/wuerstchen/pipeline_wuerstchen.py)
//!
pub mod attention_processor;
pub mod common;
pub mod ddpm;
@ -1,4 +1,18 @@
/// https://huggingface.co/01-ai/Yi-6B/blob/main/modeling_yi.py
//! Yi model implementation.
//!
//! Yi is a decoder-only large language model trained by 01.AI.
//! It follows a standard transformer architecture similar to Llama.
//!
//! Key characteristics:
//! - Multi-head attention with rotary positional embeddings
//! - RMS normalization
//! - SwiGLU activation in feed-forward layers
//! - Grouped-query attention for efficient inference
//!
//! References:
//! - [Yi Model](https://huggingface.co/01-ai/Yi-6B)
//! - [Hugging Face](https://huggingface.co/01-ai/Yi-6B/blob/main/modeling_yi.py)
use crate::models::with_tracing::{linear_no_bias, Linear, RmsNorm};
use candle::{DType, Device, Module, Result, Tensor, D};
use candle_nn::{Activation, VarBuilder};
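With grouped-query attention, several query heads share one key/value head, so the kv tensor has to be repeated along the head dimension before the attention matmul. A standalone sketch of that repeat step (hypothetical helper, not the model file's own code):

```rust
use candle::{Result, Tensor};

// Repeat each kv head `n_rep` times so the kv tensor matches the number of
// query heads: (b, n_kv_heads, t, d) -> (b, n_kv_heads * n_rep, t, d).
fn repeat_kv(kv: &Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        return Ok(kv.clone());
    }
    let (b, n_kv, t, d) = kv.dims4()?;
    let kv = kv.unsqueeze(2)?;             // (b, n_kv, 1, t, d)
    let copies = vec![&kv; n_rep];
    Tensor::cat(&copies, 2)?               // (b, n_kv, n_rep, t, d)
        .reshape((b, n_kv * n_rep, t, d))  // (b, n_heads, t, d)
}
```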