# Embedding Layer
The Embedding layer converts integer indices into dense vector representations. It's commonly used as the first layer in models that process text, categorical data, or any discrete input that needs to be mapped to a continuous vector space. Conceptually, the layer stores a `[num_embeddings, embedding_dim]` weight matrix, and a forward pass selects the rows named by the input indices.
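A minimal plain-Rust sketch of that lookup, using no library types and purely for illustration:

```rust
// Conceptual sketch only (plain Rust, not this library's API):
// an embedding layer is a table of rows, one row per index.
let table: Vec<Vec<f32>> = vec![
    vec![0.10, 0.20, 0.30], // embedding for index 0
    vec![0.40, 0.50, 0.60], // embedding for index 1
    vec![0.70, 0.80, 0.90], // embedding for index 2
];
let index = 1;
let vector = &table[index]; // the forward pass is a row lookup
assert_eq!(vector, &vec![0.40, 0.50, 0.60]);
```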
## Definition

```rust
pub struct Embedding {
    weight: Tensor,
    padding_idx: Option<usize>,
    max_norm: Option<f32>,
    norm_type: f32,
    scale_grad_by_freq: bool,
    state: LayerState,
}
```
## Constructor

```rust
pub fn new(num_embeddings: usize, embedding_dim: usize) -> Result<Self>
```
Creates a new Embedding layer with the specified dimensions.
Parameters:

- `num_embeddings`: Size of the vocabulary (number of possible indices)
- `embedding_dim`: Size of each embedding vector
Example:
```rust
let embedding = Embedding::new(10000, 300)?; // 10,000 words, 300-dimensional embeddings
```
For more control over the initialization and additional features, you can use the extended constructor:
```rust
pub fn new_with_spec(
    num_embeddings: usize,
    embedding_dim: usize,
    padding_idx: Option<usize>,
    max_norm: Option<f32>,
    norm_type: f32,
    scale_grad_by_freq: bool,
    device: Device,
    dtype: DType,
) -> Result<Self>
```
Additional Parameters:
- `padding_idx`: If specified, the embedding at this index is filled with zeros
- `max_norm`: If specified, embeddings are renormalized to have at most this norm
- `norm_type`: The p-norm to use for the `max_norm` renormalization (default: 2.0)
- `scale_grad_by_freq`: If true, gradients are scaled by the inverse frequency of the words
- `device`: The device to place the layer's parameters on (CPU, CUDA, or MPS)
- `dtype`: The data type for the layer's parameters
Example:
```rust
let embedding = Embedding::new_with_spec(
    10000,
    300,
    Some(0),    // Index 0 is padding
    Some(5.0),  // Maximum norm of 5.0
    2.0,        // L2 norm
    true,       // Scale gradients by frequency
    Device::CUDA(0),
    DType::F32,
)?;
```
## Forward Pass

```rust
pub fn forward(&self, input: &Tensor) -> Result<Tensor>
```
Retrieves embeddings for the given indices.
Parameters:
- `input`: Tensor of integer indices with any shape; the dtype must be an integer type
Returns: Tensor of embeddings with shape `[*input.shape, embedding_dim]`
Example:
```rust
let embedding = Embedding::new(10, 5)?;
let indices = Tensor::new(vec![1, 3, 5, 7])?;
let embeddings = embedding.forward(&indices)?; // Shape: [4, 5]

// With batch dimension
let batch_indices = Tensor::new(vec![1, 3, 5, 7, 2, 4, 6, 8])?.reshape(&[2, 4])?;
let batch_embeddings = embedding.forward(&batch_indices)?; // Shape: [2, 4, 5]
```
## Parameter Access

```rust
pub fn weight(&self) -> &Tensor
```
Provides access to the embedding matrix.
Example:
```rust
let embedding = Embedding::new(10, 5)?;
let weights = embedding.weight(); // Shape: [10, 5]
```
Other accessor methods, shown in the sketch below, include:

- `num_embeddings()`
- `embedding_dim()`
- `padding_idx()`
- `max_norm()`
- `norm_type()`
- `scale_grad_by_freq()`
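A quick sketch of how these fit together. This is a hedged example: the return types are assumed to mirror the constructor arguments and stored fields, which the signatures above do not spell out.

```rust
// Assumes each accessor mirrors the corresponding constructor argument
// or stored field; exact return types are not shown above.
let embedding = Embedding::new(10000, 300)?;
assert_eq!(embedding.num_embeddings(), 10000);
assert_eq!(embedding.embedding_dim(), 300);
assert_eq!(embedding.padding_idx(), None); // `new` sets no padding index
```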
## Layer Implementation

The Embedding layer implements the `Layer` trait, providing methods for parameter collection and training state management:

```rust
pub fn parameters(&mut self) -> Vec<&mut Tensor>
```
Returns the embedding matrix as a trainable parameter.
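For example, a training loop can collect the layer's trainable tensors through this method. A minimal sketch using only the documented signature, assuming the embedding matrix is the layer's only trainable tensor as the description above suggests:

```rust
// Sketch based on the documented `parameters` signature; assumes the
// embedding matrix is the layer's only trainable tensor.
let mut embedding = Embedding::new(1000, 64)?;
let params = embedding.parameters();
assert_eq!(params.len(), 1); // the [1000, 64] embedding matrix
// An optimizer would update each `&mut Tensor` in place during training.
```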
## Common Use Cases

### Word Embeddings

```rust
// 10,000 words in vocabulary, 300-dim embeddings
let word_embedding = Embedding::new(10000, 300)?;

// Convert word indices to embeddings
let word_indices = Tensor::new(vec![42, 1337, 7, 42])?;
let word_vectors = word_embedding.forward(&word_indices)?; // Shape: [4, 300]
```
### One-Hot to Dense

```rust
// For 10 categorical values
let category_embedding = Embedding::new(10, 5)?;

// Convert category indices to embeddings
let categories = Tensor::new(vec![0, 3, 9, 2])?;
let category_vectors = category_embedding.forward(&categories)?; // Shape: [4, 5]
```
### With Padding

```rust
// Use index 0 as padding
let embedding = Embedding::new_with_spec(1000, 64, Some(0), None, 2.0, false, Device::CPU, DType::F32)?;

// Input with padding
let input = Tensor::new(vec![0, 1, 2, 0, 3, 0])?;
let embeddings = embedding.forward(&input)?; // Embeddings for index 0 will be all zeros
```
### With Maximum Norm

```rust
// Limit embedding norms to 1.0
let embedding = Embedding::new_with_spec(1000, 64, None, Some(1.0), 2.0, false, Device::CPU, DType::F32)?;

// All retrieved embeddings will have norm <= 1.0
let input = Tensor::new(vec![5, 10, 15])?;
let embeddings = embedding.forward(&input)?;
```
## Implementation Notes
- The embedding matrix is initialized with random values scaled by `1/sqrt(embedding_dim)` (see the sketch after this list)
- If `padding_idx` is specified, the corresponding embedding is filled with zeros
- If `max_norm` is specified, embeddings are normalized to have at most this norm
- If `scale_grad_by_freq` is true, gradients are scaled by the inverse frequency of the words in the mini-batch
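For intuition, the initialization scale from the first note works out as follows for a 300-dimensional embedding (plain Rust, illustrative arithmetic only):

```rust
// Illustrative arithmetic only: the init scale from the note above.
let embedding_dim = 300_f64;
let scale = 1.0 / embedding_dim.sqrt();
println!("init scale ≈ {scale:.4}"); // prints ≈ 0.0577
```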