Introduction
What is MaidenX?
MaidenX is a Rust-based machine learning framework developed as part of the Maiden Engine project. It is designed with an educational focus, structured to mirror PyTorch's architecture to facilitate learning and understanding of ML framework implementations. The library prioritizes code readability, ensuring that anyone can easily understand and work with the codebase.
Key Features
- Pure Rust Implementation: Built entirely in Rust, providing memory safety, concurrency, and performance benefits.
- PyTorch-like API: Familiar and intuitive API design for those coming from PyTorch.
- Multiple Backends: Support for CPU, CUDA (NVIDIA GPUs), and MPS (Apple Silicon) computation.
- Automatic Differentiation: Built-in autograd system for gradient-based optimization.
- Computational Graph: Optional computational graph mode for deferred execution.
- Serialization: Integration with Rust's serde framework for model saving and loading.
- Comprehensive Operations: Rich set of tensor operations with autograd support.
- Neural Network Layers: Ready-to-use implementations of common neural network components.
Architecture
MaidenX is organized into several core components:
1. Tensor System (`maidenx_tensor`)
The tensor module provides the foundation for all numerical operations. Key features include:
- Support for multiple data types (float32, float64, int32, etc.)
- Comprehensive tensor operations (arithmetic, transformation, reduction)
- Automatic broadcasting for compatible shapes
- In-place and out-of-place operations
- Efficient memory management and buffer handling
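For instance, out-of-place operations return a new tensor while in-place variants (suffixed with an underscore) mutate the receiver. A minimal sketch using `add` and `add_`, both documented in the Binary Operations chapter:

```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
    let b = Tensor::new(vec![10.0, 20.0, 30.0])?;

    // Out-of-place: returns a new tensor, `a` is untouched
    let c = a.add(&b)?; // [11.0, 22.0, 33.0]

    // In-place: mutates `a` directly
    a.add_(&b)?; // a becomes [11.0, 22.0, 33.0]

    println!("{} {}", a, c);
    Ok(())
}
```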
2. Neural Network Components (`maidenx_nn`)
The neural network module offers building blocks for constructing machine learning models:
- Common layers: Linear, Conv2d, LayerNorm, Dropout, Embedding
- Activation functions: ReLU, Sigmoid, Tanh, GELU, Softmax, etc.
- Loss functions: MSE, MAE, Huber, CrossEntropy
- Optimizers: SGD, Adam
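As a taste of how these pieces compose, here is a minimal forward pass and loss computation, adapted from the training example under Example Usage below (a sketch, not a complete training loop):

```rust
use maidenx::nn::*;
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A single linear layer: 3 inputs -> 1 output, with bias
    let mut linear = Linear::new(3, 1, true)?;
    let mse_loss = MSE::new();

    let input = Tensor::new(vec![vec![0.1f32, 0.2, 0.3]])?;
    let target = Tensor::new(vec![vec![0.5f32]])?;

    // Forward pass through the layer, then the loss
    let pred = linear.forward(&input)?;
    let loss = mse_loss.forward((&pred, &target))?;
    println!("Loss: {}", loss);
    Ok(())
}
```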
3. Backend System (`maidenx_core`)
The core backend system provides device-specific implementations:
- CPU backend for universal compatibility
- CUDA backend for NVIDIA GPU acceleration
- MPS backend for Apple Silicon GPU acceleration
- Abstract device interface for consistent API across backends
Getting Started
MaidenX organizes its functionality into separate features, allowing users to select only what they need:
Default Features
These are included by default and recommended for most use cases:
- nn: Core neural network functionality
- serde: Serialization/deserialization support
- graph: Computational graph mode for deferred operations
Optional Features
- cuda: GPU acceleration support using NVIDIA CUDA
- mps: Apple Metal Performance Shaders support for Apple Silicon
Example Usage
Here's a simple example of training a linear model with MaidenX:
```rust
use maidenx::nn::*;
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create input and target data
    let input_data: Vec<Vec<f32>> = (0..10000)
        .map(|i| vec![
            (i % 100) as f32 / 100.0,
            ((i % 100) + 1) as f32 / 100.0,
            ((i % 100) + 2) as f32 / 100.0,
        ])
        .collect();
    let target_data: Vec<Vec<f32>> = (0..10000)
        .map(|i| vec![((i % 100) * 10) as f32 / 1000.0])
        .collect();

    let mut input = Tensor::new(input_data)?;
    let target = Tensor::new(target_data)?;
    input.with_grad()?;

    // Create model, loss function, and optimizer
    let mut linear = Linear::new(3, 1, true)?;
    let mse_loss = MSE::new();
    let mut optimizer = SGD::new(0.01);
    let epochs = 1000;

    // Training loop
    for epoch in 0..epochs {
        let pred = linear.forward(&input)?;
        let loss = mse_loss.forward((&pred, &target))?;

        loss.backward()?;
        optimizer.step(&mut linear.parameters())?;
        optimizer.zero_grad(&mut linear.parameters())?;

        if (epoch + 1) % 100 == 0 {
            println!("Epoch {}: Loss = {}", epoch + 1, loss);
        }
    }

    Ok(())
}
```
Supported Operations and Layers
MaidenX includes a comprehensive set of tensor operations and neural network layers, which we'll explore in more detail in the following chapters.
Installation
MaidenX is a Rust machine learning framework that's available through crates.io. This guide will walk you through the installation process, including setting up optional hardware acceleration features.
Basic Installation
To add MaidenX to your Rust project, add it as a dependency in your `Cargo.toml` file:
```toml
[dependencies]
maidenx = "*"
```
This will include the default features (`nn`, `serde`, and `graph`), which are suitable for most use cases.
Feature Configuration
MaidenX provides several optional features that you can enable based on your needs:
Default Features
These are included automatically and provide core functionality:
| Feature | Description |
|---|---|
| `nn` | Neural network components (layers, optimizers, activations) |
| `serde` | Serialization/deserialization for saving and loading models |
| `graph` | Computational graph for deferred tensor operations |
Hardware Acceleration
For improved performance, you can enable hardware-specific backends:
| Feature | Description | Requirements |
|---|---|---|
| `cuda` | NVIDIA GPU acceleration | NVIDIA GPU, CUDA toolkit |
| `mps` | Apple Silicon GPU acceleration | Apple Silicon Mac |
To enable specific features, modify your dependency in `Cargo.toml`:
```toml
[dependencies]
maidenx = { version = "*", features = ["cuda"] } # For NVIDIA GPU support
```
Or:
```toml
[dependencies]
maidenx = { version = "*", features = ["mps"] } # For Apple Silicon GPU support
```
Hardware-Specific Setup
CUDA Backend (NVIDIA GPUs)
To use the CUDA backend:
- Install the CUDA Toolkit (compatible with your NVIDIA GPU)
- Ensure your system's PATH includes the CUDA binaries
- Enable the `cuda` feature in your `Cargo.toml`
MPS Backend (Apple Silicon)
To use the Metal Performance Shaders backend:
- Ensure you're using macOS on Apple Silicon hardware (M1/M2/M3)
- Have Xcode and the Command Line Tools installed
- Enable the `mps` feature in your `Cargo.toml`
Setting Default Device and Data Type
MaidenX allows you to configure the global default device and data type for tensor operations:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Check current default device and dtype
    println!("Default device: {:?}", get_default_device());
    println!("Default dtype: {:?}", get_default_dtype());

    // Set new defaults
    set_default_device(Device::CPU);
    set_default_dtype(DType::F32);

    // Create a tensor using the defaults
    let tensor = Tensor::ones(&[2, 3])?;
    println!("Device: {:?}, dtype: {:?}", tensor.device(), tensor.dtype());

    // Automatic device selection based on available hardware
    auto_set_device();
    println!("Auto-selected device: {:?}", get_default_device());

    Ok(())
}
```
The `auto_set_device()` function will select the best available device in this order:
- CUDA, if available and the `cuda` feature is enabled
- MPS, if available and the `mps` feature is enabled
- CPU as a fallback
Verifying Installation
To verify that MaidenX is correctly installed and configured, you can run a simple example:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a simple tensor
    let tensor = Tensor::ones(&[2, 3])?;
    println!("Tensor shape: {:?}", tensor.shape());
    println!("Tensor device: {:?}", tensor.device());
    Ok(())
}
```
If you've enabled hardware acceleration, you can explicitly create tensors on specific devices:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create tensors on different devices
    let cpu_tensor = Tensor::ones(&[2, 3])?.to_device(Device::CPU)?;

    #[cfg(feature = "cuda")]
    let cuda_tensor = Tensor::ones(&[2, 3])?.to_device(Device::CUDA(0))?;

    #[cfg(feature = "mps")]
    let mps_tensor = Tensor::ones(&[2, 3])?.to_device(Device::MPS)?;

    println!("CPU Tensor: {:?}", cpu_tensor);
    Ok(())
}
```
Next Steps
Once you've successfully installed MaidenX, you're ready to start creating and manipulating tensors. Continue to the Creating Tensors guide to learn the basics of working with MaidenX's tensor system.
Create Tensors
MaidenX provides a variety of methods for creating and initializing tensors. This guide covers the most common tensor creation patterns and provides examples for each method.
Creating Tensors from Data
The most direct way to create a tensor is from existing data using the `new` method:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor from a vector of values
    let vec_data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let tensor = Tensor::new(vec_data)?;
    println!("1D Tensor: {}", tensor);

    // Create a tensor from a 2D vector (matrix)
    let matrix_data = vec![
        vec![1.0, 2.0, 3.0],
        vec![4.0, 5.0, 6.0],
    ];
    let matrix = Tensor::new(matrix_data)?;
    println!("2D Tensor shape: {:?}", matrix.shape());
    println!("2D Tensor: {}", matrix);

    Ok(())
}
```
Creating Tensors with Specific Device and Data Type
You can explicitly specify the device and data type when creating a tensor:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor with specific device and data type
    let data = vec![1, 2, 3, 4];
    let tensor = Tensor::new_with_spec(data, Device::CPU, DType::I32)?;
    println!("Device: {:?}", tensor.device());
    println!("Data type: {:?}", tensor.dtype());
    Ok(())
}
```
Creating Pre-initialized Tensors
MaidenX provides several factory methods for creating tensors with pre-initialized values:
Zeros and Ones
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor filled with zeros
    let zeros = Tensor::zeros(&[2, 3])?;
    println!("Zeros tensor: {}", zeros);

    // Create a tensor filled with ones
    let ones = Tensor::ones(&[2, 3])?;
    println!("Ones tensor: {}", ones);

    // Create a tensor filled with a specific value
    let filled = Tensor::fill(&[2, 3], 5.0)?;
    println!("Filled tensor: {}", filled);

    Ok(())
}
```
Random Tensors
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor with random values from a normal distribution
    let random = Tensor::randn(&[2, 3])?;
    println!("Random tensor: {}", random);
    Ok(())
}
```
Creating Range Tensors
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor with values [0, 1, 2, 3, 4]
    let range = Tensor::range(5)?;
    println!("Range tensor: {}", range);

    // Create a tensor with custom range [1, 3, 5, 7, 9]
    let arange = Tensor::arange(1, 11, 2)?;
    println!("Arange tensor: {}", arange);

    Ok(())
}
```
Creating Tensors Based on Existing Tensors
MaidenX follows PyTorch's pattern of providing `_like` methods that create new tensors with the same properties as existing ones:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a source tensor
    let source = Tensor::randn(&[2, 3])?.to_device(Device::CPU)?.with_dtype(DType::F32)?;

    // Create tensors with the same properties as source
    let zeros_like = Tensor::zeros_like(&source)?;
    let ones_like = Tensor::ones_like(&source)?;
    let randn_like = Tensor::randn_like(&source)?;
    let empty_like = Tensor::empty_like(&source)?;

    println!("Source shape: {:?}, device: {:?}, dtype: {:?}",
        source.shape(), source.device(), source.dtype());
    println!("Zeros like shape: {:?}, device: {:?}, dtype: {:?}",
        zeros_like.shape(), zeros_like.device(), zeros_like.dtype());

    Ok(())
}
```
Configuring Device and Data Type Settings
MaidenX provides functions to manage the default device and data type settings for tensor creation:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get current default settings
    println!("Default device: {:?}", get_default_device());
    println!("Default dtype: {:?}", get_default_dtype());

    // Set new defaults
    set_default_device(Device::CPU);
    set_default_dtype(DType::F32);

    // All tensors created after this will use these defaults
    let tensor = Tensor::ones(&[2, 3])?;
    println!("Device: {:?}, dtype: {:?}", tensor.device(), tensor.dtype());

    // Auto-select the best available device
    auto_set_device();
    println!("Auto-selected device: {:?}", get_default_device());

    // Create a tensor using the auto-selected device
    let auto_tensor = Tensor::ones(&[2, 3])?;
    println!("Auto tensor device: {:?}", auto_tensor.device());

    Ok(())
}
```
When you call `auto_set_device()`, MaidenX will:
- Check for CUDA availability if the `cuda` feature is enabled
- Check for MPS availability if the `mps` feature is enabled
- Fall back to CPU if no accelerated device is available
Hardware Acceleration
If you have enabled the appropriate features, you can create tensors directly on GPU:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor on CUDA device (requires "cuda" feature)
    #[cfg(feature = "cuda")]
    let cuda_tensor = Tensor::ones_with_spec(&[2, 3], Device::CUDA(0), DType::F32)?;

    // Create a tensor on MPS device (requires "mps" feature)
    #[cfg(feature = "mps")]
    let mps_tensor = Tensor::ones_with_spec(&[2, 3], Device::MPS, DType::F32)?;

    Ok(())
}
```
Moving Tensors Between Devices
You can move tensors between devices after creation:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor on CPU
    let cpu_tensor = Tensor::ones(&[2, 3])?;

    // Move to CUDA device if available
    #[cfg(feature = "cuda")]
    let cuda_tensor = cpu_tensor.to_device(Device::CUDA(0))?;

    // Move to MPS device if available
    #[cfg(feature = "mps")]
    let mps_tensor = cpu_tensor.to_device(Device::MPS)?;

    Ok(())
}
```
Enabling Autograd
To make a tensor track gradients for automatic differentiation, use the `with_grad` method:
```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tensor that requires gradients
    let mut tensor = Tensor::randn(&[2, 3])?;
    tensor.with_grad()?;

    println!("Requires grad: {}", tensor.requires_grad());
    Ok(())
}
```
Tensor
This chapter covers MaidenX's core tensor functionality, which provides multi-dimensional array operations with automatic differentiation (autograd) support.
Core Tensor Features
MaidenX tensors provide:
- Multi-dimensional array representation
- Support for various data types (F32, F64, I32, I64, etc.)
- Automatic differentiation for gradient-based optimization
- Device support (CPU, CUDA, MPS)
- Broadcasting for performing operations between tensors of different shapes
- Extensive operation library (arithmetic, reduction, transformation, etc.)
Tensor Structure
The `Tensor` struct is the primary data structure for representing multi-dimensional arrays:
```rust
pub struct Tensor {
    data: TensorData,          // Holds buffer and gradient information
    metadata: TensorMetadata,  // Holds device, dtype, layout, requires_grad
    node: Option<TensorNode>,  // Stores computational graph information for autograd
}
```
Display and Debug Output
MaidenX tensors implement both the `Display` and `Debug` traits for convenient printing:
Display Format
The Display format shows just the tensor's data in a nested array format:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
println!("{}", a);
// Outputs: [1.00000000, 2.00000000, 3.00000000]

let b = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
println!("{}", b);
// Outputs: [[1.00000000, 2.00000000], [3.00000000, 4.00000000]]
```
Debug Format
The Debug format provides comprehensive information about the tensor, including shape, device, data type, data values, and gradient information:
```rust
let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
a.with_grad()?;
println!("{:?}", a);
// Outputs something like:
// Tensor(shape=[3], device=cpu, dtype=f32, data=[1.00000000, 2.00000000, 3.00000000], requires_grad=true, grad=[0.00000000, 0.00000000, 0.00000000])
```
Serialization and Deserialization
MaidenX supports tensor serialization and deserialization through Serde (when the "serde" feature is enabled):
```rust
// Binary serialization
let tensor = Tensor::new(vec![1.0, 2.0, 3.0])?;
let bytes = tensor.to_bytes()?;
let tensor_from_bytes = Tensor::from_bytes(&bytes)?;

// JSON serialization
let json = tensor.to_json()?;
let tensor_from_json = Tensor::from_json(&json)?;
```
The serialization preserves:
- Tensor data
- Shape and layout information
- Device information
- Data type
- Requires grad flag (but not gradient values)
However, computational graph information (the `node` field) is not serialized, so autograd history is not preserved.
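For example, round-tripping a gradient-tracking tensor keeps the flag but starts from a fresh autograd state (a small sketch using the serialization APIs above):

```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut t = Tensor::new(vec![1.0, 2.0, 3.0])?;
    t.with_grad()?;

    // Serialize and restore (requires the "serde" feature)
    let bytes = t.to_bytes()?;
    let restored = Tensor::from_bytes(&bytes)?;

    // The requires_grad flag survives the round trip...
    assert!(restored.requires_grad());
    // ...but the computational graph (`node`) does not, so no autograd history carries over.
    Ok(())
}
```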
Getting Started
For detailed guides on tensor operations, see the following sections:
- Tensor Creation: Ways to create and initialize tensors
- Tensor Operations: Overview of tensor operations
- Tensor Utilities: Utility functions for tensor manipulation
Creation
MaidenX provides multiple ways to create tensors. This section covers the various tensor creation methods available in the library.
Creating Tensors from Data
new
Creates a new tensor from data using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create a tensor from a vector of integers
let x = Tensor::new(vec![1, 2, 3])?;
```
new_with_spec
Creates a new tensor with specified device and data type. This is useful when you need more control over the tensor's properties.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a tensor with specific device and data type
let x = Tensor::new_with_spec(vec![1, 2, 3], Device::CPU, DType::I32)?;

// Create a tensor with type conversion (integers to float)
let y = Tensor::new_with_spec(vec![1, 2, 3], Device::CPU, DType::F32)?;
assert_eq!(y.to_flatten_vec::<f32>()?, [1.0, 2.0, 3.0]);
```
Creating Empty Tensors
empty
Creates an uninitialized tensor with the specified shape, using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create an empty tensor of shape [2, 3]
let x = Tensor::empty(&[2, 3])?;
```
empty_like
Creates an uninitialized tensor with the same shape, device, and data type as the provided tensor.
```rust
use maidenx_tensor::Tensor;

let mut src = Tensor::new(vec![1, 2, 3, 4, 5, 6])?;
src.with_shape(&[2, 3])?;

// Create an empty tensor with same properties as src
let y = Tensor::empty_like(&src)?;
```
empty_with_spec
Creates an uninitialized tensor with the specified shape, device, and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create an empty tensor with specific shape, device and data type
let x = Tensor::empty_with_spec(&[2, 3], Device::CPU, DType::F32)?;
```
Creating Tensors with Constant Values
zeros
Creates a tensor of the specified shape filled with zeros, using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create a tensor of shape [2, 3] filled with zeros
let x = Tensor::zeros(&[2, 3])?;
assert_eq!(x.to_flatten_vec::<f32>()?, vec![0.0; 6]);
```
zeros_like
Creates a tensor filled with zeros with the same shape, device, and data type as the provided tensor.
```rust
use maidenx_tensor::Tensor;

let mut src = Tensor::new(vec![1, 2, 3, 4, 5, 6])?;
src.with_shape(&[2, 3])?;

// Create a tensor of zeros with same properties as src
let y = Tensor::zeros_like(&src)?;
```
zeros_with_spec
Creates a tensor filled with zeros with the specified shape, device, and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a zeros tensor with specific shape, device and data type
let x = Tensor::zeros_with_spec(&[2, 3], Device::CPU, DType::F32)?;
```
ones
Creates a tensor of the specified shape filled with ones, using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create a tensor of shape [2, 3] filled with ones
let x = Tensor::ones(&[2, 3])?;
assert_eq!(x.to_flatten_vec::<f32>()?, vec![1.0; 6]);
```
ones_like
Creates a tensor filled with ones with the same shape, device, and data type as the provided tensor.
```rust
use maidenx_tensor::Tensor;

let mut src = Tensor::new(vec![1, 2, 3, 4, 5, 6])?;
src.with_shape(&[2, 3])?;

// Create a tensor of ones with same properties as src
let y = Tensor::ones_like(&src)?;
```
ones_with_spec
Creates a tensor filled with ones with the specified shape, device, and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a ones tensor with specific shape, device and data type
let x = Tensor::ones_with_spec(&[2, 3], Device::CPU, DType::F32)?;
```
fill
Creates a tensor of the specified shape filled with a specified value, using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create a tensor of shape [2, 3] filled with value 5.0
let x = Tensor::fill(&[2, 3], 5.0)?;
assert_eq!(x.to_flatten_vec::<f32>()?, vec![5.0; 6]);
```
fill_like
Creates a tensor filled with a specified value with the same shape, device, and data type as the provided tensor.
```rust
use maidenx_tensor::Tensor;

let mut src = Tensor::new(vec![1, 2, 3, 4, 5, 6])?;
src.with_shape(&[2, 3])?;

// Create a tensor filled with 7 with same properties as src
let y = Tensor::fill_like(&src, 7)?;
```
fill_with_spec
Creates a tensor filled with a specified value with the specified shape, device, and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a tensor filled with 5.0 with specific shape, device and data type
let x = Tensor::fill_with_spec(&[2, 3], 5.0, Device::CPU, DType::F32)?;
```
Creating Tensors with Random Values
randn
Creates a tensor of the specified shape filled with values sampled from a standard normal distribution (mean = 0, std = 1), using default device and data type.
```rust
use maidenx_tensor::Tensor;

// Create a tensor of shape [2, 3] with random normal values
let x = Tensor::randn(&[2, 3])?;
```
randn_like
Creates a tensor filled with values sampled from a standard normal distribution with the same shape, device, and data type as the provided tensor.
```rust
use maidenx_tensor::Tensor;

let mut src = Tensor::new(vec![1, 2, 3, 4, 5, 6])?;
src.with_shape(&[2, 3])?;

// Create a tensor with random normal values with same properties as src
let y = Tensor::randn_like(&src)?;
```
randn_with_spec
Creates a tensor filled with values sampled from a standard normal distribution with the specified shape, device, and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a random normal tensor with specific shape, device and data type
let x = Tensor::randn_with_spec(&[2, 3], Device::CPU, DType::F32)?;
```
Creating Sequences
range
Creates a 1D tensor with values [0, 1, 2, ..., n-1].
```rust
use maidenx_tensor::Tensor;

// Create a tensor with values [0, 1, 2, 3, 4]
let x = Tensor::range(5)?;
assert_eq!(x.to_flatten_vec::<f32>()?, vec![0.0, 1.0, 2.0, 3.0, 4.0]);
```
range_with_spec
Creates a 1D tensor with values [0, 1, 2, ..., n-1] with the specified device and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create a range tensor with specific device and data type
let x = Tensor::range_with_spec(5, Device::CPU, DType::I32)?;
assert_eq!(x.to_flatten_vec::<i32>()?, vec![0, 1, 2, 3, 4]);
```
arange
Creates a 1D tensor with values [start, start+step, start+2*step, ...] up to but not including end.
```rust
use maidenx_tensor::Tensor;

// Create a tensor with values [1.0, 2.0, 3.0, 4.0]
let x = Tensor::arange(1.0, 5.0, 1.0)?;

// Create a tensor with values [0.0, 2.0, 4.0, 6.0, 8.0]
let y = Tensor::arange(0, 10, 2)?;

// Create a tensor with negative step [5.0, 4.0, 3.0, 2.0, 1.0]
let z = Tensor::arange(5, 0, -1)?;
```
arange_with_spec
Creates a 1D tensor with values [start, start+step, start+2*step, ...] up to but not including end, with the specified device and data type.
```rust
use maidenx_core::{device::Device, dtype::DType};
use maidenx_tensor::Tensor;

// Create an arange tensor with specific device and data type
let x = Tensor::arange_with_spec(1, 5, 1, Device::CPU, DType::I32)?;
assert_eq!(x.to_flatten_vec::<i32>()?, vec![1, 2, 3, 4]);

// Create a tensor with float values [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
let y = Tensor::arange_with_spec(0.5, 4.0, 0.5, Device::CPU, DType::F32)?;
```
Other Creation Methods
share_buffer
Creates a new tensor that shares the underlying buffer with the provided tensor.
```rust
use maidenx_tensor::Tensor;

let x = Tensor::new(vec![1, 2, 3, 4])?;
let y = Tensor::share_buffer(&x)?;

// Both tensors share the same buffer
assert_eq!(y.to_flatten_vec::<i32>()?, [1, 2, 3, 4]);
```
Creation Pattern
MaidenX follows a consistent pattern for tensor creation functions:
1. Basic function: Takes minimal arguments and uses the default device and data type
```rust
Tensor::zeros(&[2, 3])?;
```
2. `_like` function: Creates a tensor with the same properties as another tensor
```rust
Tensor::zeros_like(&x)?;
```
3. `_with_spec` function: Provides complete control over shape, device, and data type
```rust
Tensor::zeros_with_spec(&[2, 3], Device::CPU, DType::F32)?;
```
This consistent pattern makes it easy to understand and use the various creation methods in MaidenX.
Tensor Operations
MaidenX provides a comprehensive set of tensor operations for numerical computing and deep learning. This page provides an overview of the major operation categories.
Operation Categories
MaidenX tensor operations are organized into the following categories:
| Category | Description |
|---|---|
| Binary Operations | Operations between two tensors (add, mul, div, etc.) |
| Unary Operations | Operations on a single tensor (neg, abs, exp, etc.) |
| Reduction Operations | Operations that reduce tensor dimensions (sum, mean, max, etc.) |
| Transform Operations | Operations that transform tensor shape or layout |
| Padding Operations | Operations that add padding around tensor boundaries |
| Indexing Operations | Operations for advanced indexing and selection |
Common Operation Features
Most operations in MaidenX share these common features:
Automatic Differentiation (Autograd)
Many operations support automatic differentiation, which is crucial for training neural networks. Operations that support autograd will automatically track gradients when the tensor has `requires_grad` enabled.
```rust
let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
a.with_grad()?;
let b = a.mul_scalar(2.0)?; // `b` will also have autograd enabled
```
Type Promotion
When performing operations between tensors of different data types, MaidenX automatically promotes types according to standard rules:
- If one tensor is floating point and one is integer, the integer tensor is converted to floating point
- When mixing different floating point precisions, the lower precision is promoted to the higher one
```rust
let a = Tensor::new(vec![1, 2, 3])?;       // i32 tensor
let b = Tensor::new(vec![1.0, 2.0, 3.0])?; // f32 tensor
let c = a.add(&b)?;                        // Result will be f32
```
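Mixed floating-point precisions follow the same rule; a sketch assuming `new_with_spec` (documented in the Creation chapter) is used to force the dtypes:

```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let a = Tensor::new_with_spec(vec![1.0, 2.0], Device::CPU, DType::F32)?;
    let b = Tensor::new_with_spec(vec![3.0, 4.0], Device::CPU, DType::F64)?;

    // Lower precision (f32) is promoted to the higher one (f64)
    let c = a.add(&b)?;
    println!("{:?}", c.dtype()); // F64
    Ok(())
}
```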
Broadcasting
Most operations support broadcasting, which allows operations between tensors of different shapes by implicitly expanding the smaller tensor:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![1.0])?;
let c = a.add(&b)?; // [2.0, 3.0, 4.0]
```
Error Handling
All operations return a `Result` type, which allows for clear error handling:
```rust
match tensor.add(&other_tensor) {
    Ok(result) => println!("Addition successful"),
    Err(e) => println!("Error: {}", e),
}
```
Operation Examples
Here are some common operation examples:
```rust
// Create some tensors
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0])?.reshape(&[2, 2])?;

// Binary operations
let sum = a.add(&b)?;     // [[6.0, 8.0], [10.0, 12.0]]
let product = a.mul(&b)?; // [[5.0, 12.0], [21.0, 32.0]]

// Unary operations
let neg_a = a.neg()?; // [[-1.0, -2.0], [-3.0, -4.0]]
let exp_a = a.exp()?; // Element-wise exponential

// Reduction operations
let sum_a = a.sum(0, false)?; // [4.0, 6.0]
let max_a = a.max_all()?;     // 4.0

// Transform operations
let reshaped = a.reshape(&[4])?;     // [1.0, 2.0, 3.0, 4.0]
let transposed = a.transpose(0, 1)?; // [[1.0, 3.0], [2.0, 4.0]]

// Indexing operations
let indices = Tensor::new(vec![0])?.reshape(&[1])?;
let first_row = a.index_select(0, &indices)?; // [[1.0, 2.0]]
```
For detailed documentation on each operation category, please refer to the specific section pages linked above.
Binary Operations
Binary operations in MaidenX are operations that take two tensors as input and produce a single output tensor. These operations typically apply the specified mathematical operation element-wise between corresponding elements of the input tensors.
Arithmetic Operations
add
```rust
fn add(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise addition between two tensors.
- Parameters:
  - `rhs`: The tensor to add to the current tensor
- Returns: A new tensor containing the sum of the elements
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
let c = a.add(&b)?; // [5.0, 7.0, 9.0]
```
sub
```rust
fn sub(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise subtraction between two tensors.
- Parameters:
  - `rhs`: The tensor to subtract from the current tensor
- Returns: A new tensor containing the difference of the elements
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![5.0, 7.0, 9.0])?;
let b = Tensor::new(vec![1.0, 2.0, 3.0])?;
let c = a.sub(&b)?; // [4.0, 5.0, 6.0]
```
mul
```rust
fn mul(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise multiplication between two tensors.
- Parameters:
  - `rhs`: The tensor to multiply with the current tensor
- Returns: A new tensor containing the product of the elements
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
let c = a.mul(&b)?; // [4.0, 10.0, 18.0]
```
div
```rust
fn div(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise division between two tensors.
- Parameters:
  - `rhs`: The tensor to divide the current tensor by
- Returns: A new tensor containing the quotient of the elements
- Supports Autograd: Yes
- Note: When dividing integer tensors, the result will be promoted to F32
- Example:
```rust
let a = Tensor::new(vec![4.0, 10.0, 18.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
let c = a.div(&b)?; // [1.0, 2.0, 3.0]
```
maximum
```rust
fn maximum(&self, rhs: &Tensor) -> Result<Tensor>
```
Takes the element-wise maximum of two tensors.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new tensor containing the maximum value at each position
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 5.0, 3.0])?;
let b = Tensor::new(vec![4.0, 2.0, 6.0])?;
let c = a.maximum(&b)?; // [4.0, 5.0, 6.0]
```
minimum
```rust
fn minimum(&self, rhs: &Tensor) -> Result<Tensor>
```
Takes the element-wise minimum of two tensors.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new tensor containing the minimum value at each position
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 5.0, 3.0])?;
let b = Tensor::new(vec![4.0, 2.0, 6.0])?;
let c = a.minimum(&b)?; // [1.0, 2.0, 3.0]
```
Logical Operations
logical_and
```rust
fn logical_and(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise logical AND between two tensors.
- Parameters:
  - `rhs`: The tensor to combine with the current tensor
- Returns: A new boolean tensor with logical AND applied
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![true, false, true])?;
let b = Tensor::new(vec![true, true, false])?;
let c = a.logical_and(&b)?; // [true, false, false]
```
logical_or
```rust
fn logical_or(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise logical OR between two tensors.
- Parameters:
  - `rhs`: The tensor to combine with the current tensor
- Returns: A new boolean tensor with logical OR applied
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![true, false, true])?;
let b = Tensor::new(vec![true, true, false])?;
let c = a.logical_or(&b)?; // [true, true, true]
```
logical_xor
```rust
fn logical_xor(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs element-wise logical XOR between two tensors.
- Parameters:
  - `rhs`: The tensor to combine with the current tensor
- Returns: A new boolean tensor with logical XOR applied
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![true, false, true])?;
let b = Tensor::new(vec![true, true, false])?;
let c = a.logical_xor(&b)?; // [false, true, true]
```
Comparison Operations
eq
```rust
fn eq(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares two tensors for element-wise equality.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are equal
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![1.0, 5.0, 3.0])?;
let c = a.eq(&b)?; // [true, false, true]
```
ne
```rust
fn ne(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares two tensors for element-wise inequality.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are not equal
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![1.0, 5.0, 3.0])?;
let c = a.ne(&b)?; // [false, true, false]
```
lt
```rust
fn lt(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares if elements in the first tensor are less than the corresponding elements in the second tensor.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are less than the corresponding elements
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![2.0, 2.0, 1.0])?;
let c = a.lt(&b)?; // [true, false, false]
```
le
```rust
fn le(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares if elements in the first tensor are less than or equal to the corresponding elements in the second tensor.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are less than or equal to the corresponding elements
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![2.0, 2.0, 1.0])?;
let c = a.le(&b)?; // [true, true, false]
```
gt
```rust
fn gt(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares if elements in the first tensor are greater than the corresponding elements in the second tensor.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are greater than the corresponding elements
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![2.0, 2.0, 1.0])?;
let c = a.gt(&b)?; // [false, false, true]
```
ge
```rust
fn ge(&self, rhs: &Tensor) -> Result<Tensor>
```
Compares if elements in the first tensor are greater than or equal to the corresponding elements in the second tensor.
- Parameters:
  - `rhs`: The tensor to compare with the current tensor
- Returns: A new boolean tensor with true where elements are greater than or equal to the corresponding elements
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![2.0, 2.0, 1.0])?;
let c = a.ge(&b)?; // [false, true, true]
```
Matrix Multiplication
matmul
```rust
fn matmul(&self, rhs: &Tensor) -> Result<Tensor>
```
Performs matrix multiplication between two tensors.
- Parameters:
  - `rhs`: The tensor to multiply with the current tensor
- Returns: A new tensor containing the result of matrix multiplication
- Supports Autograd: Yes
- Shape Rules:
  - For 1D tensors (vectors): Returns the dot product as a scalar
  - For 2D tensors (matrices): Standard matrix multiplication (M×K * K×N → M×N)
  - For batched tensors: Applied to the last two dimensions with broadcasting
  - If a tensor has only 1 dimension, it's treated as:
    - 1D * 1D: Both are treated as vectors, resulting in a scalar (dot product)
    - 1D * 2D: The 1D tensor is treated as a 1×K matrix, resulting in a 1×N vector
    - 2D * 1D: The 1D tensor is treated as a K×1 matrix, resulting in an M×1 vector
  - For N-D * M-D tensors: Leading dimensions are broadcast
- Examples:
```rust
// Basic matrix multiplication (2D * 2D)
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0])?.reshape(&[2, 2])?;
let c = a.matmul(&b)?; // [[19.0, 22.0], [43.0, 50.0]]

// Vector-vector multiplication (1D * 1D)
let v1 = Tensor::new(vec![1.0, 2.0, 3.0])?;
let v2 = Tensor::new(vec![4.0, 5.0, 6.0])?;
let dot = v1.matmul(&v2)?; // [32.0] (dot product: 1*4 + 2*5 + 3*6)

// Matrix-vector multiplication (2D * 1D)
let m = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let v = Tensor::new(vec![5.0, 6.0])?;
let mv = m.matmul(&v)?; // [17.0, 39.0]

// Batched matrix multiplication with broadcasting
let batch1 = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])?.reshape(&[2, 2, 2])?;
let batch2 = Tensor::new(vec![9.0, 10.0, 11.0, 12.0])?.reshape(&[1, 2, 2])?;
let result = batch1.matmul(&batch2)?; // [[[31.0, 34.0], [71.0, 78.0]], [[111.0, 122.0], [151.0, 166.0]]]
```
In-place Operations
add_
```rust
fn add_(&mut self, rhs: &Tensor) -> Result<()>
```
Performs in-place element-wise addition.
- Parameters:
  - `rhs`: The tensor to add to the current tensor
- Returns: Result indicating success or failure
- Supports Autograd: No
- Example:
```rust
let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
a.add_(&b)?; // a becomes [5.0, 7.0, 9.0]
```
sub_
```rust
fn sub_(&mut self, rhs: &Tensor) -> Result<()>
```
Performs in-place element-wise subtraction.
- Parameters:
  - `rhs`: The tensor to subtract from the current tensor
- Returns: Result indicating success or failure
- Supports Autograd: No
- Example:
```rust
let mut a = Tensor::new(vec![5.0, 7.0, 9.0])?;
let b = Tensor::new(vec![1.0, 2.0, 3.0])?;
a.sub_(&b)?; // a becomes [4.0, 5.0, 6.0]
```
mul_
```rust
fn mul_(&mut self, rhs: &Tensor) -> Result<()>
```
Performs in-place element-wise multiplication.
- Parameters:
  - `rhs`: The tensor to multiply with the current tensor
- Returns: Result indicating success or failure
- Supports Autograd: No
- Example:
```rust
let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
a.mul_(&b)?; // a becomes [4.0, 10.0, 18.0]
```
div_
```rust
fn div_(&mut self, rhs: &Tensor) -> Result<()>
```
Performs in-place element-wise division.
- Parameters:
  - `rhs`: The tensor to divide the current tensor by
- Returns: Result indicating success or failure
- Supports Autograd: No
- Example:
```rust
let mut a = Tensor::new(vec![4.0, 10.0, 18.0])?;
let b = Tensor::new(vec![4.0, 5.0, 6.0])?;
a.div_(&b)?; // a becomes [1.0, 2.0, 3.0]
```
Broadcasting
All binary operations in MaidenX support broadcasting, allowing operations between tensors of different shapes. Broadcasting automatically expands dimensions of the smaller tensor to match the larger one where possible, following these rules:
- Trailing dimensions are aligned
- Each dimension either has the same size or one of them is 1 (which gets expanded)
- If a tensor has fewer dimensions, it is padded with dimensions of size 1 at the beginning
This allows for flexible and concise operations without unnecessary tensor reshaping or copying.
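A short illustration of these rules (a sketch; a shape-`[3]` tensor is padded to `[1, 3]` and then expanded to `[2, 3]`):

```rust
use maidenx::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Shape [2, 3]
    let a = Tensor::new(vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]])?;
    // Shape [3], broadcast across both rows of `a`
    let b = Tensor::new(vec![10.0, 20.0, 30.0])?;

    let c = a.add(&b)?; // [[11.0, 22.0, 33.0], [14.0, 25.0, 36.0]]
    println!("{}", c);
    Ok(())
}
```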
Unary Operations
Unary operations in MaidenX are operations that take a single tensor as input and produce a single output tensor. These operations apply the specified mathematical function to each element of the input tensor.
Basic Unary Operations
neg
```rust
fn neg(&self) -> Result<Tensor>
```
Negates each element in the tensor.
- Returns: A new tensor with each element negated
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, -2.0, 3.0])?;
let b = a.neg()?; // [-1.0, 2.0, -3.0]
```
abs
```rust
fn abs(&self) -> Result<Tensor>
```
Computes the absolute value of each element in the tensor.
- Returns: A new tensor with absolute values
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![-1.0, 2.0, -3.0])?;
let b = a.abs()?; // [1.0, 2.0, 3.0]
```
sign
```rust
fn sign(&self) -> Result<Tensor>
```
Returns the sign of each element in the tensor (-1 for negative, 0 for zero, 1 for positive).
- Returns: A new tensor with the sign of each element
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![-2.0, 0.0, 3.0])?;
let b = a.sign()?; // [-1.0, 0.0, 1.0]
```
square
```rust
fn square(&self) -> Result<Tensor>
```
Squares each element in the tensor.
- Returns: A new tensor with each element squared
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.square()?; // [1.0, 4.0, 9.0]
```
sqrt
```rust
fn sqrt(&self) -> Result<Tensor>
```
Computes the square root of each element in the tensor.
- Returns: A new tensor with the square root of each element
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 4.0, 9.0])?;
let b = a.sqrt()?; // [1.0, 2.0, 3.0]
```
Activation Functions
relu
```rust
fn relu(&self) -> Result<Tensor>
```
Applies the Rectified Linear Unit function to each element (max(0, x)).
- Returns: A new tensor with ReLU applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![-1.0, 0.0, 2.0])?;
let b = a.relu()?; // [0.0, 0.0, 2.0]
```
sigmoid
```rust
fn sigmoid(&self) -> Result<Tensor>
```
Applies the sigmoid function (1 / (1 + exp(-x))) to each element.
- Returns: A new tensor with sigmoid applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0])?;
let b = a.sigmoid()?; // [0.5]
```
tanh
```rust
fn tanh(&self) -> Result<Tensor>
```
Applies the hyperbolic tangent function to each element.
- Returns: A new tensor with tanh applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0])?;
let b = a.tanh()?; // [0.0]
```
gelu
```rust
fn gelu(&self) -> Result<Tensor>
```
Applies the Gaussian Error Linear Unit function to each element.
- Returns: A new tensor with GELU applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, 1.0, -1.0])?;
let b = a.gelu()?; // [0.0, 0.841..., -0.159...]
```
softplus
```rust
fn softplus(&self) -> Result<Tensor>
```
Applies the softplus function (log(1 + exp(x))) to each element.
- Returns: A new tensor with softplus applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, 1.0])?;
let b = a.softplus()?; // [0.693..., 1.313...]
```
Trigonometric Functions
sin
```rust
fn sin(&self) -> Result<Tensor>
```
Computes the sine of each element.
- Returns: A new tensor with sine applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, std::f32::consts::PI / 2.0])?;
let b = a.sin()?; // [0.0, 1.0]
```
cos
```rust
fn cos(&self) -> Result<Tensor>
```
Computes the cosine of each element.
- Returns: A new tensor with cosine applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, std::f32::consts::PI / 2.0])?;
let b = a.cos()?; // [1.0, 0.0]
```
tan
```rust
fn tan(&self) -> Result<Tensor>
```
Computes the tangent of each element.
- Returns: A new tensor with tangent applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, std::f32::consts::PI / 4.0])?;
let b = a.tan()?; // [0.0, 1.0]
```
Logarithmic and Exponential Functions
ln
```rust
fn ln(&self) -> Result<Tensor>
```
Computes the natural logarithm of each element.
- Returns: A new tensor with natural log applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, std::f32::consts::E])?;
let b = a.ln()?; // [0.0, 1.0]
```
log
```rust
fn log(&self) -> Result<Tensor>
```
Alias for `ln()`; computes the natural logarithm.
- Returns: A new tensor with natural log applied
- Supports Autograd: Yes
log10
```rust
fn log10(&self) -> Result<Tensor>
```
Computes the base-10 logarithm of each element.
- Returns: A new tensor with base-10 log applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 10.0, 100.0])?;
let b = a.log10()?; // [0.0, 1.0, 2.0]
```
log2
```rust
fn log2(&self) -> Result<Tensor>
```
Computes the base-2 logarithm of each element.
- Returns: A new tensor with base-2 log applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 4.0, 8.0])?;
let b = a.log2()?; // [0.0, 1.0, 2.0, 3.0]
```
exp
```rust
fn exp(&self) -> Result<Tensor>
```
Computes the exponential (e^x) of each element.
- Returns: A new tensor with exponential applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, 1.0])?;
let b = a.exp()?; // [1.0, 2.718...]
```
exp10
```rust
fn exp10(&self) -> Result<Tensor>
```
Computes 10 raised to the power of each element.
- Returns: A new tensor with 10^x applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, 1.0, 2.0])?;
let b = a.exp10()?; // [1.0, 10.0, 100.0]
```
exp2
```rust
fn exp2(&self) -> Result<Tensor>
```
Computes 2 raised to the power of each element.
- Returns: A new tensor with 2^x applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![0.0, 1.0, 2.0, 3.0])?;
let b = a.exp2()?; // [1.0, 2.0, 4.0, 8.0]
```
recip
```rust
fn recip(&self) -> Result<Tensor>
```
Computes the reciprocal (1/x) of each element.
- Returns: A new tensor with reciprocal applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 4.0])?;
let b = a.recip()?; // [1.0, 0.5, 0.25]
```
Logical Operations
logical_not
```rust
fn logical_not(&self) -> Result<Tensor>
```
Computes the logical NOT of each element.
- Returns: A new boolean tensor with values negated
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![true, false])?;
let b = a.logical_not()?; // [false, true]
```
Operations with Scalar Values
add_scalar
```rust
fn add_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Adds a scalar value to each element in the tensor.
- Parameters:
  - `scalar`: The scalar value to add
- Returns: A new tensor with scalar added to each element
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.add_scalar(5.0)?; // [6.0, 7.0, 8.0]
```
sub_scalar
```rust
fn sub_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Subtracts a scalar value from each element in the tensor.
- Parameters:
  - `scalar`: The scalar value to subtract
- Returns: A new tensor with scalar subtracted from each element
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![6.0, 7.0, 8.0])?;
let b = a.sub_scalar(5.0)?; // [1.0, 2.0, 3.0]
```
mul_scalar
```rust
fn mul_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Multiplies each element in the tensor by a scalar value.
- Parameters:
  - `scalar`: The scalar value to multiply by
- Returns: A new tensor with each element multiplied by scalar
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.mul_scalar(2.0)?; // [2.0, 4.0, 6.0]
```
div_scalar
```rust
fn div_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Divides each element in the tensor by a scalar value.
- Parameters:
  - `scalar`: The scalar value to divide by
- Returns: A new tensor with each element divided by scalar
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![2.0, 4.0, 6.0])?;
let b = a.div_scalar(2.0)?; // [1.0, 2.0, 3.0]
```
maximum_scalar
```rust
fn maximum_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Takes the maximum of each element in the tensor and a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new tensor with maximum values
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 3.0, 2.0])?;
let b = a.maximum_scalar(2.0)?; // [2.0, 3.0, 2.0]
```
minimum_scalar
```rust
fn minimum_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Takes the minimum of each element in the tensor and a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new tensor with minimum values
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 3.0, 2.0])?;
let b = a.minimum_scalar(2.0)?; // [1.0, 2.0, 2.0]
```
pow
```rust
fn pow(&self, exponent: impl Into<Scalar>) -> Result<Tensor>
```
Raises each element in the tensor to the power of the exponent.
- Parameters:
  - `exponent`: The exponent to raise elements to
- Returns: A new tensor with each element raised to the power
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.pow(2.0)?; // [1.0, 4.0, 9.0]
```
leaky_relu
```rust
fn leaky_relu(&self, negative_slope: impl Into<Scalar>) -> Result<Tensor>
```
Applies the Leaky ReLU function to each element.
- Parameters:
  - `negative_slope`: The slope for negative input values
- Returns: A new tensor with Leaky ReLU applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![-2.0, 0.0, 3.0])?;
let b = a.leaky_relu(0.1)?; // [-0.2, 0.0, 3.0]
```
elu
```rust
fn elu(&self, alpha: impl Into<Scalar>) -> Result<Tensor>
```
Applies the Exponential Linear Unit function to each element.
- Parameters:
  - `alpha`: The alpha parameter for ELU
- Returns: A new tensor with ELU applied
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![-2.0, 0.0, 3.0])?;
let b = a.elu(1.0)?; // [-0.865..., 0.0, 3.0]
```
Comparison Operations with Scalar
eq_scalar
```rust
fn eq_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Compares each element for equality with a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 2.0])?;
let b = a.eq_scalar(2.0)?; // [false, true, true]
```
ne_scalar
```rust
fn ne_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Compares each element for inequality with a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 2.0])?;
let b = a.ne_scalar(2.0)?; // [true, false, false]
```
lt_scalar
```rust
fn lt_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Checks if each element is less than a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.lt_scalar(2.0)?; // [true, false, false]
```
le_scalar
```rust
fn le_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Checks if each element is less than or equal to a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.le_scalar(2.0)?; // [true, true, false]
```
gt_scalar
```rust
fn gt_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Checks if each element is greater than a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.gt_scalar(2.0)?; // [false, false, true]
```
ge_scalar
```rust
fn ge_scalar(&self, scalar: impl Into<Scalar>) -> Result<Tensor>
```
Checks if each element is greater than or equal to a scalar value.
- Parameters:
  - `scalar`: The scalar value to compare with
- Returns: A new boolean tensor with comparison results
- Supports Autograd: No
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
let b = a.ge_scalar(2.0)?; // [false, true, true]
```
Reduction Operations
Reduction operations in MaidenX reduce tensors along specified dimensions, combining multiple elements into fewer outputs using a specific aggregation function.
Sum Operations
sum
```rust
fn sum(&self, dim: impl Into<Scalar>, keep_dim: bool) -> Result<Tensor>
```
Computes the sum of elements along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = a.sum(0, false)?; // [4.0, 6.0]
let c = a.sum(1, true)?;  // [[3.0], [7.0]]
```
sum_all
```rust
fn sum_all(&self) -> Result<Tensor>
```
Computes the sum of all elements in the tensor.
- Returns: A new scalar tensor containing the sum
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = a.sum_all()?; // [10.0]
```
sum_to_shape
```rust
fn sum_to_shape(&self, shape: &[usize]) -> Result<Tensor>
```
Reduces a tensor to the specified shape by summing along dimensions where the size differs.
- Parameters:
  - `shape`: The target shape
- Returns: A new tensor with target shape
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?;
let b = a.sum_to_shape(&[1, 3])?; // [[5.0, 7.0, 9.0]]
```
Mean Operations
mean
```rust
fn mean(&self, dim: impl Into<Scalar>, keep_dim: bool) -> Result<Tensor>
```
Computes the mean of elements along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = a.mean(0, false)?; // [2.0, 3.0]
let c = a.mean(1, true)?;  // [[1.5], [3.5]]
```
mean_all
```rust
fn mean_all(&self) -> Result<Tensor>
```
Computes the mean of all elements in the tensor.
- Returns: A new scalar tensor containing the mean
- Supports Autograd: Yes
- Example:
```rust
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?;
let b = a.mean_all()?; // [2.5]
```
Fold and Unfold Operations
fold
#![allow(unused)] fn main() { fn fold(&self, dim: impl Into<Scalar>, size: impl Into<Scalar>, step: impl Into<Scalar>) -> Result<Tensor> }
Folds (combines) a tensor along a dimension, reversing the unfold operation.
- Parameters:
  - `dim`: The dimension to fold along
  - `size`: The size of each fold
  - `step`: The step size between each fold
- Returns: A new tensor with the dimension folded
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { // First unfold, then fold to demonstrate the operation let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let unfolded = a.unfold(0, 2, 1)?; // [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]] let folded = unfolded.fold(0, 2, 1)?; // [1.0, 2.0, 3.0, 4.0, 5.0] }
Min/Max Operations
max
#![allow(unused)] fn main() { fn max(&self, dim: impl Into<Scalar>, keep_dim: bool) -> Result<Tensor> }
Finds the maximum values along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.max(0, false)?; // [3.0, 4.0] let c = a.max(1, true)?; // [[2.0], [4.0]] }
max_all
#![allow(unused)] fn main() { fn max_all(&self) -> Result<Tensor> }
Finds the maximum value across all elements in the tensor.
- Returns: A new scalar tensor containing the maximum
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.max_all()?; // [4.0] }
min
#![allow(unused)] fn main() { fn min(&self, dim: impl Into<Scalar>, keep_dim: bool) -> Result<Tensor> }
Finds the minimum values along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.min(0, false)?; // [1.0, 2.0] let c = a.min(1, true)?; // [[1.0], [3.0]] }
min_all
#![allow(unused)] fn main() { fn min_all(&self) -> Result<Tensor> }
Finds the minimum value across all elements in the tensor.
- Returns: A new scalar tensor containing the minimum
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.min_all()?; // [1.0] }
Norm Operations
norm
#![allow(unused)] fn main() { fn norm(&self, p: impl Into<Scalar>, dim: impl Into<Scalar>, keep_dim: bool) -> Result<Tensor> }
Computes the p-norm of elements along a specified dimension.
- Parameters:
  - `p`: The order of the norm (1 for L1, 2 for L2, etc.)
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![3.0, 4.0])?.reshape(&[2])?; let b = a.norm(2.0, 0, false)?; // [5.0] (L2 norm: sqrt(3^2 + 4^2)) }
norm_all
#![allow(unused)] fn main() { fn norm_all(&self, p: impl Into<Scalar>) -> Result<Tensor> }
Computes the p-norm of all elements in the tensor.
- Parameters:
  - `p`: The order of the norm (1 for L1, 2 for L2, etc.)
- Returns: A new scalar tensor containing the norm
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![3.0, 0.0, 4.0])?.reshape(&[3])?; let b = a.norm_all(2.0)?; // [5.0] (L2 norm: sqrt(3^2 + 0^2 + 4^2)) }
Statistical Operations
var
#![allow(unused)] fn main() { fn var(&self, dim: impl Into<Scalar>, keep_dim: bool, unbiased: bool) -> Result<Tensor> }
Computes the variance of elements along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
  - `unbiased`: Whether to use Bessel's correction (N-1 divisor)
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.var(0, false, true)?; // [2.0, 2.0] }
var_all
#![allow(unused)] fn main() { fn var_all(&self, unbiased: bool) -> Result<Tensor> }
Computes the variance of all elements in the tensor.
- Parameters:
  - `unbiased`: Whether to use Bessel's correction (N-1 divisor)
- Returns: A new scalar tensor containing the variance
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.var_all(true)?; // [1.666...] (with Bessel's correction) }
std
#![allow(unused)] fn main() { fn std(&self, dim: impl Into<Scalar>, keep_dim: bool, unbiased: bool) -> Result<Tensor> }
Computes the standard deviation of elements along a specified dimension.
- Parameters:
  - `dim`: The dimension to reduce
  - `keep_dim`: Whether to keep the reduced dimension as 1
  - `unbiased`: Whether to use Bessel's correction (N-1 divisor)
- Returns: A new tensor with reduced dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.std(0, false, true)?; // [1.414..., 1.414...] (sqrt of variance) }
std_all
#![allow(unused)] fn main() { fn std_all(&self, unbiased: bool) -> Result<Tensor> }
Computes the standard deviation of all elements in the tensor.
- Parameters:
  - `unbiased`: Whether to use Bessel's correction (N-1 divisor)
- Returns: A new scalar tensor containing the standard deviation
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.std_all(true)?; // [1.291...] (square root of the variance) }
Transform Operations
Transform operations in maidenx modify the shape, layout, or organization of a tensor without changing its underlying values.
View Operations
View operations create a new tensor that shares storage with the original tensor but has a different shape or organization.
view
#![allow(unused)] fn main() { fn view<T: Into<Scalar> + Clone>(&self, shape: &[T]) -> Result<Tensor> }
Creates a view of the tensor with a new shape.
- Parameters:
  - `shape`: The new shape (use -1 for one dimension to be inferred)
- Returns: A new tensor with the same data but different shape
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?; let b = a.view(&[6])?; // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] let c = a.view(&[3, 2])?; // [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] }
squeeze
#![allow(unused)] fn main() { fn squeeze(&self, dim: impl Into<Scalar>) -> Result<Tensor> }
Removes a dimension of size 1 from the tensor.
- Parameters:
  - `dim`: The dimension to remove
- Returns: A new tensor with the specified dimension removed if it's size 1
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[1, 3])?; let b = a.squeeze(0)?; // [1.0, 2.0, 3.0] }
squeeze_all
#![allow(unused)] fn main() { fn squeeze_all(&self) -> Result<Tensor> }
Removes all dimensions of size 1 from the tensor.
- Returns: A new tensor with all size 1 dimensions removed
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[1, 3, 1])?; let b = a.squeeze_all()?; // [1.0, 2.0, 3.0] }
unsqueeze
#![allow(unused)] fn main() { fn unsqueeze(&self, dim: impl Into<Scalar>) -> Result<Tensor> }
Adds a dimension of size 1 at the specified position.
- Parameters:
  - `dim`: The position to insert the new dimension
- Returns: A new tensor with an additional dimension
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; let b = a.unsqueeze(0)?; // [[1.0, 2.0, 3.0]] let c = a.unsqueeze(1)?; // [[1.0], [2.0], [3.0]] }
Layout Operations
Layout operations modify how the tensor is stored in memory or accessed.
transpose
#![allow(unused)] fn main() { fn transpose(&self, dim0: impl Into<Scalar>, dim1: impl Into<Scalar>) -> Result<Tensor> }
Swaps two dimensions of a tensor.
- Parameters:
  - `dim0`: First dimension to swap
  - `dim1`: Second dimension to swap
- Returns: A new tensor with the dimensions swapped
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?; // [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] let b = a.transpose(0, 1)?; // [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]] }
slice
#![allow(unused)] fn main() { fn slice(&self, dim: impl Into<Scalar>, start: impl Into<Scalar>, end: Option<impl Into<Scalar>>, step: impl Into<Scalar>) -> Result<Tensor> }
Creates a view that is a slice of the tensor along a dimension.
- Parameters:
  - `dim`: The dimension to slice
  - `start`: The starting index
  - `end`: The ending index (exclusive, optional)
  - `step`: The step size
- Returns: A new tensor with the slice
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let b = a.slice(0, 1, Some(4), 1)?; // [2.0, 3.0, 4.0] let c = a.slice(0, 0, Some(5), 2)?; // [1.0, 3.0, 5.0] }
unfold
#![allow(unused)] fn main() { fn unfold(&self, dim: impl Into<Scalar>, size: impl Into<Scalar>, step: impl Into<Scalar>) -> Result<Tensor> }
Extracts sliding local blocks from a tensor along a dimension.
- Parameters:
  - `dim`: The dimension to unfold
  - `size`: The size of each slice
  - `step`: The step between each slice
- Returns: A new tensor with the dimension unfolded
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let b = a.unfold(0, 2, 1)?; // [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]] }
Broadcasting Operations
Broadcasting operations expand a tensor to match a larger shape for element-wise operations.
broadcast
#![allow(unused)] fn main() { fn broadcast(&self, shape: &[usize]) -> Result<Tensor> }
Broadcasts a tensor to a new shape.
- Parameters:
  - `shape`: The target shape
- Returns: A new tensor expanded to the target shape
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; let b = a.broadcast(&[2, 3])?; // [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]] }
broadcast_like
#![allow(unused)] fn main() { fn broadcast_like(&self, other: &Tensor) -> Result<Tensor> }
Broadcasts a tensor to match the shape of another tensor.
- Parameters:
  - `other`: The tensor to match shape with
- Returns: A new tensor with the shape of the other tensor
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; let template = Tensor::new(vec![0.0, 0.0, 0.0, 0.0, 0.0, 0.0])?.reshape(&[2, 3])?; let b = a.broadcast_like(&template)?; // [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]] }
broadcast_left
#![allow(unused)] fn main() { fn broadcast_left(&self, batch_dims: &[usize]) -> Result<Tensor> }
Adds batch dimensions to the left of the tensor shape.
- Parameters:
  - `batch_dims`: The batch dimensions to add
- Returns: A new tensor with batch dimensions added
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; let b = a.broadcast_left(&[2, 2])?; // Shape: [2, 2, 3] (tensor of shape 3 repeated in a 2x2 grid) }
Reshape Operations
Reshape operations change the shape of a tensor, copying the underlying data when the memory layout requires it.
reshape
#![allow(unused)] fn main() { fn reshape<T: Into<Scalar> + Clone>(&self, shape: &[T]) -> Result<Tensor> }
Reshapes a tensor to a new shape.
- Parameters:
  - `shape`: The new shape (use -1 for one dimension to be inferred)
- Returns: A new tensor with the specified shape
- Supports Autograd: Yes
- Note: Unlike `view`, `reshape` may copy data if the tensor is not contiguous
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?; let b = a.reshape(&[3, 2])?; // [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] let c = a.reshape(&[-1])?; // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] }
Broadcasting Rules
When using operations that support broadcasting (like binary operations), maidenx follows these rules:
- If the tensors have different numbers of dimensions, the shape of the tensor with fewer dimensions is padded with 1s on the left until both shapes have the same length.
- For each dimension pair, they must either be equal or one of them must be 1.
- In dimensions where one size is 1 and the other is greater, the tensor with size 1 is expanded to match the other.
For example:
- Shape [3] can broadcast with [2, 3] to produce [2, 3]
- Shape [2, 1] can broadcast with [2, 3] to produce [2, 3]
- Shape [3, 1] can broadcast with [1, 4] to produce [3, 4]
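To make these rules concrete, here is a minimal sketch of the first case using the binary `add` operation (assuming, as described above, that binary operations broadcast automatically):
#![allow(unused)]
fn main() {
    // Shape [3] is padded to [1, 3], then expanded to [2, 3] to match `b`.
    let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?;
    let b = Tensor::new(vec![10.0, 20.0, 30.0, 40.0, 50.0, 60.0])?.reshape(&[2, 3])?;
    let c = a.add(&b)?; // [[11.0, 22.0, 33.0], [41.0, 52.0, 63.0]]
}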
Padding Operations
Padding operations in maidenx add values around the borders of a tensor, expanding its dimensions.
Basic Padding
pad
#![allow(unused)] fn main() { fn pad(&self, paddings: &[(usize, usize)], pad_value: impl Into<Scalar>) -> Result<Tensor> }
Pads a tensor with a constant value (alias for pad_with_constant).
- Parameters:
  - `paddings`: List of (before, after) padding pairs for each dimension
  - `pad_value`: The value to pad with
- Returns: A new tensor with padding applied
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; let b = a.pad(&[(1, 2)], 0.0)?; // [0.0, 1.0, 2.0, 3.0, 0.0, 0.0] }
Padding Modes
pad_with_constant
#![allow(unused)] fn main() { fn pad_with_constant(&self, paddings: &[(usize, usize)], pad_value: impl Into<Scalar>) -> Result<Tensor> }
Pads a tensor with a constant value.
- Parameters:
  - `paddings`: List of (before, after) padding pairs for each dimension
  - `pad_value`: The value to pad with
- Returns: A new tensor with constant padding
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let b = a.pad_with_constant(&[(0, 1), (1, 0)], 0.0)?; // [[0.0, 1.0, 2.0], // [0.0, 3.0, 4.0], // [0.0, 0.0, 0.0]] }
pad_with_reflection
#![allow(unused)] fn main() { fn pad_with_reflection(&self, paddings: &[(usize, usize)]) -> Result<Tensor> }
Pads a tensor by reflecting the tensor values at the boundaries.
- Parameters:
  - `paddings`: List of (before, after) padding pairs for each dimension
- Returns: A new tensor with reflection padding
- Supports Autograd: Yes
- Note: Reflection padding requires the input dimension to be greater than 1
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let b = a.pad_with_reflection(&[(2, 2)])?; // [3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0] }
pad_with_replication
#![allow(unused)] fn main() { fn pad_with_replication(&self, paddings: &[(usize, usize)]) -> Result<Tensor> }
Pads a tensor by replicating the edge values.
- Parameters:
  - `paddings`: List of (before, after) padding pairs for each dimension
- Returns: A new tensor with replication padding
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let b = a.pad_with_replication(&[(2, 2)])?; // [1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0] }
Multi-dimensional Padding
For multi-dimensional tensors, padding is applied to each dimension separately based on the provided padding pairs. This allows for complex padding patterns to be created.
Example: 2D Padding
#![allow(unused)] fn main() { // Create a 2D tensor let a = Tensor::new(vec![ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ])?.reshape(&[2, 3])?; // Pad with zeros: 1 row at top, 1 row at bottom, 2 columns on left, 1 column on right let b = a.pad_with_constant(&[(1, 1), (2, 1)], 0.0)?; // Result: // [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0], // [0.0, 0.0, 1.0, 2.0, 3.0, 0.0], // [0.0, 0.0, 4.0, 5.0, 6.0, 0.0], // [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] }
Padding Behavior with Autograd
All padding operations support automatic differentiation (autograd). During the backward pass, gradients from the padded regions are properly handled:
- For constant padding, gradients in the padded regions are ignored
- For reflection and replication padding, gradients are properly accumulated into the original tensor
This makes padding operations safe to use in training neural networks or other gradient-based optimization tasks.
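As a minimal sketch of this behavior, using only operations documented above (gradient values stated under the assumption that `sum_all` backpropagates a gradient of 1.0 to every element):
#![allow(unused)]
fn main() {
    let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?;
    a.with_grad()?;
    let b = a.pad(&[(1, 1)], 0.0)?; // [0.0, 1.0, 2.0, 3.0, 0.0]
    let loss = b.sum_all()?;
    loss.backward()?;
    // With constant padding, gradients from the padded positions are dropped,
    // so each original element of `a` receives a gradient of 1.0.
}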
Indexing Operations
Indexing operations in maidenx allow for advanced manipulation and selection of tensor data using indices.
Basic Indexing
index
#![allow(unused)] fn main() { fn index(&self, indices: &Tensor) -> Result<Tensor> }
Selects values along dimension 0 using indices (an alias for index_select with dim=0).
- Parameters:
  - `indices`: Tensor of indices to select
- Returns: A new tensor with selected values
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let indices = Tensor::new(vec![0, 2, 4])?.reshape(&[3])?; let b = a.index(&indices)?; // [1.0, 3.0, 5.0] }
Advanced Indexing
index_select
#![allow(unused)] fn main() { fn index_select(&self, dim: impl Into<Scalar>, indices: &Tensor) -> Result<Tensor> }
Selects values along a specified dimension using indices.
- Parameters:
  - `dim`: The dimension to select from
  - `indices`: Tensor of indices to select
- Returns: A new tensor with selected values
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?; let indices = Tensor::new(vec![0, 2])?.reshape(&[2])?; let b = a.index_select(1, &indices)?; // [[1.0, 3.0], [4.0, 6.0]] }
gather
#![allow(unused)] fn main() { fn gather(&self, dim: impl Into<Scalar>, index: &Tensor) -> Result<Tensor> }
Gathers values from a tensor using an index tensor.
- Parameters:
  - `dim`: The dimension to gather from
  - `index`: Tensor of indices to gather
- Returns: A new tensor with gathered values
- Supports Autograd: Yes
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[2, 3])?; let indices = Tensor::new(vec![0, 0, 1])?.reshape(&[1, 3])?; let b = a.gather(1, &indices)?; // [[1.0, 1.0, 2.0]] }
In-place Modification Operations
index_add_
#![allow(unused)] fn main() { fn index_add_(&mut self, dim: impl Into<Scalar>, indices: &Tensor, src: &Tensor) -> Result<()> }
Adds values from the source tensor to specified indices along a dimension.
- Parameters:
  - `dim`: The dimension to add to
  - `indices`: Tensor of indices where to add
  - `src`: Tensor containing values to add
- Returns: Result indicating success
- Supports Autograd: No
- Example:
#![allow(unused)] fn main() { let mut a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let indices = Tensor::new(vec![0, 2])?.reshape(&[2])?; let src = Tensor::new(vec![10.0, 20.0])?.reshape(&[2])?; a.index_add_(0, &indices, &src)?; // a becomes [11.0, 2.0, 23.0, 4.0, 5.0] }
index_put_
#![allow(unused)] fn main() { fn index_put_(&mut self, indices: &[usize], src: &Tensor) -> Result<()> }
Writes values from the source tensor into the tensor, starting at the specified indices.
- Parameters:
  - `indices`: The indices at which to start writing values
  - `src`: Tensor containing the values to put
- Returns: Result indicating success
- Supports Autograd: No
- Example:
#![allow(unused)] fn main() { let mut a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0])?.reshape(&[5])?; let src = Tensor::new(vec![10.0, 20.0])?.reshape(&[2])?; a.index_put_(&[1], &src)?; // a becomes [1.0, 10.0, 20.0, 4.0, 5.0] }
scatter_add_
#![allow(unused)] fn main() { fn scatter_add_(&mut self, dim: impl Into<Scalar>, index: &Tensor, src: &Tensor) -> Result<()> }
Adds values from the source tensor into self at specified indices along a dimension.
- Parameters:
  - `dim`: The dimension to scatter-add to
  - `index`: Tensor of indices where to add
  - `src`: Tensor containing values to add
- Returns: Result indicating success
- Supports Autograd: No
- Example:
#![allow(unused)] fn main() { let mut a = Tensor::zeros(&[5])?; let indices = Tensor::new(vec![0, 2, 0])?.reshape(&[3])?; let src = Tensor::new(vec![1.0, 2.0, 3.0])?.reshape(&[3])?; a.scatter_add_(0, &indices, &src)?; // a becomes [4.0, 0.0, 2.0, 0.0, 0.0] }
Counting Operations
bincount
#![allow(unused)] fn main() { fn bincount(&self, weights: Option<&Tensor>, minlength: Option<usize>) -> Result<Tensor> }
Counts the frequency of each value in a tensor of non-negative integers.
- Parameters:
  - `weights`: Optional tensor of weights
  - `minlength`: Minimum length of the output tensor
- Returns: A new tensor with bin counts
- Supports Autograd: No
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![0, 1, 1, 3, 2, 1, 3])?.reshape(&[7])?; let b = a.bincount(None, None)?; // [1, 3, 1, 2] (0 appears once, 1 appears three times, etc.) // With weights let weights = Tensor::new(vec![0.5, 1.0, 1.0, 2.0, 1.5, 1.0, 2.0])?.reshape(&[7])?; let c = a.bincount(Some(&weights), None)?; // [0.5, 3.0, 1.5, 4.0] }
Performance Considerations
When using indexing operations in performance-critical code, consider these tips:
- Avoid repeated indexing: Instead of accessing individual elements in a loop, try to use vectorized operations.
- Use contiguous tensors: Indexing operations are faster on contiguous tensors.
- Batch operations: When possible, use batch operations like `index_select` rather than selecting individual elements (see the sketch below).
- Consider in-place operations: When appropriate, in-place operations like `index_add_` can be more memory-efficient.
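For instance, a single `index_select` call replaces a loop of individual element reads (a small sketch with illustrative values):
#![allow(unused)]
fn main() {
    let a = Tensor::new(vec![10.0, 20.0, 30.0, 40.0, 50.0])?.reshape(&[5])?;
    let idx = Tensor::new(vec![0, 2, 4])?.reshape(&[3])?;
    // One vectorized call instead of three element-by-element lookups.
    let picked = a.index_select(0, &idx)?; // [10.0, 30.0, 50.0]
}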
Tensor Utilities
Device and Type Conversion
These utilities allow you to change a tensor's device or data type.
with_device / to_device
#![allow(unused)] fn main() { pub fn with_device(&mut self, device: Device) -> Result<()> pub fn to_device(&self, device: Device) -> Result<Self> }
Changes the device where a tensor is stored.
- Parameters:
  - `device`: The target device (CPU, CUDA, MPS)
- Returns:
  - `with_device`: Modifies the tensor in-place and returns Result
  - `to_device`: Returns a new tensor on the specified device
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?; let b = a.to_device(Device::CPU)?; // Copy to CPU // In-place version let mut c = Tensor::new(vec![1.0, 2.0, 3.0])?; c.with_device(Device::CPU)?; // Move to CPU in-place }
with_dtype / to_dtype
#![allow(unused)] fn main() { pub fn with_dtype(&mut self, dtype: DType) -> Result<()> pub fn to_dtype(&self, dtype: DType) -> Result<Self> }
Changes the data type of a tensor.
- Parameters:
  - `dtype`: The target data type (F32, F64, I32, etc.)
- Returns:
  - `with_dtype`: Modifies the tensor in-place and returns Result
  - `to_dtype`: Returns a new tensor with the specified data type
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0])?; let b = a.to_dtype(DType::F64)?; // Convert to 64-bit float // In-place version let mut c = Tensor::new(vec![1.0, 2.0, 3.0])?; c.with_dtype(DType::I32)?; // Convert to 32-bit int in-place }
with_shape / to_shape
#![allow(unused)] fn main() { pub fn with_shape(&mut self, shape: &[usize]) -> Result<()> pub fn to_shape(&self, shape: &[usize]) -> Result<Self> }
Changes the shape of a tensor without modifying the data.
- Parameters:
  - `shape`: The new shape dimensions
- Returns:
  - `with_shape`: Modifies the tensor in-place and returns Result
  - `to_shape`: Returns a new tensor with the specified shape
- Example:
#![allow(unused)] fn main() { let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?; let b = a.to_shape(&[2, 2])?; // Reshape to 2x2 // In-place version let mut c = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?; c.with_shape(&[2, 2])?; // Reshape to 2x2 in-place }
with_grad
#![allow(unused)] fn main() { pub fn with_grad(&mut self) -> Result<()> }
Enables gradient computation for a tensor.
- Returns: Modifies the tensor in-place and returns Result
- Example:
#![allow(unused)] fn main() { let mut a = Tensor::new(vec![1.0, 2.0, 3.0])?; a.with_grad()?; // Enable gradients }
Neural Networks
The MaidenX neural networks module provides building blocks for creating deep learning models. It offers high-level abstractions for layers, optimizers, and loss functions, all of which integrate seamlessly with the tensor library's automatic differentiation system.
Overview
The MaidenX neural networks module provides:
- A consistent `Layer` trait for all neural network components
- Common layer implementations (Linear, Convolutional, etc.)
- Optimization algorithms (SGD, Adam)
- Loss functions (MSE, MAE, Cross Entropy, Huber)
- Support for training and evaluation modes
Training Example
Here's a simple example of training a model with MaidenX:
#![allow(unused)] fn main() { // Create model layers let mut linear1 = Linear::new(784, 128, true)?; let mut linear2 = Linear::new(128, 10, true)?; // Create optimizer let mut optimizer = Adam::new(0.001, 0.9, 0.999, 1e-8); // Training loop for epoch in 0..num_epochs { // For each batch... let input = get_batch_input()?; // Shape: [batch_size, 784] let target = get_batch_target()?; // Shape: [batch_size, 10] // Forward pass let hidden = linear1.forward(&input)?; let output = linear2.forward(&hidden)?; // Compute loss let loss_fn = MSE::new(); let loss = loss_fn.forward((&output, &target))?; // Backward pass loss.backward()?; // Collect parameters and update let mut params = Vec::new(); params.append(&mut linear1.parameters()); params.append(&mut linear2.parameters()); // Update parameters optimizer.step(&mut params)?; optimizer.zero_grad(&mut params)?; } }
Feature Support
MaidenX neural networks can run on different compute devices:
- CPU
- CUDA (GPU) with the `cuda` feature flag
- MPS (Apple Silicon) with the `mps` feature flag
Serialization and Deserialization
MaidenX supports model serialization and deserialization through the `serde` feature flag. When enabled, all built-in layers can be saved to and loaded from files.
Enabling Serialization
To enable serialization support, add the `serde` feature in your Cargo.toml:
[dependencies]
maidenx = { version = "0.1.0", features = ["serde"] }
Saving and Loading Models
Built-in layers can be saved and loaded like this:
#![allow(unused)] fn main() { // Save a linear layer let linear = Linear::new(784, 256, true)?; linear.save("path/to/model.bin", "bin")?; // Binary format linear.save("path/to/model.json", "json")?; // JSON format // Load a linear layer let loaded_linear = Linear::load("path/to/model.bin")?; }
Custom Model Serialization
For custom models or layers, you simply need to derive `Serialize` and `Deserialize` from the `serde` crate. MaidenX will automatically provide save/load functionality for your custom models:
#![allow(unused)] fn main() { use serde::{Serialize, Deserialize}; #[derive(Layer, Clone, Serialize, Deserialize)] pub struct MyCustomModel { linear1: Linear, linear2: Linear, dropout: Dropout, state: LayerState, } impl MyCustomModel { pub fn new(input_size: usize, hidden_size: usize, output_size: usize) -> Result<Self> { Ok(Self { linear1: Linear::new(input_size, hidden_size, true)?, linear2: Linear::new(hidden_size, output_size, true)?, dropout: Dropout::new(0.5)?, state: LayerState::new(), }) } pub fn forward(&self, input: &Tensor) -> Result<Tensor> { let hidden = self.linear1.forward(input)?; let hidden_dropped = self.dropout.forward(&hidden)?; self.linear2.forward(&hidden_dropped) } pub fn parameters(&mut self) -> Vec<&mut Tensor> { let mut params = Vec::new(); params.append(&mut self.linear1.parameters()); params.append(&mut self.linear2.parameters()); params } } }
Once you've derived the required traits, you can save and load your custom models using the standard methods - no need to implement your own save/load functions:
#![allow(unused)] fn main() { // Save model let model = MyCustomModel::new(784, 256, 10)?; model.save("path/to/custom_model.bin", "bin")?; // Binary format model.save("path/to/custom_model.json", "json")?; // JSON format // Load model let loaded_model = MyCustomModel::load("path/to/custom_model.bin")?; }
Implementation Notes
- All built-in MaidenX layers derive `Serialize` and `Deserialize` when the `serde` feature is enabled
- Only model structure and parameters are serialized, not the computational graph
- Custom models and layers must derive `Serialize` and `Deserialize` manually
- Binary serialization is more compact but less human-readable than JSON
- Saved models can be loaded across different platforms
Layer
The `Layer` trait is the foundation of neural network components in MaidenX. It defines the interface that all neural network layers must implement.
Layer Trait Definition
#![allow(unused)] fn main() { pub trait Layer<I = &'static Tensor> { fn forward(&self, input: I) -> Result<Tensor>; fn parameters(&mut self) -> Vec<&mut Tensor>; fn is_training(&self) -> bool; fn train(&mut self); fn eval(&mut self); } }
The `Layer` trait makes it easy to create custom layers and combine them into complex architectures. The generic parameter `I` allows layers to handle different input types, with the default being a reference to a `Tensor`.
Core Methods
forward
#![allow(unused)] fn main() { fn forward(&self, input: I) -> Result<Tensor>; }
The `forward` method performs the layer's computation on the input and returns the output tensor. It's the primary function that defines the layer's behavior.
parameters
#![allow(unused)] fn main() { fn parameters(&mut self) -> Vec<&mut Tensor>; }
Returns all trainable parameters of the layer as mutable references, which can then be updated by optimizers during training.
Training State Management
is_training
#![allow(unused)] fn main() { fn is_training(&self) -> bool; }
Returns whether the layer is in training mode (true) or evaluation mode (false).
train
#![allow(unused)] fn main() { fn train(&mut self); }
Sets the layer to training mode. This affects behaviors like dropout and batch normalization.
eval
#![allow(unused)] fn main() { fn eval(&mut self); }
Sets the layer to evaluation mode. This typically disables regularization techniques like dropout.
LayerState
Most layer implementations use the `LayerState` structure to track their training state:
#![allow(unused)] fn main() { pub struct LayerState { training: bool, } }
`LayerState` provides a simple way to implement the training state methods:
#![allow(unused)] fn main() { impl LayerState { pub fn new() -> Self { Self { training: true } } pub fn is_training(&self) -> bool { self.training } pub fn train(&mut self) { self.training = true; } pub fn eval(&mut self) { self.training = false; } } }
Custom Layer Implementation
To implement a custom layer, you need to implement the `Layer` trait:
#![allow(unused)] fn main() { #[derive(Clone)] struct MyCustomLayer { weight: Tensor, bias: Option<Tensor>, state: LayerState, } impl Layer for MyCustomLayer { fn forward(&self, input: &Tensor) -> Result<Tensor> { // Custom forward computation let output = input.matmul(&self.weight)?; if let Some(ref bias) = self.bias { Ok(output.add(bias)?) } else { Ok(output) } } fn parameters(&mut self) -> Vec<&mut Tensor> { let mut params = vec![&mut self.weight]; if let Some(ref mut bias) = self.bias { params.push(bias); } params } fn is_training(&self) -> bool { self.state.is_training() } fn train(&mut self) { self.state.train(); } fn eval(&mut self) { self.state.eval(); } } }
Using the Layer Macro
MaidenX provides a derive macro to simplify layer implementation:
#![allow(unused)] fn main() { #[derive(Layer, Clone)] struct MySimpleLayer { weight: Tensor, state: LayerState, } // The Layer trait methods for training state are automatically implemented impl MySimpleLayer { fn forward(&self, input: &Tensor) -> Result<Tensor> { // Your implementation here } fn parameters(&mut self) -> Vec<&mut Tensor> { vec![&mut self.weight] } } }
Activation Layers
Activation layers apply non-linear functions to their input, which is essential for neural networks to learn complex patterns. MaidenX provides various activation functions as standalone layers.
Available Activation Layers
ReLU (Rectified Linear Unit)
#![allow(unused)] fn main() { pub struct ReLU { state: LayerState, } }
The ReLU activation function replaces negative values with zero.
Constructor:
#![allow(unused)] fn main() { let relu = ReLU::new()?; }
Mathematical Function: f(x) = max(0, x)
Example:
#![allow(unused)] fn main() { let relu = ReLU::new()?; let x = Tensor::new(vec![-2.0, -1.0, 0.0, 1.0, 2.0])?; let y = relu.forward(&x)?; // [0.0, 0.0, 0.0, 1.0, 2.0] }
Sigmoid
#![allow(unused)] fn main() { pub struct Sigmoid { state: LayerState, } }
The Sigmoid activation squashes input values to the range (0, 1).
Constructor:
#![allow(unused)] fn main() { let sigmoid = Sigmoid::new()?; }
Mathematical Function: f(x) = 1 / (1 + e^(-x))
Example:
#![allow(unused)] fn main() { let sigmoid = Sigmoid::new()?; let x = Tensor::new(vec![-2.0, 0.0, 2.0])?; let y = sigmoid.forward(&x)?; // [0.119, 0.5, 0.881] }
Tanh (Hyperbolic Tangent)
#![allow(unused)] fn main() { pub struct Tanh { state: LayerState, } }
The Tanh activation squashes input values to the range (-1, 1).
Constructor:
#![allow(unused)] fn main() { let tanh = Tanh::new()?; }
Mathematical Function: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Example:
#![allow(unused)] fn main() { let tanh = Tanh::new()?; let x = Tensor::new(vec![-2.0, 0.0, 2.0])?; let y = tanh.forward(&x)?; // [-0.964, 0.0, 0.964] }
LeakyReLU
#![allow(unused)] fn main() { pub struct LeakyReLU { exponent: Scalar, state: LayerState, } }
LeakyReLU allows a small gradient for negative inputs to prevent the "dying ReLU" problem.
Constructor:
#![allow(unused)] fn main() { let leaky_relu = LeakyReLU::new(0.01)?; // 0.01 is a common slope for negative values }
Mathematical Function: f(x) = max(αx, x), where α is typically a small value like 0.01
Example:
#![allow(unused)] fn main() { let leaky_relu = LeakyReLU::new(0.01)?; let x = Tensor::new(vec![-2.0, -1.0, 0.0, 1.0, 2.0])?; let y = leaky_relu.forward(&x)?; // [-0.02, -0.01, 0.0, 1.0, 2.0] }
GELU (Gaussian Error Linear Unit)
#![allow(unused)] fn main() { pub struct GELU { state: LayerState, } }
GELU activation is used in recent transformer models like BERT and GPT.
Constructor:
#![allow(unused)] fn main() { let gelu = GELU::new()?; }
Mathematical Function: f(x) = x * Φ(x), where Φ is the cumulative distribution function of the standard normal distribution
Example:
#![allow(unused)] fn main() { let gelu = GELU::new()?; let x = Tensor::new(vec![-2.0, 0.0, 2.0])?; let y = gelu.forward(&x)?; // [-0.046, 0.0, 1.954] }
ELU (Exponential Linear Unit)
#![allow(unused)] fn main() { pub struct ELU { exponent: Scalar, state: LayerState, } }
ELU uses an exponential function for negative values to allow negative outputs while maintaining smooth gradients.
Constructor:
#![allow(unused)] fn main() { let elu = ELU::new(1.0)?; // 1.0 is the alpha value }
Mathematical Function: f(x) = x if x > 0, α(e^x - 1) if x ≤ 0
Example:
#![allow(unused)] fn main() { let elu = ELU::new(1.0)?; let x = Tensor::new(vec![-2.0, -1.0, 0.0, 1.0, 2.0])?; let y = elu.forward(&x)?; // [-0.865, -0.632, 0.0, 1.0, 2.0] }
Softmax
The Softmax activation normalizes inputs into a probability distribution.
Constructor:
#![allow(unused)] fn main() { let softmax = Softmax::new(dim)?; // dim is the dimension along which to apply softmax }
Mathematical Function: f(x)_i = e^(x_i) / Σ(e^(x_j)) for all j
Example:
#![allow(unused)] fn main() { let softmax = Softmax::new(-1)?; // Apply along the last dimension let x = Tensor::new(vec![1.0, 2.0, 3.0])?; let y = softmax.forward(&x)?; // [0.09, 0.24, 0.67] }
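To verify the example by hand: e^1 ≈ 2.718, e^2 ≈ 7.389, e^3 ≈ 20.086, and their sum is ≈ 30.19, giving approximately 0.09, 0.24, and 0.67.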
Choosing an Activation Function
Different activation functions have different properties and are suitable for different tasks:
- ReLU: General purpose, computationally efficient, but can suffer from "dying" neurons
- LeakyReLU/ELU: Improved versions of ReLU that help with the dying neuron problem
- Sigmoid: Useful for binary classification output layers
- Tanh: Similar to sigmoid but with outputs centered around 0
- GELU: Often used in transformer models like BERT, GPT, etc.
- Softmax: Used in output layers for multi-class classification
Implementation Notes
All activation layers in MaidenX:
- Implement the `Layer` trait
- Require no trainable parameters
- Support automatic differentiation for backpropagation
- Have training and evaluation modes (though they behave the same in both modes)
Convolution Layer
The Convolution layer applies a convolution operation to the input data. It's particularly effective for processing grid-structured data such as images.
Conv2d
The 2D convolution layer operates on 4D tensors with the shape [batch_size, channels, height, width].
Definition
#![allow(unused)] fn main() { pub struct Conv2d { weight: Tensor, bias: Option<Tensor>, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new( in_channels: usize, out_channels: usize, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), with_bias: bool ) -> Result<Self> }
Creates a new 2D convolution layer with the specified parameters.
Parameters:
- `in_channels`: Number of input channels
- `out_channels`: Number of output channels
- `kernel_size`: Size of the convolving kernel as (height, width)
- `stride`: Stride of the convolution as (height, width)
- `padding`: Zero-padding added to both sides of the input as (height, width)
- `with_bias`: Whether to include a bias term
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; }
For more control over the initialization, you can use the extended constructor:
#![allow(unused)] fn main() { pub fn new_with_spec( in_channels: usize, out_channels: usize, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), with_bias: bool, device: Device, dtype: DType ) -> Result<Self> }
Additional Parameters:
- `device`: The device to place the layer's parameters on (CPU, CUDA, or MPS)
- `dtype`: The data type for the layer's parameters
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new_with_spec( 3, 64, (3, 3), (1, 1), (1, 1), true, Device::CUDA(0), DType::F32 )?; }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Applies the convolution operation to the input tensor.
Parameters:
- `input`: The input tensor with shape [batch_size, in_channels, height, width]
Returns: Output tensor with shape [batch_size, out_channels, output_height, output_width]
Example:
#![allow(unused)] fn main() { let input = Tensor::new(vec![/* values */])?.reshape(&[1, 3, 32, 32])?; let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; let output = conv.forward(&input)?; // Shape: [1, 64, 32, 32] }
Parameter Access
#![allow(unused)] fn main() { pub fn weight(&self) -> &Tensor pub fn bias(&self) -> Option<&Tensor> }
Provides access to the layer's weight and bias parameters.
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; let weight = conv.weight(); // Shape: [64, 3, 3, 3] let bias = conv.bias().unwrap(); // Shape: [64] }
Layer Implementation
The Conv2d layer implements the `Layer` trait, providing methods for parameter collection and training state management:
#![allow(unused)] fn main() { pub fn parameters(&mut self) -> Vec<&mut Tensor> }
Returns all trainable parameters of the layer (weight and bias if present).
Output Dimensions
For given input dimensions, the output dimensions of the convolution are computed as:
output_height = (input_height + 2 * padding.0 - kernel_size.0) / stride.0 + 1
output_width = (input_width + 2 * padding.1 - kernel_size.1) / stride.1 + 1
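For example, a 32x32 input with kernel_size (3, 3), stride (2, 2), and padding (1, 1) produces (using integer division):
output_height = (32 + 2 * 1 - 3) / 2 + 1 = 16
output_width = (32 + 2 * 1 - 3) / 2 + 1 = 16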
Implementation Details
The MaidenX Conv2d implementation uses the im2col algorithm for efficient computation:
- The input tensor is transformed into a matrix where each column contains the values in a sliding window
- Matrix multiplication is performed between this transformed matrix and the flattened kernel weights
- The result is reshaped back to the expected output dimensions
This approach allows leveraging optimized matrix multiplication operations for convolution.
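The following standalone sketch illustrates the idea for a single channel with stride 1 and no padding; it is a simplification for illustration, not MaidenX's actual internal code:
#![allow(unused)]
// Illustrative im2col: one channel, stride 1, no padding.
fn im2col(input: &[f32], h: usize, w: usize, kh: usize, kw: usize) -> Vec<Vec<f32>> {
    let (out_h, out_w) = (h - kh + 1, w - kw + 1);
    let mut cols = Vec::with_capacity(out_h * out_w);
    for i in 0..out_h {
        for j in 0..out_w {
            // Each column is one kh x kw sliding window, flattened row-major.
            let mut col = Vec::with_capacity(kh * kw);
            for di in 0..kh {
                for dj in 0..kw {
                    col.push(input[(i + di) * w + (j + dj)]);
                }
            }
            cols.push(col);
        }
    }
    cols
}

fn main() {
    // 3x3 input, 2x2 kernel -> four columns of 4 values each.
    let input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    let cols = im2col(&input, 3, 3, 2, 2);
    assert_eq!(cols.len(), 4);
    assert_eq!(cols[0], vec![1.0, 2.0, 4.0, 5.0]); // window at (0, 0)
    // The convolution then reduces to a matrix multiply between the
    // flattened kernel weights and these columns.
}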
Common Configurations
Here are some common Conv2d configurations:
Basic Convolution (Same Padding)
#![allow(unused)] fn main() { // Maintains spatial dimensions let conv = Conv2d::new(in_channels, out_channels, (3, 3), (1, 1), (1, 1), true)?; }
Strided Convolution (Downsampling)
#![allow(unused)] fn main() { // Reduces spatial dimensions by half let conv = Conv2d::new(in_channels, out_channels, (3, 3), (2, 2), (1, 1), true)?; }
1x1 Convolution (Channel Mixing)
#![allow(unused)] fn main() { // Changes channel dimensions only let conv = Conv2d::new(in_channels, out_channels, (1, 1), (1, 1), (0, 0), true)?; }
Dropout Layer
The Dropout layer is a regularization technique that helps prevent neural networks from overfitting. It randomly sets a fraction of input units to zero during training, which helps prevent co-adaptation of neurons.
Definition
#![allow(unused)] fn main() { pub struct Dropout { p: f32, state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new(p: f32) -> Result<Self> }
Creates a new Dropout layer with the specified dropout probability.
Parameters:
- `p`: Probability of an element to be zeroed (between 0 and 1)
Example:
#![allow(unused)] fn main() { let dropout = Dropout::new(0.5)?; // 50% dropout probability }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Applies dropout to the input tensor.
Parameters:
- `input`: Input tensor of any shape
Returns: Output tensor of the same shape as input
Example:
#![allow(unused)] fn main() { // During training let mut dropout = Dropout::new(0.5)?; dropout.train(); // Activate training mode let x = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?; let y = dropout.forward(&x)?; // Some elements will be zeroed // During evaluation dropout.eval(); // Activate evaluation mode let z = dropout.forward(&x)?; // No elements will be zeroed, same as input }
Behavior Differences in Training and Evaluation
Dropout behaves differently depending on the layer's state:
- Training Mode (`is_training() == true`):
  - Randomly zeroes elements of the input tensor with probability `p`
  - Scales the remaining elements by a factor of `1/(1-p)` to maintain the expected sum
  - For example, with `p=0.5`, approximately half the elements will be zeroed, and the remaining elements will be multiplied by 2
- Evaluation Mode (`is_training() == false`):
  - Identity function: returns the input unchanged
  - No elements are zeroed out
Implementation Details
MaidenX's Dropout implementation includes:
- A binary mask tensor that determines which elements to keep (1) or zero out (0)
- A scaling factor of `1/(1-p)` applied to the kept elements to maintain the expected activation magnitude
- Support for autograd to allow proper gradient flow during training
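As a quick sanity check of the scaling: with p=0.2, a kept activation of 1.0 is scaled to 1.0 / (1 - 0.2) = 1.25, so its expected value under the random mask stays at 0.8 * 1.25 = 1.0.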
Tips for Using Dropout
- Dropout is typically applied after activation functions
- Common dropout rates range from 0.1 to 0.5
- Higher dropout rates provide stronger regularization but may require longer training
- Always remember to call `layer.eval()` during inference/evaluation
- Dropout is often more effective in larger networks
Example Usage in a Neural Network
#![allow(unused)] fn main() { // Define a simple neural network with dropout let mut linear1 = Linear::new(784, 512, true)?; let mut dropout1 = Dropout::new(0.2)?; let mut linear2 = Linear::new(512, 10, true)?; // Training loop for _ in 0..num_epochs { // Set to training mode linear1.train(); dropout1.train(); linear2.train(); let hidden = linear1.forward(&input)?; let hidden_dropped = dropout1.forward(&hidden)?; // Apply dropout let output = linear2.forward(&hidden_dropped)?; // Compute loss and update parameters // ... } // Evaluation linear1.eval(); dropout1.eval(); // Important: disable dropout during evaluation linear2.eval(); let hidden = linear1.forward(&test_input)?; let hidden_dropped = dropout1.forward(&hidden)?; // No dropout is applied let predictions = linear2.forward(&hidden_dropped)?; }
Embedding Layer
The Embedding layer converts integer indices into dense vector representations. It's commonly used as the first layer in models that process text, categorical data, or any discrete input that needs to be mapped to a continuous vector space.
Definition
#![allow(unused)] fn main() { pub struct Embedding { weight: Tensor, padding_idx: Option<usize>, max_norm: Option<f32>, norm_type: f32, scale_grad_by_freq: bool, state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new(num_embeddings: usize, embedding_dim: usize) -> Result<Self> }
Creates a new Embedding layer with the specified dimensions.
Parameters:
- `num_embeddings`: Size of the vocabulary (number of possible indices)
- `embedding_dim`: Size of each embedding vector
Example:
#![allow(unused)] fn main() { let embedding = Embedding::new(10000, 300)?; // 10,000 words, 300-dimensional embeddings }
For more control over the initialization and additional features, you can use the extended constructor:
#![allow(unused)] fn main() { pub fn new_with_spec( num_embeddings: usize, embedding_dim: usize, padding_idx: Option<usize>, max_norm: Option<f32>, norm_type: f32, scale_grad_by_freq: bool, device: Device, dtype: DType ) -> Result<Self> }
Additional Parameters:
- `padding_idx`: If specified, entries at this index will be filled with zeros
- `max_norm`: If specified, embeddings will be normalized to have at most this norm
- `norm_type`: The p-norm to use for normalization (default: 2.0)
- `scale_grad_by_freq`: If true, gradients are scaled by the inverse frequency of the words
- `device`: The device to place the layer's parameters on (CPU, CUDA, or MPS)
- `dtype`: The data type for the layer's parameters
Example:
#![allow(unused)] fn main() { let embedding = Embedding::new_with_spec( 10000, 300, Some(0), // Index 0 is padding Some(5.0), // Maximum norm of 5.0 2.0, // L2 norm true, // Scale gradients by frequency Device::CUDA(0), DType::F32 )?; }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Retrieves embeddings for the given indices.
Parameters:
- `input`: Tensor of integer indices with any shape; dtype must be an integer type
Returns: Tensor of embeddings with shape [*input.shape, embedding_dim]
Example:
#![allow(unused)] fn main() { let embedding = Embedding::new(10, 5)?; let indices = Tensor::new(vec![1, 3, 5, 7])?; let embeddings = embedding.forward(&indices)?; // Shape: [4, 5] // With batch dimension let batch_indices = Tensor::new(vec![1, 3, 5, 7, 2, 4, 6, 8])?.reshape(&[2, 4])?; let batch_embeddings = embedding.forward(&batch_indices)?; // Shape: [2, 4, 5] }
Parameter Access
#![allow(unused)] fn main() { pub fn weight(&self) -> &Tensor }
Provides access to the embedding matrix.
Example:
#![allow(unused)] fn main() { let embedding = Embedding::new(10, 5)?; let weights = embedding.weight(); // Shape: [10, 5] }
Other accessor methods include: `num_embeddings()`, `embedding_dim()`, `padding_idx()`, `max_norm()`, `norm_type()`, and `scale_grad_by_freq()`.
Layer Implementation
The Embedding layer implements the `Layer` trait, providing methods for parameter collection and training state management:
#![allow(unused)] fn main() { pub fn parameters(&mut self) -> Vec<&mut Tensor> }
Returns the embedding matrix as a trainable parameter.
Common Use Cases
Word Embeddings
#![allow(unused)] fn main() { // 10,000 words in vocabulary, 300-dim embeddings let word_embedding = Embedding::new(10000, 300)?; // Convert word indices to embeddings let word_indices = Tensor::new(vec![42, 1337, 7, 42])?; let word_vectors = word_embedding.forward(&word_indices)?; // Shape: [4, 300] }
One-Hot to Dense
#![allow(unused)] fn main() { // For 10 categorical values let category_embedding = Embedding::new(10, 5)?; // Convert category indices to embeddings let categories = Tensor::new(vec![0, 3, 9, 2])?; let category_vectors = category_embedding.forward(&categories)?; // Shape: [4, 5] }
With Padding
#![allow(unused)] fn main() { // Use index 0 as padding let embedding = Embedding::new_with_spec(1000, 64, Some(0), None, 2.0, false, Device::CPU, DType::F32)?; // Input with padding let input = Tensor::new(vec![0, 1, 2, 0, 3, 0])?; let embeddings = embedding.forward(&input)?; // Embeddings for index 0 will be all zeros }
With Maximum Norm
#![allow(unused)] fn main() { // Limit embedding norms to 1.0 let embedding = Embedding::new_with_spec(1000, 64, None, Some(1.0), 2.0, false, Device::CPU, DType::F32)?; // All retrieved embeddings will have norm <= 1.0 let input = Tensor::new(vec![5, 10, 15])?; let embeddings = embedding.forward(&input)?; }
Implementation Notes
- The embedding matrix is initialized with random values scaled by `1/sqrt(embedding_dim)`
- If `padding_idx` is specified, the corresponding embedding is filled with zeros
- If `max_norm` is specified, embeddings are normalized to have at most this norm
- If `scale_grad_by_freq` is true, gradients are scaled by the inverse frequency of the words in the mini-batch
Linear Layer
The Linear layer (also known as a fully connected or dense layer) performs a linear transformation on the input data. It's one of the most fundamental building blocks in neural networks.
Definition
#![allow(unused)] fn main() { pub struct Linear { weight: Tensor, bias: Option<Tensor>, state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new(in_features: usize, out_features: usize, with_bias: bool) -> Result<Self> }
Creates a new Linear layer with the specified dimensions.
Parameters:
- `in_features`: The size of each input sample
- `out_features`: The size of each output sample
- `with_bias`: Whether to include a bias term
Example:
#![allow(unused)] fn main() { let linear = Linear::new(784, 256, true)?; }
For more control over the initialization, you can use the extended constructor:
#![allow(unused)] fn main() { pub fn new_with_spec( in_features: usize, out_features: usize, with_bias: bool, device: Device, dtype: DType ) -> Result<Self> }
Additional Parameters:
- `device`: The device to place the layer's parameters on (CPU, CUDA, or MPS)
- `dtype`: The data type for the layer's parameters
Example:
#![allow(unused)] fn main() { let linear = Linear::new_with_spec( 784, 256, true, Device::CUDA(0), DType::F32 )?; }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Applies the linear transformation y = x @ weight.T + bias.
Parameters:
- `input`: The input tensor with shape [batch_size, ..., in_features]
Returns: Output tensor with shape [batch_size, ..., out_features]
Example:
#![allow(unused)] fn main() { let input = Tensor::new(vec![1.0, 2.0, 3.0, 4.0])?.reshape(&[2, 2])?; let linear = Linear::new(2, 3, true)?; let output = linear.forward(&input)?; // Shape: [2, 3] }
Parameter Access
#![allow(unused)] fn main() { pub fn weight(&self) -> &Tensor pub fn bias(&self) -> Option<&Tensor> }
Provides access to the layer's weight and bias parameters.
Example:
#![allow(unused)] fn main() { let linear = Linear::new(2, 3, true)?; let weight = linear.weight(); // Shape: [3, 2] let bias = linear.bias().unwrap(); // Shape: [3] }
Layer Implementation
The Linear layer implements the `Layer` trait, providing methods for parameter collection and training state management:
#![allow(unused)] fn main() { pub fn parameters(&mut self) -> Vec<&mut Tensor> }
Returns all trainable parameters of the layer (weight and bias if present).
Mathematical Operation
For an input tensor x of shape [batch_size, in_features], the Linear layer computes:
output = x @ weight.T + bias
Where:
- @ represents the matrix multiplication
- weight.T is the transposed weight matrix of shape [out_features, in_features]
- bias is the bias vector of shape [out_features]
The output tensor has shape [batch_size, out_features].
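For example, with in_features = 2 and out_features = 3 (as in the forward-pass example above), an input of shape [2, 2] is multiplied by weight.T of shape [2, 3] to give a [2, 3] output, and the [3]-shaped bias is broadcast across the batch dimension.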
Broadcasting Support
The Linear layer supports broadcasting for batched inputs. If the input tensor has additional leading dimensions, they are preserved in the output:
#![allow(unused)] fn main() { let input = Tensor::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0])?.reshape(&[3, 2])?; let linear = Linear::new(2, 4, true)?; let output = linear.forward(&input)?; // Shape: [3, 4] }
For a more complex batch structure:
#![allow(unused)] fn main() { // Input shape: [batch_size, sequence_length, in_features] let input = Tensor::new(vec![/* values */])?.reshape(&[32, 10, 64])?; let linear = Linear::new(64, 128, true)?; let output = linear.forward(&input)?; // Shape: [32, 10, 128] }
Normalization Layer
Normalization layers help stabilize and accelerate neural network training by standardizing the inputs to each layer. MaidenX provides layer normalization, which normalizes inputs across feature dimensions.
LayerNorm
Layer normalization normalizes the activations of a single sample, typically across the feature dimension(s). Unlike batch normalization, layer normalization operates on a single example, making it well-suited for scenarios with variable batch sizes or recurrent neural networks.
Definition
#![allow(unused)] fn main() { pub struct LayerNorm { weight: Tensor, bias: Option<Tensor>, normalized_shape: Vec<usize>, eps: f32, state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new(normalized_shape: Vec<usize>, with_bias: bool, eps: f32) -> Result<Self> }
Creates a new layer normalization module.
Parameters:
- `normalized_shape`: The shape of the normalized dimensions (usually the feature dimensions)
- `with_bias`: Whether to include a bias term
- `eps`: Small constant added to the denominator for numerical stability
Example:
#![allow(unused)] fn main() { // For normalizing a feature vector of size 256 let layer_norm = LayerNorm::new(vec![256], true, 1e-5)?; // For normalizing a 2D feature map of size [32, 64] let layer_norm_2d = LayerNorm::new(vec![32, 64], true, 1e-5)?; }
For more control over the initialization, you can use the extended constructor:
#![allow(unused)] fn main() { pub fn new_with_spec( normalized_shape: Vec<usize>, with_bias: bool, eps: f32, device: Device, dtype: DType ) -> Result<Self> }
Additional Parameters:
- `device`: The device to place the layer's parameters on (CPU, CUDA, or MPS)
- `dtype`: The data type for the layer's parameters
Example:
#![allow(unused)] fn main() { let layer_norm = LayerNorm::new_with_spec( vec![512], true, 1e-5, Device::CUDA(0), DType::F32 )?; }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Applies layer normalization to the input tensor.
Parameters:
- `input`: Input tensor with shape [batch_size, ..., *normalized_shape]
Returns: Output tensor with the same shape as input
Example:
#![allow(unused)] fn main() { let layer_norm = LayerNorm::new(vec![5], true, 1e-5)?; // Input tensor with shape [2, 5] let input = Tensor::new(vec![ vec![1.0, 2.0, 3.0, 4.0, 5.0], vec![5.0, 4.0, 3.0, 2.0, 1.0] ])?; let output = layer_norm.forward(&input)?; // Shape: [2, 5], normalized across dimension 1 }
Parameter Access
#![allow(unused)] fn main() { pub fn weight(&self) -> &Tensor pub fn bias(&self) -> Option<&Tensor> }
Provides access to the layer's weight and bias parameters.
Example:
#![allow(unused)] fn main() { let layer_norm = LayerNorm::new(vec![10], true, 1e-5)?; let weight = layer_norm.weight(); // Shape: [10] let bias = layer_norm.bias().unwrap(); // Shape: [10] }
Other accessor methods include: `normalized_shape()` and `eps()`.
Layer Implementation
The LayerNorm layer implements the `Layer` trait, providing methods for parameter collection and training state management:
#![allow(unused)] fn main() { pub fn parameters(&mut self) -> Vec<&mut Tensor> }
Returns all trainable parameters of the layer (weight and bias if present).
Mathematical Operation
For an input tensor x of shape [batch_size, ..., normalized_dims], LayerNorm computes:
y = (x - E[x]) / sqrt(Var[x] + eps) * weight + bias
Where:
- E[x] is the mean across the normalized dimensions
- Var[x] is the variance across the normalized dimensions
- weight and bias are learnable parameters with the shape of normalized_dims
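As a worked example for a single sample x = [1.0, 2.0, 3.0, 4.0, 5.0] with weight = 1 and bias = 0 (assuming the biased, N-divisor variance): E[x] = 3.0 and Var[x] = 2.0, so y = (x - 3.0) / sqrt(2.0 + eps) ≈ [-1.414, -0.707, 0.0, 0.707, 1.414].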
Common Use Cases
Normalizing Features in MLP
#![allow(unused)] fn main() { let mut linear1 = Linear::new(784, 256, true)?; let mut layer_norm = LayerNorm::new(vec![256], true, 1e-5)?; let mut linear2 = Linear::new(256, 10, true)?; // Forward pass let x1 = linear1.forward(&input)?; let x2 = layer_norm.forward(&x1)?; // Apply normalization let x3 = x2.relu()?; // Activation after normalization let output = linear2.forward(&x3)?; }
Normalizing Features in Transformer
#![allow(unused)] fn main() { // With attention output of shape [batch_size, seq_len, hidden_size] let attention_output = /* ... */; // Normalize over the hidden dimension let layer_norm = LayerNorm::new(vec![hidden_size], true, 1e-5)?; let normalized_output = layer_norm.forward(&attention_output)?; }
Multi-Dimensional Normalization
#![allow(unused)] fn main() { // For a 2D feature map of shape [batch_size, channels, height, width] let feature_map = /* ... */; // Reshape to move spatial dimensions to the batch dimension let reshaped = feature_map.reshape(&[batch_size * channels, height * width])?; // Normalize over the flattened spatial dimensions let layer_norm = LayerNorm::new(vec![height * width], true, 1e-5)?; let normalized = layer_norm.forward(&reshaped)?; // Reshape back to original shape let output = normalized.reshape(&[batch_size, channels, height, width])?; }
Implementation Notes
- Unlike batch normalization, layer normalization operates independently on each sample
- The weight parameter is initialized to ones and the bias to zeros
- The normalization statistics (mean and variance) are computed at runtime, not stored
- LayerNorm behaves the same during training and evaluation (no separate statistics)
- The normalized_shape parameter specifies the dimensions over which normalization is applied
Optimizer
The Optimizer trait defines the interface for all optimization algorithms in MaidenX. Optimizers update the parameters of neural network layers based on gradient information to minimize the loss function.
Optimizer Trait Definition
#![allow(unused)]
fn main() {
    pub trait Optimizer {
        fn step(&mut self, parameters: &mut [&mut Tensor]) -> Result<()>;
        fn zero_grad(&mut self, parameters: &mut [&mut Tensor]) -> Result<()>;
        fn set_learning_rate(&mut self, learning_rate: impl Into<Scalar>);
    }
}
Core Methods
step
#![allow(unused)] fn main() { fn step(&mut self, parameters: &mut [&mut Tensor]) -> Result<()>; }
The step method updates the parameters based on their current gradients. This is the core method that performs the optimization algorithm's update rule.
zero_grad
#![allow(unused)] fn main() { fn zero_grad(&mut self, parameters: &mut [&mut Tensor]) -> Result<()>; }
Resets the gradients of all parameters to zero, typically called before computing gradients for the next batch; without it, successive backward passes would accumulate gradients across batches.
set_learning_rate
#![allow(unused)] fn main() { fn set_learning_rate(&mut self, learning_rate: impl Into<Scalar>); }
Allows dynamic adjustment of the learning rate during training, which can be useful for learning rate scheduling.
Available Optimizers
SGD (Stochastic Gradient Descent)
The SGD optimizer implements basic gradient descent with a configurable learning rate:
#![allow(unused)]
fn main() {
    pub struct SGD {
        learning_rate: Scalar,
    }

    impl SGD {
        pub fn new(learning_rate: impl Into<Scalar>) -> Self {
            Self {
                learning_rate: learning_rate.into(),
            }
        }
    }
}
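Conceptually, step applies the update rule param = param - learning_rate * grad to each parameter. A minimal sketch of what that loop could look like, shown for illustration only and assuming the tensor methods grad, mul_scalar, and sub_ used in the custom-optimizer example later in this chapter:

#![allow(unused)]
fn main() {
    // Hypothetical body for SGD's step, not the library's actual implementation
    fn step(learning_rate: Scalar, parameters: &mut [&mut Tensor]) -> Result<()> {
        for param in parameters.iter_mut() {
            if let Some(grad) = param.grad()? {
                // param -= learning_rate * grad
                param.sub_(&grad.mul_scalar(learning_rate)?)?;
            }
        }
        Ok(())
    }
}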
Usage Example:
#![allow(unused)]
fn main() {
    let mut sgd = SGD::new(0.01);

    // Training loop
    for _ in 0..num_epochs {
        // Forward and backward pass
        // ...

        // Update parameters
        sgd.step(&mut parameters)?;
        sgd.zero_grad(&mut parameters)?;
    }
}
Adam (Adaptive Moment Estimation)
The Adam optimizer maintains per-parameter adaptive learning rates by combining momentum (a running first moment of the gradients) with RMSProp-style scaling (a running second moment):
#![allow(unused)]
fn main() {
    pub struct Adam {
        learning_rate: Scalar,
        beta1: Scalar,   // Exponential decay rate for first moment
        beta2: Scalar,   // Exponential decay rate for second moment
        epsilon: Scalar, // Small constant for numerical stability
        t: usize,        // Timestep
        m: Vec<Tensor>,  // First moment vectors
        v: Vec<Tensor>,  // Second moment vectors
    }

    impl Adam {
        pub fn new(
            learning_rate: impl Into<Scalar>,
            beta1: impl Into<Scalar>,
            beta2: impl Into<Scalar>,
            epsilon: impl Into<Scalar>,
        ) -> Self {
            // Initialization
        }
    }
}
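These fields map directly onto the standard Adam update rule (Kingma & Ba). With gradient g_t at timestep t:

m_t = beta1 * m_{t-1} + (1 - beta1) * g_t
v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2
m_hat = m_t / (1 - beta1^t)
v_hat = v_t / (1 - beta2^t)
param = param - learning_rate * m_hat / (sqrt(v_hat) + epsilon)

The bias-corrected moments m_hat and v_hat compensate for m and v being initialized to zero. Common defaults are beta1 = 0.9, beta2 = 0.999, and epsilon = 1e-8, as used in the example below.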
Usage Example:
#![allow(unused)]
fn main() {
    let mut adam = Adam::new(0.001, 0.9, 0.999, 1e-8);

    // Training loop
    for _ in 0..num_epochs {
        // Forward and backward pass
        // ...

        // Update parameters
        adam.step(&mut parameters)?;
        adam.zero_grad(&mut parameters)?;
    }
}
Implementing Custom Optimizers
To create a custom optimizer, implement the Optimizer trait:
#![allow(unused)]
fn main() {
    struct MyCustomOptimizer {
        learning_rate: Scalar,
        momentum: Scalar,
        velocity: Vec<Tensor>,
    }

    impl MyCustomOptimizer {
        pub fn new(learning_rate: impl Into<Scalar>, momentum: impl Into<Scalar>) -> Self {
            Self {
                learning_rate: learning_rate.into(),
                momentum: momentum.into(),
                velocity: Vec::new(),
            }
        }
    }

    impl Optimizer for MyCustomOptimizer {
        fn step(&mut self, parameters: &mut [&mut Tensor]) -> Result<()> {
            // Lazily initialize one velocity tensor per parameter
            if self.velocity.is_empty() {
                self.velocity = parameters
                    .iter()
                    .map(|param| Tensor::zeros_like(param))
                    .collect::<Result<Vec<_>>>()?;
            }

            // Momentum update rule: v = momentum * v + grad; param -= lr * v
            for (param_idx, param) in parameters.iter_mut().enumerate() {
                if let Some(grad) = param.grad()? {
                    // Update velocity
                    self.velocity[param_idx] = self.velocity[param_idx]
                        .mul_scalar(self.momentum)?
                        .add(&grad)?;

                    // Update parameter
                    param.sub_(&self.velocity[param_idx].mul_scalar(self.learning_rate)?)?;
                }
            }
            Ok(())
        }

        fn zero_grad(&mut self, parameters: &mut [&mut Tensor]) -> Result<()> {
            for param in parameters.iter_mut() {
                param.zero_grad()?;
            }
            Ok(())
        }

        fn set_learning_rate(&mut self, learning_rate: impl Into<Scalar>) {
            self.learning_rate = learning_rate.into();
        }
    }
}
Learning Rate Scheduling
You can implement learning rate scheduling by adjusting the learning rate during training. The Optimizer trait does not expose a getter for the current rate, so track it alongside the optimizer:
#![allow(unused)]
fn main() {
    let mut optimizer = SGD::new(0.1);
    let mut current_lr = 0.1f32;

    for epoch in 0..num_epochs {
        // Decay the learning rate by 10x every 10 epochs
        if epoch > 0 && epoch % 10 == 0 {
            current_lr *= 0.1;
            optimizer.set_learning_rate(current_lr);
        }

        // Training loop
        // ...
    }
}
Device
MaidenX supports multiple computing devices to run tensor operations, allowing you to choose the most suitable hardware for your specific use case. This flexibility lets you develop on one platform and deploy on another without changing your code.
Supported Devices
| Device | Description | Availability |
|---|---|---|
| CPU | Standard CPU execution | Always available |
| CUDA | NVIDIA GPU acceleration via CUDA | Available with the cuda feature flag |
| MPS | Apple Silicon GPU acceleration via Metal Performance Shaders | Available with the mps feature flag |
| Vulkan | Cross-platform GPU acceleration | Planned for a future release |
Device Selection
You can set the default device for tensor operations using:
#![allow(unused)]
fn main() {
    use maidenx::prelude::*;

    // Set default device to CPU
    set_default_device(Device::CPU);

    // Set default device to the first CUDA GPU
    #[cfg(feature = "cuda")]
    set_default_device(Device::CUDA(0));

    // Set default device to the Apple Silicon GPU
    #[cfg(feature = "mps")]
    set_default_device(Device::MPS);
}
Per-Tensor Device Placement
You can also create tensors on specific devices, regardless of the default:
#![allow(unused)]
fn main() {
    // Create a tensor on CPU
    let cpu_tensor = Tensor::new_with_spec(
        vec![1.0, 2.0, 3.0],
        Device::CPU,
        DType::F32,
    )?;

    // Create a tensor on CUDA (if available)
    #[cfg(feature = "cuda")]
    let cuda_tensor = Tensor::new_with_spec(
        vec![1.0, 2.0, 3.0],
        Device::CUDA(0),
        DType::F32,
    )?;
}
Moving Tensors Between Devices
Tensors can be moved between devices using the to_device method:
#![allow(unused)]
fn main() {
    // Move tensor to CPU
    let tensor_on_cpu = tensor.to_device(Device::CPU)?;

    // Move tensor to CUDA (if available)
    #[cfg(feature = "cuda")]
    let tensor_on_cuda = tensor.to_device(Device::CUDA(0))?;
}
Device-Specific Considerations
CPU
- Available on all platforms
- Good for development and debugging
- Slower for large-scale computations
- No special requirements
CUDA
- Requires NVIDIA GPU and CUDA toolkit
- Best performance for large models and batch sizes
- Enabled with the cuda feature flag
- Supports selecting among multiple GPUs via Device::CUDA(device_id)
MPS (Metal Performance Shaders)
- Available on Apple Silicon (M1/M2/M3) devices
- Good performance on Apple hardware
- Enabled with the mps feature flag
- Does not support 64-bit data types (F64, I64, U64)
Vulkan (Planned)
- Will provide cross-platform GPU acceleration
- Intended to work on various GPUs (NVIDIA, AMD, Intel)
- Not yet implemented
Example: Multi-Device Code
Here's how to write code that can run on any available device:
use maidenx::prelude::*;

fn main() -> Result<()> {
    // Choose the best available device
    auto_set_device();
    println!("Using device: {}", get_default_device().name());

    // Create tensors (they will use the default device)
    let a = Tensor::new(vec![1.0, 2.0, 3.0])?;
    let b = Tensor::new(vec![4.0, 5.0, 6.0])?;

    // Operations run on the tensors' device
    let c = a.add(&b)?;
    println!("Result: {}", c);

    Ok(())
}
This code automatically selects the best available device based on feature flags, with CUDA preferred over MPS, and MPS preferred over CPU.
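If you prefer to make that preference order explicit rather than relying on auto_set_device, a small sketch using the same feature flags (assuming only the Device variants documented above):

#![allow(unused)]
fn main() {
    // CUDA > MPS > CPU, resolved at compile time via feature flags
    fn pick_device() -> Device {
        #[cfg(feature = "cuda")]
        return Device::CUDA(0);

        #[cfg(all(feature = "mps", not(feature = "cuda")))]
        return Device::MPS;

        #[cfg(not(any(feature = "cuda", feature = "mps")))]
        Device::CPU
    }

    set_default_device(pick_device());
}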
Data Types (DType)
MaidenX supports a variety of data types for tensors, allowing you to optimize for memory usage, precision, and performance. The appropriate data type choice can significantly impact your model's accuracy and execution speed.
Supported Data Types
| Category | Data Type | MaidenX Identifier | Size (bits) | Device Support | Use Cases |
|---|---|---|---|---|---|
| Floating Point | BFloat16 | maidenx::bfloat16 | 16 | All | Training with reduced precision |
| Floating Point | Float16 | maidenx::float16 | 16 | All | Memory-efficient inference |
| Floating Point | Float32 | maidenx::float32 | 32 | All | General training and inference |
| Floating Point | Float64 | maidenx::float64 | 64 | CPU, CUDA | High-precision scientific computing |
| Integer (Unsigned) | UInt8 | maidenx::uint8 | 8 | All | Quantized models, image processing |
| Integer (Unsigned) | UInt16 | maidenx::uint16 | 16 | All | Compact indexing |
| Integer (Unsigned) | UInt32 | maidenx::uint32 | 32 | All | Large indices |
| Integer (Unsigned) | UInt64 | maidenx::uint64 | 64 | CPU, CUDA | Very large indices |
| Integer (Signed) | Int8 | maidenx::int8 | 8 | All | Quantized models, efficient storage |
| Integer (Signed) | Int16 | maidenx::int16 | 16 | All | Compact representation with sign |
| Integer (Signed) | Int32 | maidenx::int32 | 32 | All | General integer operations |
| Integer (Signed) | Int64 | maidenx::int64 | 64 | CPU, CUDA | Large integer ranges |
| Boolean | Bool | maidenx::bool | 1 | All | Masks, condition operations |
Setting the Default Data Type
You can set the default data type for all tensor operations:
#![allow(unused)]
fn main() {
    use maidenx::prelude::*;

    // Set default dtype to Float32
    set_default_dtype(DType::F32);

    // Create a tensor with the default dtype
    let tensor = Tensor::new(vec![1.0, 2.0, 3.0])?;
}
Explicit Data Type Specification
You can create tensors with specific data types regardless of the default:
#![allow(unused)]
fn main() {
    // Create a tensor with float64 precision
    let high_precision = Tensor::new_with_spec(
        vec![1.0, 2.0, 3.0],
        Device::CPU,
        DType::F64,
    )?;

    // Create an integer tensor
    let int_tensor = Tensor::new_with_spec(
        vec![1, 2, 3],
        Device::CPU,
        DType::I32,
    )?;
}
Type Conversion
Tensors can be converted between data types using to_dtype:
#![allow(unused)]
fn main() {
    // Convert float32 to float64
    let f32_tensor = Tensor::new(vec![1.0f32, 2.0, 3.0])?;
    let f64_tensor = f32_tensor.to_dtype(DType::F64)?;

    // Convert float to int
    let int_tensor = f32_tensor.to_dtype(DType::I32)?;
}
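Note that conversions can be lossy: float-to-int conversion typically drops the fractional part, and narrowing conversions (e.g. F64 to F32) can lose precision, so check your value ranges before downcasting.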
Boolean Type Handling
Boolean tensors are handled specially depending on the context:
- Logical Operations: Remain as maidenx::bool

#![allow(unused)]
fn main() {
    let a = Tensor::new(vec![true, false, true])?;
    let b = Tensor::new(vec![false, true, false])?;
    let logical_and = a.logical_and(&b)?; // Still boolean type
}

- Arithmetic Operations: Promoted to maidenx::uint8

#![allow(unused)]
fn main() {
    let bool_tensor = Tensor::new(vec![true, false, true])?;
    let added = bool_tensor.add_scalar(1)?; // Converted to uint8 for addition
}

- Operations with Floating-Point: Promoted to maidenx::float32

#![allow(unused)]
fn main() {
    let bool_tensor = Tensor::new(vec![true, false])?;
    let float_tensor = Tensor::new(vec![1.5, 2.5])?;
    let result = bool_tensor.mul(&float_tensor)?; // Converted to float32
}
Automatic Differentiation and Data Types
Only floating-point data types support automatic differentiation (autograd):
#![allow(unused)]
fn main() {
    // This works - float types support gradients
    let mut float_tensor = Tensor::new(vec![1.0, 2.0, 3.0])?;
    float_tensor.with_grad()?; // Enables gradient tracking

    // This would fail - integer types don't support gradients
    let mut int_tensor = Tensor::new(vec![1, 2, 3])?.to_dtype(DType::I32)?;
    // int_tensor.with_grad()?; // Would return an error
}
Device Compatibility
Not all data types are supported on all devices:
- 64-bit types (F64, I64, U64) are not supported on MPS (Apple Silicon)
- When using MPS, use 32-bit or smaller types
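If your data arrives as a 64-bit type and you want to run it on MPS, downcast before moving it; a small sketch combining the to_dtype and to_device methods from the sections above:

#![allow(unused)]
fn main() {
    #[cfg(feature = "mps")]
    {
        let t = Tensor::new_with_spec(vec![1.0, 2.0, 3.0], Device::CPU, DType::F64)?;

        // MPS does not support 64-bit types, so convert to F32 first
        let t = t.to_dtype(DType::F32)?.to_device(Device::MPS)?;
    }
}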
Memory and Performance Considerations
- Float16/BFloat16: Half-precision can significantly reduce memory usage with minimal accuracy loss for many applications
- Int8/UInt8: Quantized models often use 8-bit integers for dramatic memory and performance improvements
- Float64: Double precision is rarely needed for machine learning but may be crucial for scientific computing
- Bool: Most memory-efficient for masks and conditions, but may be promoted to larger types during operations
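As a quick sizing check: a 1024 x 1024 weight matrix holds 1,048,576 elements, so it occupies 4 MiB as Float32, 2 MiB as Float16 or BFloat16, and 1 MiB as Int8. Optimizer state (for example, Adam's two moment vectors) adds further per-parameter memory on top of this.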
Example: Mixed Precision Training
use maidenx::prelude::*;

fn main() -> Result<()> {
    // Use float16 for weights to save memory
    let mut weights = Tensor::randn(&[1024, 1024])?.to_dtype(DType::F16)?;
    weights.with_grad()?;

    // Use float32 for activations and loss computation for stability
    let input = Tensor::randn(&[32, 1024])?;

    // Forward pass (operations convert automatically as needed)
    let output = input.matmul(&weights)?;

    // Loss computation in float32 for accuracy
    let target = Tensor::randn(&[32, 1024])?;
    let loss = output.sub(&target)?.pow(2.0)?.mean_all()?;

    // Backward pass (gradients handled in the appropriate precision)
    loss.backward()?;

    Ok(())
}
This approach balances memory efficiency with numerical stability.