Convolution Layer
The Convolution layer applies a convolution operation to the input data. It's particularly effective for processing grid-structured data such as images.
Conv2d
The 2D convolution layer operates on 4D tensors with the shape [batch_size, channels, height, width].
Definition
#![allow(unused)] fn main() { pub struct Conv2d { weight: Tensor, bias: Option<Tensor>, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), state: LayerState, } }
Constructor
#![allow(unused)] fn main() { pub fn new( in_channels: usize, out_channels: usize, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), with_bias: bool ) -> Result<Self> }
Creates a new 2D convolution layer with the specified parameters.
Parameters:
in_channels
: Number of input channelsout_channels
: Number of output channelskernel_size
: Size of the convolving kernel as (height, width)stride
: Stride of the convolution as (height, width)padding
: Zero-padding added to both sides of the input as (height, width)with_bias
: Whether to include a bias term
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; }
For more control over the initialization, you can use the extended constructor:
#![allow(unused)] fn main() { pub fn new_with_spec( in_channels: usize, out_channels: usize, kernel_size: (usize, usize), stride: (usize, usize), padding: (usize, usize), with_bias: bool, device: Device, dtype: DType ) -> Result<Self> }
Additional Parameters:
device
: The device to place the layer's parameters on (CPU, CUDA, or MPS)dtype
: The data type for the layer's parameters
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new_with_spec( 3, 64, (3, 3), (1, 1), (1, 1), true, Device::CUDA(0), DType::F32 )?; }
Forward Pass
#![allow(unused)] fn main() { pub fn forward(&self, input: &Tensor) -> Result<Tensor> }
Applies the convolution operation to the input tensor.
Parameters:
input
: The input tensor with shape [batch_size, in_channels, height, width]
Returns: Output tensor with shape [batch_size, out_channels, output_height, output_width]
Example:
#![allow(unused)] fn main() { let input = Tensor::new(vec![/* values */])?.reshape(&[1, 3, 32, 32])?; let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; let output = conv.forward(&input)?; // Shape: [1, 64, 32, 32] }
Parameter Access
#![allow(unused)] fn main() { pub fn weight(&self) -> &Tensor pub fn bias(&self) -> Option<&Tensor> }
Provides access to the layer's weight and bias parameters.
Example:
#![allow(unused)] fn main() { let conv = Conv2d::new(3, 64, (3, 3), (1, 1), (1, 1), true)?; let weight = conv.weight(); // Shape: [64, 3, 3, 3] let bias = conv.bias().unwrap(); // Shape: [64] }
Layer Implementation
The Conv2d layer implements the Layer
trait, providing methods for parameter collection and training state management:
#![allow(unused)] fn main() { pub fn parameters(&mut self) -> Vec<&mut Tensor> }
Returns all trainable parameters of the layer (weight and bias if present).
Output Dimensions
For a given input dimensions, the output dimensions of the convolution are computed as:
output_height = (input_height + 2 * padding.0 - kernel_size.0) / stride.0 + 1
output_width = (input_width + 2 * padding.1 - kernel_size.1) / stride.1 + 1
Implementation Details
The MaidenX Conv2d implementation uses the im2col algorithm for efficient computation:
- The input tensor is transformed into a matrix where each column contains the values in a sliding window
- Matrix multiplication is performed between this transformed matrix and the flattened kernel weights
- The result is reshaped back to the expected output dimensions
This approach allows leveraging optimized matrix multiplication operations for convolution.
Common Configurations
Here are some common Conv2d configurations:
Basic Convolution (Same Padding)
#![allow(unused)] fn main() { // Maintains spatial dimensions let conv = Conv2d::new(in_channels, out_channels, (3, 3), (1, 1), (1, 1), true)?; }
Strided Convolution (Downsampling)
#![allow(unused)] fn main() { // Reduces spatial dimensions by half let conv = Conv2d::new(in_channels, out_channels, (3, 3), (2, 2), (1, 1), true)?; }
1x1 Convolution (Channel Mixing)
#![allow(unused)] fn main() { // Changes channel dimensions only let conv = Conv2d::new(in_channels, out_channels, (1, 1), (1, 1), (0, 0), true)?; }