Context
Developed an autoencoder neural network for image compression on the Fashion-MNIST dataset. The project focused on optimizing the architecture to maximize the Structural Similarity Index (SSIM) of the reconstructions while minimizing Mean Squared Error (MSE).
Technologies Used
- Framework: PyTorch
- Dataset: Fashion-MNIST (60,000 training images)
- Libraries: NumPy, Matplotlib, Scikit-learn
- Metrics: SSIM, MSE, compression ratio
Implementation
Architecture:
- Encoder: Convolutional layers for feature extraction
- Latent space: Compact bottleneck representation
- Decoder: Transposed convolutions for image reconstruction
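The encoder–bottleneck–decoder layout above can be sketched in PyTorch as follows. The layer sizes (16/32 channels, a 32-dimensional latent) and the `ConvAutoencoder` name are illustrative assumptions, not the project's actual configuration:

```python
# Hypothetical sketch of the convolutional autoencoder described above.
# Channel counts and latent_dim are assumed example values.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Encoder: strided convolutions downsample 28x28 -> 7x7 feature maps,
        # then a linear layer projects to the compact latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),          # bottleneck
        )
        # Decoder: transposed convolutions upsample back to 28x28.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)
```

A forward pass on a batch of 28×28 grayscale images returns reconstructions of the same shape, with the latent vector available from `encoder` alone.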
Training:
- Loss function: Weighted combination of MSE and an SSIM-based term (SSIM measures similarity, so it typically enters the loss as 1 − SSIM)
- Optimizer: Adam
- Regularization: Dropout and BatchNorm
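A combined MSE + SSIM objective like the one listed above can be sketched as follows. This uses a simplified SSIM with a uniform 7×7 window (the project may have used a Gaussian window or a library implementation), and the weight `alpha` is an illustrative assumption:

```python
# Hedged sketch of a combined MSE + SSIM loss for images scaled to [0, 1].
import torch
import torch.nn.functional as F

def ssim(x, y, window_size: int = 7, c1: float = 0.01**2, c2: float = 0.03**2):
    """Simplified SSIM using a uniform averaging window instead of a Gaussian."""
    pad = window_size // 2
    mu_x = F.avg_pool2d(x, window_size, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window_size, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window_size, stride=1, padding=pad) - mu_x**2
    sigma_y = F.avg_pool2d(y * y, window_size, stride=1, padding=pad) - mu_y**2
    sigma_xy = F.avg_pool2d(x * y, window_size, stride=1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def combined_loss(recon, target, alpha: float = 0.5):
    # alpha balances pixel accuracy (MSE) against perceptual similarity;
    # SSIM is maximized, so it contributes to the loss as (1 - SSIM).
    return alpha * F.mse_loss(recon, target) + (1 - alpha) * (1 - ssim(recon, target))
```

For identical inputs the SSIM term is 1 and the combined loss is 0, so the loss decreases as reconstructions approach the originals both pixel-wise and structurally.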
The autoencoder learns to compress 28×28 grayscale images into a compact latent representation and reconstruct them with minimal information loss.
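The compression this bottleneck achieves can be quantified with simple arithmetic. The 32-dimensional latent used here is a hypothetical example, not the project's actual bottleneck size:

```python
# Illustrative compression-ratio calculation for the autoencoder bottleneck.
image_pixels = 28 * 28   # 784 values per Fashion-MNIST image
latent_dim = 32          # assumed example bottleneck size

ratio = image_pixels / latent_dim
print(f"compression ratio: {ratio:.1f}x")  # 24.5x
```

This count compares raw value counts; the effective ratio also depends on how the latent values are quantized or stored relative to 8-bit input pixels.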
Results
- Achieved a strong compression ratio relative to the 784-value input images
- High SSIM scores, indicating good perceptual quality of the reconstructions
- Low MSE, demonstrating pixel-accurate reconstruction
- Visual quality: near-perfect reconstructions for most samples
Challenges & Learnings
- Finding the optimal latent space dimension (trade-off between compression and quality)
- Tuning loss functions to prioritize perceptual similarity
- Understanding the learned features in the latent space