Projects:
Neural Speech Decoding
- Implemented a vector quantization module for an existing speech decoder framework that maps input spectrograms to speech parameters, including Pitch Frequency, Formant Filter Center Frequencies, and Broadband Unvoiced Filter Frequency. Built and compared VQ-VAE and VQ-VAE-2 variants to find the configuration that performed best on our dataset (a minimal quantizer sketch follows this list).
- Developed a phoneme classifier on spectrogram inputs to add a phoneme classification loss to an existing speech generator model. Evaluated MLP, vanilla RNN, GRU, and LSTM architectures to accurately classify phonemes from spectrogram features (see the classifier sketch below).
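A minimal sketch of the kind of vector-quantization layer used in the VQ-VAE variants above; it is illustrative only, and the codebook size, code dimension, and module names are assumptions rather than the project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """VQ-VAE style quantization layer (illustrative sketch, assumed sizes)."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e):
        # z_e: (batch, time, code_dim) continuous encoder output
        flat = z_e.reshape(-1, z_e.shape[-1])
        # Squared L2 distance from each vector to every codebook entry
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)
        z_q = self.codebook(indices).view_as(z_e)
        # Codebook and commitment losses, then straight-through estimator
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, loss, indices.view(z_e.shape[:-1])
```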
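A compact example of the kind of recurrent phoneme classifier that was compared (LSTM shown; input size, hidden size, and phoneme count are hypothetical, and the actual models and hyperparameters may differ).

```python
import torch
import torch.nn as nn

class PhonemeClassifier(nn.Module):
    """Bidirectional LSTM over spectrogram frames (illustrative sizes)."""
    def __init__(self, n_mels=80, hidden=256, n_phonemes=40):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_phonemes)

    def forward(self, spec):
        # spec: (batch, time, n_mels) spectrogram frames
        out, _ = self.rnn(spec)
        return self.head(out)  # phoneme logits per time step

# Cross-entropy over time steps; a term like this can be added to a generator's loss.
model = PhonemeClassifier()
logits = model(torch.randn(4, 120, 80))       # dummy spectrogram batch
targets = torch.randint(0, 40, (4, 120))      # dummy phoneme labels
loss = nn.CrossEntropyLoss()(logits.flatten(0, 1), targets.flatten())
```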
Deep Learning-Based Brain Decoding
Developed a brain decoding method using visual features from deep neural networks and the Natural Scenes Dataset. Extracted image features with ResNet-50 and DINOv2, reduced their dimensionality with PCA and UMAP, and used Nilearn's SpaceNet decoder with Graph-Net regularization to generate classification and regression weight maps from fMRI data. Find our paper here.
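A sketch of one way such a pipeline can be wired together (ResNet-50 features, PCA, then a Nilearn SpaceNet regressor with the Graph-Net penalty); the variable names, component count, and data-loading steps are placeholders, not the published setup.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.decomposition import PCA
from nilearn.decoding import SpaceNetRegressor

# 1) Image features from a pretrained ResNet-50 (classification head removed).
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
model.fc = torch.nn.Identity()
preprocess = weights.transforms()

def extract_features(images):
    """images: list of PIL images (placeholder input)."""
    batch = torch.stack([preprocess(im) for im in images])
    with torch.no_grad():
        return model(batch).numpy()          # (n_images, 2048)

# 2) Reduce feature dimensionality before decoding.
# features = extract_features(stimulus_images)      # assumed to exist
# target = PCA(n_components=50).fit_transform(features)[:, 0]  # one component

# 3) Regress the feature component from fMRI volumes with Graph-Net regularization.
# decoder = SpaceNetRegressor(penalty="graph-net", mask=brain_mask_img)
# decoder.fit(fmri_imgs, target)                    # fmri_imgs: 4D Niimg-like
# weight_map = decoder.coef_img_                    # voxel-wise regression weights
```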
Handwritten Word Synthesis with GANs
Generated a dataset of handwritten Persian words by applying Generative Adversarial Networks (GANs) to typed Persian words, which were first extracted from typed documents with a YOLOv5 detection model.
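A sketch of the word-extraction step that precedes GAN training, using a custom-trained YOLOv5 detector loaded through torch.hub; the weights file name and confidence threshold are hypothetical.

```python
import torch
from PIL import Image

# Custom-trained YOLOv5 word detector ('word_detector.pt' is a hypothetical weights file).
model = torch.hub.load("ultralytics/yolov5", "custom", path="word_detector.pt")

def crop_words(image_path, conf_threshold=0.5):
    """Detect typed words on a document page and return cropped word images."""
    page = Image.open(image_path)
    results = model(page)
    crops = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        if conf >= conf_threshold:
            crops.append(page.crop((int(x1), int(y1), int(x2), int(y2))))
    return crops  # cropped typed words later serve as GAN inputs
```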