Shahinur Alam, PhD

AI, HCI, and VR Researcher
With over 8 years of research experience, I specialize in human-computer interaction, virtual reality, AI, and medical imaging. I lead the NSF-funded SAIL project, which is developing an immersive VR system for American Sign Language (ASL) education. I am passionate about driving technological advancements and improving user experiences.

Stats

Research Experience

With over eight years of research experience, I have authored 20+ published articles, reviewed 140+ scholarly works, and accumulated 150+ citations. These achievements reflect my commitment to advancing knowledge, proficiency in peer review, and substantial impact on the academic community.

  • Published Articles: 20+
  • Years of Institutional Research Experience: 8+
  • Citations: 150+
  • Recognized Journal Articles Reviewed: 140+

ASL Champ!

The ASL Champ! project uses virtual reality (VR) and deep learning to create an immersive platform for learning American Sign Language (ASL). To address the lack of real-time feedback in traditional ASL education, the platform combines motion capture with a deep learning model that uses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Users interact with a signing avatar in a virtual coffee shop and receive real-time feedback that helps them improve sign accuracy. The sign recognition model reached 90.12% training accuracy and 86.66% test accuracy, supporting an engaging learning experience that makes ASL education more accessible and interactive.

The development of ASL Champ! involved multidisciplinary collaboration, integrating 3D avatar design, motion capture, AI model development, and sign language data collection. The game aims to give users a more intuitive and accurate way to learn ASL, which is particularly beneficial for those with limited access to traditional ASL instruction.

Air-Writing

This work develops a trajectory-based air-writing recognition system using deep learning and depth sensor technology. Users write in free space, and their finger movements are captured as 3D trajectories by an Intel RealSense SR300 camera. The captured data is normalized to reduce noise and produce smoother trajectories. Two deep learning models, a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN), were used to recognize the air-written digits. The system was trained on a self-collected dataset of 21,000 trajectories and on the 6D Motion Gesture (6DMG) dataset, achieving an accuracy of 99.32% with the LSTM model and 99.26% with the CNN. The approach addresses limitations of traditional writing systems and gesture-based interfaces by offering a touchless, efficient input method for applications such as augmented reality (AR), virtual reality (VR), and healthcare. The project’s key contributions are a large public dataset and the use of LSTM and CNN architectures to improve recognition accuracy.
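As an illustration of what trajectory normalization can look like, here is a minimal Python sketch. It is not the pipeline used in the project: the function names, the moving-average window, the unit-cube scaling, and the index-based resampling are all assumptions chosen for clarity.

```python
def moving_average(points, window=3):
    """Smooth a 3D trajectory with a centered moving average (assumed noise filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        chunk = points[lo:hi]
        smoothed.append(tuple(sum(p[k] for p in chunk) / len(chunk) for k in range(3)))
    return smoothed


def normalize(points):
    """Center the trajectory on its centroid and scale it into the cube [-1, 1]^3."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [tuple(p[k] - centroid[k] for k in range(3)) for p in points]
    # Largest absolute coordinate; fall back to 1.0 for a single stationary point.
    extent = max(abs(c) for p in centered for c in p) or 1.0
    return [tuple(c / extent for c in p) for p in centered]


def resample(points, length):
    """Linearly interpolate along the point index to get a fixed-length sequence."""
    if length < 2:
        raise ValueError("length must be at least 2")
    if len(points) == 1:
        return [points[0]] * length
    out = []
    for i in range(length):
        t = i * (len(points) - 1) / (length - 1)
        j = min(int(t), len(points) - 2)
        frac = t - j
        a, b = points[j], points[j + 1]
        out.append(tuple(a[k] + frac * (b[k] - a[k]) for k in range(3)))
    return out


def preprocess(points, length=32, window=3):
    """Smooth, normalize, and resample one raw trajectory of (x, y, z) tuples."""
    return resample(normalize(moving_average(points, window)), length)


raw = [(0.0, 0.0, 0.5), (0.1, 0.2, 0.5), (0.2, 0.1, 0.5), (0.3, 0.4, 0.5), (0.4, 0.3, 0.5)]
fixed = preprocess(raw, length=8)
print(len(fixed))
```

A fixed-length, scale-independent sequence like this is a common input format for both LSTM and CNN classifiers, since it removes differences in writing size, position, and speed.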

Image Super-resolution

This project enhances the resolution of images produced by integral imaging microscopy (IIM) using a deep learning-based Generative Adversarial Network (GAN). IIM provides 3D visualization of microscopic objects, but its images suffer from low resolution due to limitations of the micro-lens array (MLA) and poor lighting conditions. To address this, we developed a GAN-based super-resolution algorithm that improves the clarity and detail of IIM images by up to 8×, outperforming traditional methods. The GAN model consists of two parts: a generator that reconstructs high-resolution images from low-resolution inputs, and a discriminator that distinguishes between real and generated images. Applying the method to various microscopic specimens, including biological samples and electronic components, we demonstrated significant improvements in image quality. The enhanced images retain fine details, depth, and edges, making them more suitable for scientific analysis. The model was trained with the PyTorch library on a high-performance computing system and evaluated with peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and power spectral density (PSD), showing better results than existing resolution enhancement techniques. The project provides an efficient, scalable solution for real-time image enhancement in IIM, with potential applications in biomedical science, nanophysics, and other fields requiring precise 3D imaging.
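Of the metrics mentioned above, PSNR is the simplest to state: it is 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and the reconstructed image. The sketch below is a plain-Python illustration for small grayscale images stored as nested lists. The function name and the 8-bit default for `max_value` are assumptions, and this is not the project's evaluation code.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    if len(reference) != len(test) or any(len(r) != len(t) for r, t in zip(reference, test)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in reference)
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for r, t in zip(reference, test) for a, b in zip(r, t)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


high_res = [[10, 20], [30, 40]]
restored = [[11, 21], [31, 41]]  # every pixel off by one, so MSE = 1
print(round(psnr(high_res, restored), 2))
```

Higher values mean the reconstruction is closer to the reference. PSNR is sensitive only to per-pixel error, which is one reason it is usually reported alongside a structural metric such as SSIM.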

Gesture Recognition

This project implements a character recognition system based on finger-joint tracking with a 3D depth camera, with the goal of improving human-computer interaction (HCI). Users write characters, digits, and symbols in mid-air using simple hand gestures. The research aims to overcome the limitations of traditional gesture recognition systems, which often rely on wearable devices or support only limited character sets. Using an Intel RealSense SR300 camera, the system tracks 22 finger joints to recognize 124 characters, including digits, letters, symbols, and special keys, offering a full keyboard experience. It combines Euclidean distance thresholding with geometric slope techniques in both single-hand and double-hand recognition modes. The system achieved an accuracy of over 91% and a recognition time of less than 60 milliseconds per character, making it suitable for real-time applications. Unlike many conventional systems, it operates effectively in both light and dark environments. User studies confirmed that the system is easy to use and requires minimal training. It can be applied in areas such as virtual reality, healthcare, and accessibility as a hands-free input method.
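The two geometric ideas named above, Euclidean distance thresholding and slope, can be sketched in a few lines of Python. The example is purely illustrative. The joint coordinates, the 15 mm threshold, the direction labels, and the assumption that the y-axis points up are all hypothetical and are not taken from the published system.

```python
import math


def is_touching(joint_a, joint_b, threshold_mm=15.0):
    """Distance thresholding: two joints count as touching when closer than the threshold."""
    return math.dist(joint_a, joint_b) < threshold_mm


def slope_deg(base, tip):
    """Angle of the base-to-tip line in the x-y plane, in degrees from the +x axis."""
    return math.degrees(math.atan2(tip[1] - base[1], tip[0] - base[0]))


def finger_direction(base, tip):
    """Map the slope angle to a coarse direction label (hypothetical mapping)."""
    angle = slope_deg(base, tip)
    if -45 <= angle < 45:
        return "right"
    if 45 <= angle < 135:
        return "up"
    if -135 <= angle < -45:
        return "down"
    return "left"


# Hypothetical joint positions in millimetres (x, y, z).
thumb_tip = (0.0, 0.0, 300.0)
index_tip = (6.0, 8.0, 300.0)
index_base = (6.0, -60.0, 300.0)

print(is_touching(thumb_tip, index_tip))        # distance is 10 mm
print(finger_direction(index_base, index_tip))  # finger points along +y
```

Rules of this kind need no training data and are cheap to evaluate per frame, which fits the low per-character recognition time reported above.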

Top 5 High-Performing Publications

  • M. S. Alam et al., “ASL champ!: a virtual reality game with deep-learning driven sign recognition,” Computers & Education: X Reality, vol. 4, p. 100059, 2024, doi: https://doi.org/10.1016/j.cexr.2024.100059.
  • M. S. Alam, K.-C. Kwon, and N. Kim, “TARNet: An Efficient and Lightweight Trajectory-Based Air-Writing Recognition Model Using a CNN and LSTM Network,” vol. 2022, doi: https://doi.org/10.1155/2022/6063779.
  • M. S. Alam, K.-C. Kwon, and N. Kim, “Implementation of a Character Recognition System Based on Finger-Joint Tracking Using a Depth Camera,” IEEE Transactions on Human-Machine Systems, vol. 51, no. 3, pp. 229-241, June 2021, doi: https://doi.org/10.1109/THMS.2021.3066854.
  • M. S. Alam, K.-C. Kwon, M.-U. Erdenebat, M. Y. Abbass, M. A. Alam, and N. Kim, “Super-Resolution Enhancement Method Based on Generative Adversarial Network for Integral Imaging Microscopy,” Sensors, vol. 21, p. 2164, 2021, doi: https://doi.org/10.3390/s21062164.
  • M. S. Alam, K.-C. Kwon, M. A. Alam, M. Y. Abbass, S. M. Imtiaz, and N. Kim, “Trajectory-Based Air-Writing Recognition Using Deep Neural Network and Depth Sensor,” Sensors, vol. 20, p. 376, 2020, doi: https://doi.org/10.3390/s20020376.

Recent from Blogs

Exploration and Exploitation

Exploration and Exploitation are two fundamental processes in reinforcement learning. In short, exploration is about learning something new, and exploitation is about making decisions based on known information. Balancing these two is crucial. Exploration Through this...

Avoiding Conflicts on GitHub: The Power of Locking

GitHub is a hub of collaboration where developers come together to work on projects, share ideas, and resolve issues. While this collaborative environment is one of GitHub's strengths, it can also give rise to commit conflicts. GitHub provides a powerful tool called...

Basic Anaconda Commands

Anaconda is a fantastic tool for all deep learning, machine learning, and computer vision researchers. It saves a great deal of extra work when setting up environments and tools. Personally, I love it so much. Anaconda Navigator is a great UI for setting up environments and...

Hand joint detection using OpenCV and MediaPipe

MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines. The MediaPipe Python package is available on PyPI for Linux, macOS, and Windows. Today we will write some simple code for hand joint detection using OpenCV. At...