Shahinur Alam, PhD

AI, HCI, and VR Researcher

With over 8 years of research experience, I specialize in human-computer interaction, virtual reality, AI, and medical imaging. I lead the NSF-funded project named SAIL, developing an immersive VR system for ASL education. I am passionate about driving technological advancements and improving user experiences.

Research Publications

Stats

Research Experience

With over eight years of research experience, I have authored 20+ published articles, reviewed 140+ scholarly works, and accumulated 150+ citations. These achievements reflect my commitment to advancing knowledge, proficiency in peer review, and substantial impact on the academic community.

Published Articles

Years of Institutional Research Experience

Citations

Recognized Journal Articles Reviewed

ASL Champ!

The ASL Champ! project uses Virtual Reality (VR) and deep learning to create an immersive platform for learning American Sign Language (ASL). Addressing challenges like the lack of real-time feedback in traditional ASL education, the platform combines motion capture with a deep learning model using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Users interact with a signing avatar in a virtual coffee shop, receiving real-time feedback to improve sign accuracy. Achieving 90.12% training accuracy and 86.66% test accuracy, ASL Champ! offers an engaging, effective learning experience that makes ASL education more accessible and interactive.

The development of ASL Champ involved multidisciplinary collaboration, integrating 3D avatar design, motion capture, AI model development, and sign language data collection. The game aims to provide users with a more intuitive and accurate way to learn ASL, making it particularly beneficial for those with limited access to traditional ASL instruction methods.

ASL Champ! details

Air-Writing

This work focuses on developing a highly accurate trajectory-based air-writing recognition system using deep learning techniques and depth sensor technology. The system allows users to write in free space, with their finger movements captured as 3D trajectories by an Intel RealSense SR300 camera. The data is processed through normalization techniques to eliminate noise and ensure smoother trajectories. Two deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), were employed to recognize the air-written digits. The system was trained on a self-collected dataset of 21,000 trajectories and the 6D Motion Gesture (6DMG) dataset, achieving a groundbreaking accuracy of 99.32% with the LSTM model and 99.26% with CNN. This research overcomes the limitations of traditional writing systems and gesture-based interfaces by offering a touchless, efficient method for input in applications such as augmented reality (AR), virtual reality (VR), and healthcare. The project’s key contributions include the development of a large public dataset and the integration of advanced neural network architectures to enhance recognition accuracy.

Air-writing details

Image Super-resolution

This project focuses on enhancing the resolution of images produced by integral imaging microscopy (IIM) using a deep learning-based Generative Adversarial Network (GAN). IIM provides 3D visualization of microscopic objects, but its images suffer from low resolution due to limitations in the micro-lens array (MLA) and poor lighting conditions. To address this, we developed a GAN-based super-resolution algorithm that improves the clarity and detail of IIM images by up to 8×, outperforming traditional methods. The GAN model consists of two parts: a generator that reconstructs high-resolution images from low-resolution inputs and a discriminator that distinguishes between real and generated images. By applying this method to various microscopic specimens, including biological samples and electronic components, we demonstrated significant improvements in image quality. The enhanced images retain fine details, depth, and edges, making them more suitable for scientific analysis. The model was trained using the PyTorch library on a high-performance computing system and tested with various metrics like PSNR, SSIM, and PSD, showing superior results compared to existing resolution enhancement techniques. This project provides an efficient, scalable solution for real-time image enhancement in IIM, with potential applications in biomedical science, nanophysics, and other fields requiring precise 3D imaging.

Image super-resolution details

Gesture Recognition

This project presents the implementation of a character recognition system based on finger-joint tracking using a 3-D depth camera, focusing on improving human-computer interaction (HCI). The system allows users to write characters, digits, and symbols in mid-air using simple hand gestures. The research aims to overcome the limitations of traditional gesture recognition systems, which often rely on wearable devices or support only limited character sets. Utilizing an Intel RealSense SR300 camera, the system tracks 22 finger joints to recognize 124 characters, including digits, alphabets, symbols, and special keys, offering a full keyboard experience. It employs a combination of Euclidean distance thresholding and geometric slope techniques for high accuracy in both single-hand and double-hand recognition modes. The system achieved an accuracy rate of over 91% and a recognition time of less than 60 milliseconds per character, making it suitable for real-time applications. Unlike many conventional systems, it operates effectively in both light and dark environments. The technology’s simplicity makes it user-friendly, with minimal training required, as confirmed by user studies. This system is highly versatile and can be applied across various industries, including virtual reality, healthcare, and accessibility, offering an innovative solution for hands-free input systems.

Gesture recognition details

ASL Champ!

The ASL Champ! project uses Virtual Reality (VR) and deep learning to create an immersive platform for learning American Sign Language (ASL). Addressing challenges like the lack of real-time feedback in traditional ASL education, the platform combines motion capture with a deep learning model using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Users interact with a signing avatar in a virtual coffee shop, receiving real-time feedback to improve sign accuracy. Achieving 90.12% training accuracy and 86.66% test accuracy, ASL Champ! offers an engaging, effective learning experience that makes ASL education more accessible and interactive.

The development of ASL Champ involved multidisciplinary collaboration, integrating 3D avatar design, motion capture, AI model development, and sign language data collection. The game aims to provide users with a more intuitive and accurate way to learn ASL, making it particularly beneficial for those with limited access to traditional ASL instruction methods.

ASL Champ! details

Air-Writing

This work focuses on developing a highly accurate trajectory-based air-writing recognition system using deep learning techniques and depth sensor technology. The system allows users to write in free space, with their finger movements captured as 3D trajectories by an Intel RealSense SR300 camera. The data is processed through normalization techniques to eliminate noise and ensure smoother trajectories. Two deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), were employed to recognize the air-written digits. The system was trained on a self-collected dataset of 21,000 trajectories and the 6D Motion Gesture (6DMG) dataset, achieving a groundbreaking accuracy of 99.32% with the LSTM model and 99.26% with CNN. This research overcomes the limitations of traditional writing systems and gesture-based interfaces by offering a touchless, efficient method for input in applications such as augmented reality (AR), virtual reality (VR), and healthcare. The project’s key contributions include the development of a large public dataset and the integration of advanced neural network architectures to enhance recognition accuracy.

Air-writing details

Image Super-resolution

This project focuses on enhancing the resolution of images produced by integral imaging microscopy (IIM) using a deep learning-based Generative Adversarial Network (GAN). IIM provides 3D visualization of microscopic objects, but its images suffer from low resolution due to limitations in the micro-lens array (MLA) and poor lighting conditions. To address this, we developed a GAN-based super-resolution algorithm that improves the clarity and detail of IIM images by up to 8×, outperforming traditional methods. The GAN model consists of two parts: a generator that reconstructs high-resolution images from low-resolution inputs and a discriminator that distinguishes between real and generated images. By applying this method to various microscopic specimens, including biological samples and electronic components, we demonstrated significant improvements in image quality. The enhanced images retain fine details, depth, and edges, making them more suitable for scientific analysis. The model was trained using the PyTorch library on a high-performance computing system and tested with various metrics like PSNR, SSIM, and PSD, showing superior results compared to existing resolution enhancement techniques. This project provides an efficient, scalable solution for real-time image enhancement in IIM, with potential applications in biomedical science, nanophysics, and other fields requiring precise 3D imaging.

Image super-resolution details

Gesture Recognition

This project presents the implementation of a character recognition system based on finger-joint tracking using a 3-D depth camera, focusing on improving human-computer interaction (HCI). The system allows users to write characters, digits, and symbols in mid-air using simple hand gestures. The research aims to overcome the limitations of traditional gesture recognition systems, which often rely on wearable devices or support only limited character sets. Utilizing an Intel RealSense SR300 camera, the system tracks 22 finger joints to recognize 124 characters, including digits, alphabets, symbols, and special keys, offering a full keyboard experience. It employs a combination of Euclidean distance thresholding and geometric slope techniques for high accuracy in both single-hand and double-hand recognition modes. The system achieved an accuracy rate of over 91% and a recognition time of less than 60 milliseconds per character, making it suitable for real-time applications. Unlike many conventional systems, it operates effectively in both light and dark environments. The technology’s simplicity makes it user-friendly, with minimal training required, as confirmed by user studies. This system is highly versatile and can be applied across various industries, including virtual reality, healthcare, and accessibility, offering an innovative solution for hands-free input systems.

Gesture recognition details

Top 5 High-Performing Research

M. S. Alam et al., ‘ASL champ!: a virtual reality game with deep-learning driven sign recognition’, Computers & Education: X Reality, vol. 4, p. 100059, 2024, doi: https://doi.org/10.1016/j.cexr.2024.100059
M. S. Alam, K. -C. Kwon and N. Kim, “TARNet: An Efficient and Lightweight Trajectory-Based Air-Writing Recognition Model Using a CNN and LSTM Network, Volume 2022, doi: https://doi.org/10.1155/2022/6063779
M. S. Alam, K. -C. Kwon and N. Kim, “Implementation of a Character Recognition System Based on Finger-Joint Tracking Using a Depth Camera,” in IEEE Transactions on Human-Machine Systems, vol. 51, no. 3, pp. 229-241, June 2021, doi: https://doi.org/10.1109/THMS.2021.3066854.
M. S. Alam, K. -C. Kwon, M. -U. Erdenebat, M. Y. Abbass; M. A. Alam, and N. Kim, “Super-Resolution Enhancement Method Based on Generative Adversarial Network for Integral Imaging Microscopy” in Sensors 2021, 21, 2164. https://doi.org/10.3390/s21062164.
M.S. Alam, K.-C. Kwon; M.A. Alam, M.Y. Abbass, S.M. Imtiaz, N. Kim, “Trajectory-Based Air-Writing Recognition Using Deep Neural Network and Depth Sensor,” in Sensors 2020, 20, 376. https://doi.org/10.3390/s20020376

Recent from Blogs

Array and Pointer

by Shahinur | Apr 17, 2019 | C, C++, Data Structure

Arrays and pointers are common in most programming languages. Most importantly, these two data structures are needed everywhere from any moderate to complex program. Here we will try to know details about these two data structures. Arrays and pointers are almost the...

What can be done with Python?

by Shahinur | Mar 29, 2019 | Python

In the last blog post we saw the advantages of using Python. By now you might be wondering what can be done with Python or whether Python is right for me. In this episode we will look at various applications of Python – GUI based software development When we think of...

Why use Python?

by Shahinur | Mar 24, 2019 | Python

Nowadays, in the crowd of numerous programming languages, it is very natural that the question comes to mind that why should I learn Python? Actually the question should have been why don't I learn Python! The fact is that most of the tools or software we use at the...

Neural Gas Algorithm

by Shahinur | Feb 26, 2019 | Algorithm

Neural Gas is a nice neural network that is used for optimal Data Representation. The specialty of this algorithm is that it can represent the features in the optimal level. During the training, the feature factories look like aerobic gas as it has been named "neural...

Likert Scale

by Shahinur | Jan 8, 2019 | Miscellaneous

Likert Scale is a type of rating or scaling method that is usually used in various types of surveys. In cases where the quality of something cannot be quantified, such as how good or bad I am, how much I like or dislike something, etc. This scaling method is mostly...

Activation Function

by Shahinur | May 5, 2018 | Deep Learning, Machine Learning

Activation function plays a very important role in any neural network. As the activation name suggests, this function's job is to trigger something. Yes, that's it. Activation function triggers a neuron or node. The activation function determines whether information...

Cross Validation

by Shahinur | Feb 27, 2018 | Deep Learning, Machine Learning

Cross Validation is a widely used term in Machine Learning. Usually some testing data is needed to validate a model. Many times it is seen that there is no similarity between the training data and the testing data, then to a large extent the model can predict...

Point Cloud

by Shahinur | Jan 7, 2018 | Basic Terminology

By hearing the name Point Cloud, it is understood that the cloud of points. That is, creating a cloud of where the points are. Depth of any object cannot be understood in normal camera or two dimensional camera. But in three-dimensional image or camera, the x,y,z axis...

Pixel Hogel and Voxel

by Shahinur | Feb 9, 2017 | Computer Vision, Miscellaneous

We usually hear the word pixel, but we cannot directly use the word pixel in a three-dimensional environment. Here we will see what Pixel, Hogel and Voxel are and why. Pixel The concept of pixel is used in a two-dimensional digital image or display device. The word...

« Older Entries

Next Entries »