Shahinur Alam, PhD
AI, HCI, and VR Researcher
Research Experience
With over eight years of research experience, I have authored 20+ published articles, reviewed 140+ scholarly works, and accumulated 150+ citations. These achievements reflect my commitment to advancing knowledge, proficiency in peer review, and substantial impact on the academic community.
ASL Champ!
The ASL Champ! project uses Virtual Reality (VR) and deep learning to create an immersive platform for learning American Sign Language (ASL). Addressing challenges like the lack of real-time feedback in traditional ASL education, the platform combines motion capture with a deep learning model using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Users interact with a signing avatar in a virtual coffee shop, receiving real-time feedback to improve sign accuracy. Achieving 90.12% training accuracy and 86.66% test accuracy, ASL Champ! offers an engaging, effective learning experience that makes ASL education more accessible and interactive.
The development of ASL Champ! involved multidisciplinary collaboration, integrating 3D avatar design, motion capture, AI model development, and sign language data collection. The game aims to give users a more intuitive and accurate way to learn ASL, which is particularly beneficial for those with limited access to traditional ASL instruction.
Air-Writing
This work develops a trajectory-based air-writing recognition system using deep learning and depth sensor technology. Users write in free space, and an Intel RealSense SR300 camera captures their finger movements as 3D trajectories. The trajectories are normalized to reduce noise and smooth the strokes. Two deep learning models, a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN), were used to recognize the air-written digits. The system was trained on a self-collected dataset of 21,000 trajectories and the 6D Motion Gesture (6DMG) dataset, reaching an accuracy of 99.32% with the LSTM model and 99.26% with the CNN. This research addresses the limitations of traditional writing systems and gesture-based interfaces by offering a touchless, efficient input method for applications such as augmented reality (AR), virtual reality (VR), and healthcare. The project’s key contributions are a large public dataset and the integration of LSTM and CNN architectures to improve recognition accuracy.
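The normalization step mentioned above can be illustrated with a minimal sketch. The summary does not spell out the exact procedure, so the two stages below (min-max scaling with a single scale factor, followed by a moving-average filter) are assumptions chosen for illustration, and the function names `min_max_normalize` and `moving_average` are hypothetical.

```python
def min_max_normalize(points):
    """Shift a 3D trajectory to the origin and scale its largest extent to 1.

    Assumed normalization for illustration; not taken from the published system.
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # One scale factor for all axes keeps the shape of the written digit intact.
    scale = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    return [tuple((p[i] - mins[i]) / scale for i in range(3)) for p in points]


def moving_average(points, window=3):
    """Reduce jitter by averaging each point with its neighbours in the window."""
    half = window // 2
    smoothed = []
    for k in range(len(points)):
        lo, hi = max(0, k - half), min(len(points), k + half + 1)
        chunk = points[lo:hi]
        smoothed.append(
            tuple(sum(p[i] for p in chunk) / len(chunk) for i in range(3))
        )
    return smoothed


# Hypothetical raw fingertip positions (x, y, z) from a depth camera.
raw = [(10.0, 20.0, 5.0), (12.0, 24.0, 5.0), (14.0, 20.0, 5.0)]
clean = moving_average(min_max_normalize(raw))
```

Scaling every axis by the same factor is a deliberate choice here: it makes trajectories comparable regardless of how large the user writes, without distorting the stroke proportions that a recognizer relies on.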
Image Super-resolution
This project enhances the resolution of images produced by integral imaging microscopy (IIM) using a deep learning-based Generative Adversarial Network (GAN). IIM provides 3D visualization of microscopic objects, but its images suffer from low resolution due to limitations of the micro-lens array (MLA) and poor lighting conditions. To address this, we developed a GAN-based super-resolution algorithm that increases the resolution of IIM images by up to 8×, outperforming traditional methods. The GAN consists of two parts: a generator that reconstructs high-resolution images from low-resolution inputs, and a discriminator that distinguishes real images from generated ones. Applying the method to various microscopic specimens, including biological samples and electronic components, produced significant improvements in image quality. The enhanced images retain fine details, depth, and edges, making them more suitable for scientific analysis. The model was trained with the PyTorch library on a high-performance computing system and evaluated with metrics such as PSNR, SSIM, and PSD, showing better results than existing resolution enhancement techniques. The project provides an efficient, scalable approach to real-time image enhancement in IIM, with potential applications in biomedical science, nanophysics, and other fields requiring precise 3D imaging.
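Of the metrics named above, peak signal-to-noise ratio (PSNR) is the simplest to show. It is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between a reference image and the restored image. The sketch below is a plain-Python illustration for small grayscale images stored as nested lists; it is not the project's evaluation code, and the 8-bit default `max_value=255.0` is an assumption.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    same_rows = len(reference) == len(restored)
    if not same_rows or any(len(a) != len(b) for a, b in zip(reference, restored)):
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels.
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(reference, restored)
        for a, b in zip(row_a, row_b)
    )
    pixel_count = sum(len(row) for row in reference)
    mse = squared_error / pixel_count

    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the restored image is numerically closer to the reference. Because it compares pixels one by one, it is usually read alongside a structural metric such as SSIM rather than on its own.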
Gesture Recognition
This project implements a character recognition system based on finger-joint tracking with a 3D depth camera, with the goal of improving human-computer interaction (HCI). The system lets users write characters, digits, and symbols in mid-air using simple hand gestures. The research aims to overcome the limitations of traditional gesture recognition systems, which often rely on wearable devices or support only limited character sets. Using an Intel RealSense SR300 camera, the system tracks 22 finger joints to recognize 124 characters, including digits, letters, symbols, and special keys, offering a full keyboard experience. It combines Euclidean distance thresholding with geometric slope techniques to achieve high accuracy in both single-hand and double-hand recognition modes. The system achieved an accuracy of over 91% and a recognition time of less than 60 milliseconds per character, making it suitable for real-time applications. Unlike many conventional systems, it operates effectively in both light and dark environments. The technique is simple and user-friendly, requiring minimal training, as confirmed by user studies. The system can be applied across various industries, including virtual reality, healthcare, and accessibility, as a touchless input method.
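The two geometric tests named above, Euclidean distance thresholding and slope, can be sketched as follows. This is an illustration of the general idea only. The rule that two fingertips "touch" below a fixed distance, the 15 mm threshold, and the function names are all hypothetical and are not taken from the published system.

```python
import math


def joint_distance(a, b):
    """Euclidean distance between two 3D joint positions."""
    return math.dist(a, b)


def fingers_touching(tip_a, tip_b, threshold_mm=15.0):
    """Hypothetical rule: fingertips count as touching when closer than the threshold."""
    return joint_distance(tip_a, tip_b) < threshold_mm


def slope_angle_deg(base, tip):
    """Angle of the base-to-tip segment in the (x, y) plane, in degrees.

    atan2 is used instead of dy/dx so vertical fingers do not divide by zero.
    """
    return math.degrees(math.atan2(tip[1] - base[1], tip[0] - base[0]))


# Hypothetical joint coordinates in millimetres.
thumb_tip = (0.0, 0.0, 0.0)
index_tip = (3.0, 4.0, 0.0)
index_base = (3.0, -40.0, 0.0)

touching = fingers_touching(thumb_tip, index_tip)   # distance is 5 mm
pointing = slope_angle_deg(index_base, index_tip)   # finger points straight up
```

Rules of this kind need no training data and cost only a few arithmetic operations per frame, which is consistent with the low per-character recognition time reported above.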
Top 5 High-Performing Publications
- M. S. Alam et al., “ASL champ!: a virtual reality game with deep-learning driven sign recognition,” Computers & Education: X Reality, vol. 4, p. 100059, 2024, doi: https://doi.org/10.1016/j.cexr.2024.100059
- M. S. Alam, K.-C. Kwon, and N. Kim, “TARNet: An Efficient and Lightweight Trajectory-Based Air-Writing Recognition Model Using a CNN and LSTM Network,” vol. 2022, doi: https://doi.org/10.1155/2022/6063779
- M. S. Alam, K.-C. Kwon, and N. Kim, “Implementation of a Character Recognition System Based on Finger-Joint Tracking Using a Depth Camera,” IEEE Transactions on Human-Machine Systems, vol. 51, no. 3, pp. 229-241, June 2021, doi: https://doi.org/10.1109/THMS.2021.3066854
- M. S. Alam, K.-C. Kwon, M.-U. Erdenebat, M. Y. Abbass, M. A. Alam, and N. Kim, “Super-Resolution Enhancement Method Based on Generative Adversarial Network for Integral Imaging Microscopy,” Sensors, vol. 21, p. 2164, 2021, doi: https://doi.org/10.3390/s21062164
- M. S. Alam, K.-C. Kwon, M. A. Alam, M. Y. Abbass, S. M. Imtiaz, and N. Kim, “Trajectory-Based Air-Writing Recognition Using Deep Neural Network and Depth Sensor,” Sensors, vol. 20, p. 376, 2020, doi: https://doi.org/10.3390/s20020376
Recent from Blogs
“Hello World” in Python
Programmers usually start writing their first program with "Hello World". We will also start with "Hello World" here. However, to run your first Python program, you must have Python installed and the environment set up. Otherwise, if there are any errors, an error...
Python Environment Setup
Command Line Interface (CLI) To use the command line interface, we first need to install Python's core package. This varies for different operating systems. Windows First, download the latest version 3 of Python from this link. Once downloaded, double-click on the...
Python Interpreter
You might already know that machines or computers always operate in machine language, meaning that regardless of the language we use, it must be translated into a form comprehensible to the computer; this task is performed by an interpreter or a compiler. An...
Stack
A stack is a type of abstract data type. In a stack, data is stored in a sequential list format, and the primary operations, such as adding or deleting an element, occur at one end (top) of the stack. This is why it is referred to as Last In First Out (LIFO). The...
Queue
A queue, like a stack, is a type of abstract data type. A queue is an ordered list where data is inserted at one end and deleted from the other end. The end where data is inserted is called the rear, and the end where data is deleted is called the front. A queue...
Asymptotic Notation
Suppose we want to add a large number of integers. How long will this operation take? The first question that comes to mind is how many integers there are; let's assume this number is n. Can we specify exactly how long this will take? Yes, we can for a specific computer....
Structure and Union
Structures and unions are both user-defined data types in the C programming language. A user-defined data type is one whose form is decided by the user, i.e. the programmer. Sometimes we need to do some complex programs when using basic data types is...
Which version of Python should I use?
Python has two major versions, which are updated separately. These versions are Python 2 and Python 3. Common Python users often find themselves in doubt about these versions. In continuation and considering new users, a brief discussion is presented here. Both...
Technical advantage of Python
In previous posts, we have seen some general discussion of Python. But a beginner, programmer, or developer may naturally ask: is Python technically sound? In this blog post, we will discuss this. Object Oriented Python supports three types of...