Shahinur Alam, PhD
AI, HCI, and VR Researcher
Stats
Research Experience
With over eight years of research experience, I have authored 20+ published articles, reviewed 140+ scholarly works, and accumulated 150+ citations. These achievements reflect my commitment to advancing knowledge, proficiency in peer review, and substantial impact on the academic community.
ASL Champ!
The ASL Champ! project uses Virtual Reality (VR) and deep learning to create an immersive platform for learning American Sign Language (ASL). Addressing challenges like the lack of real-time feedback in traditional ASL education, the platform combines motion capture with a deep learning model using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Users interact with a signing avatar in a virtual coffee shop, receiving real-time feedback to improve sign accuracy. Achieving 90.12% training accuracy and 86.66% test accuracy, ASL Champ! offers an engaging, effective learning experience that makes ASL education more accessible and interactive.
The development of ASL Champ! involved multidisciplinary collaboration, integrating 3D avatar design, motion capture, AI model development, and sign language data collection. The game aims to give users a more intuitive and accurate way to learn ASL, which makes it particularly beneficial for those with limited access to traditional ASL instruction.
Air-Writing
This work develops a trajectory-based air-writing recognition system using deep learning and depth sensor technology. The system allows users to write in free space, with their finger movements captured as 3D trajectories by an Intel RealSense SR300 camera. The captured data is normalized to reduce noise and produce smoother trajectories. Two deep learning models, a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN), were employed to recognize the air-written digits. The system was trained on a self-collected dataset of 21,000 trajectories and the 6D Motion Gesture (6DMG) dataset, achieving an accuracy of 99.32% with the LSTM model and 99.26% with the CNN. This research addresses the limitations of traditional writing systems and gesture-based interfaces by offering a touchless, efficient input method for applications such as augmented reality (AR), virtual reality (VR), and healthcare. The project’s key contributions include the development of a large public dataset and the integration of advanced neural network architectures to enhance recognition accuracy.
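The normalization step described above can be illustrated with a minimal sketch. It is not the published pipeline: the function names `normalize_trajectory` and `smooth_trajectory`, the centroid-and-scale scheme, and the moving-average window are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # one (x, y, z) fingertip sample


def normalize_trajectory(points: List[Point]) -> List[Point]:
    """Center a trajectory on its centroid and scale it so that the
    largest absolute coordinate becomes 1 (hypothetical scheme)."""
    n = len(points)
    if n == 0:
        return []
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    extent = max(max(abs(c) for c in p) for p in centered)
    scale = extent if extent > 0 else 1.0  # avoid dividing by zero
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]


def smooth_trajectory(points: List[Point], window: int = 3) -> List[Point]:
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        segment = points[max(0, i - half): i + half + 1]
        smoothed.append(
            tuple(sum(p[k] for p in segment) / len(segment) for k in range(3))
        )
    return smoothed
```

Centering and scaling make trajectories comparable regardless of where in the camera's field of view, or how large, a digit was written; the smoothing window would need tuning against the sensor's actual noise.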
Image Super-resolution
This project focuses on enhancing the resolution of images produced by integral imaging microscopy (IIM) using a deep learning-based Generative Adversarial Network (GAN). IIM provides 3D visualization of microscopic objects, but its images suffer from low resolution due to limitations in the micro-lens array (MLA) and poor lighting conditions. To address this, we developed a GAN-based super-resolution algorithm that improves the clarity and detail of IIM images by up to 8×, outperforming traditional methods. The GAN model consists of two parts: a generator that reconstructs high-resolution images from low-resolution inputs and a discriminator that distinguishes between real and generated images. By applying this method to various microscopic specimens, including biological samples and electronic components, we demonstrated significant improvements in image quality. The enhanced images retain fine details, depth, and edges, making them more suitable for scientific analysis. The model was trained using the PyTorch library on a high-performance computing system and tested with various metrics like PSNR, SSIM, and PSD, showing superior results compared to existing resolution enhancement techniques. This project provides an efficient, scalable solution for real-time image enhancement in IIM, with potential applications in biomedical science, nanophysics, and other fields requiring precise 3D imaging.
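Of the metrics mentioned above, PSNR is the simplest to show. The sketch below is a plain-Python illustration of the standard definition, PSNR = 10 · log10(MAX² / MSE), and not the project's evaluation code; the function name `psnr` and the nested-list image representation are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[Sequence[float]],
         estimate: Sequence[Sequence[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two grayscale images
    given as equally sized nested sequences of pixel values."""
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    if not flat_ref or len(flat_ref) != len(flat_est):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the super-resolved image is closer, pixel by pixel, to the high-resolution reference; SSIM and PSD complement it by measuring structural and frequency-domain similarity.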
Gesture Recognition
This project presents the implementation of a character recognition system based on finger-joint tracking using a 3-D depth camera, focusing on improving human-computer interaction (HCI). The system allows users to write characters, digits, and symbols in mid-air using simple hand gestures. The research aims to overcome the limitations of traditional gesture recognition systems, which often rely on wearable devices or support only limited character sets. Using an Intel RealSense SR300 camera, the system tracks 22 finger joints to recognize 124 characters, including digits, letters, symbols, and special keys, offering a full keyboard experience. It combines Euclidean distance thresholding with geometric slope techniques to achieve high accuracy in both single-hand and double-hand recognition modes. The system achieved an accuracy rate of over 91% and a recognition time of less than 60 milliseconds per character, making it suitable for real-time applications. Unlike many conventional systems, it operates effectively in both light and dark environments. The technology’s simplicity makes it user-friendly, with minimal training required, as confirmed by user studies. The system can be applied across various industries, including virtual reality, healthcare, and accessibility, as a touchless input method.
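The two geometric tests named above can be sketched as follows. This is an illustration only, not the published method: the function names, the 15 mm default threshold, and the choice to measure slope in the x-y plane are all hypothetical.

```python
import math
from typing import Tuple

Joint = Tuple[float, float, float]  # (x, y, z) joint position, e.g. in mm


def joints_touching(a: Joint, b: Joint, threshold: float = 15.0) -> bool:
    """Euclidean distance thresholding: treat two joints as touching
    when they are no farther apart than `threshold` (assumed value)."""
    return math.dist(a, b) <= threshold


def slope_angle_deg(a: Joint, b: Joint) -> float:
    """Angle of the line from joint a to joint b in the x-y plane,
    in degrees; atan2 handles vertical lines without dividing by zero."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
```

For example, a fingertip-to-thumb distance below the threshold could signal a "key press", while the slope between two joints of the same finger could indicate which direction the finger is pointing.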
Top 5 High-Performing Research
- M. S. Alam et al., “ASL champ!: a virtual reality game with deep-learning driven sign recognition,” Computers & Education: X Reality, vol. 4, p. 100059, 2024, doi: https://doi.org/10.1016/j.cexr.2024.100059
- M. S. Alam, K.-C. Kwon, and N. Kim, “TARNet: An Efficient and Lightweight Trajectory-Based Air-Writing Recognition Model Using a CNN and LSTM Network,” vol. 2022, doi: https://doi.org/10.1155/2022/6063779
- M. S. Alam, K.-C. Kwon, and N. Kim, “Implementation of a Character Recognition System Based on Finger-Joint Tracking Using a Depth Camera,” IEEE Transactions on Human-Machine Systems, vol. 51, no. 3, pp. 229-241, June 2021, doi: https://doi.org/10.1109/THMS.2021.3066854
- M. S. Alam, K.-C. Kwon, M.-U. Erdenebat, M. Y. Abbass, M. A. Alam, and N. Kim, “Super-Resolution Enhancement Method Based on Generative Adversarial Network for Integral Imaging Microscopy,” Sensors, vol. 21, Art. no. 2164, 2021, doi: https://doi.org/10.3390/s21062164
- M. S. Alam, K.-C. Kwon, M. A. Alam, M. Y. Abbass, S. M. Imtiaz, and N. Kim, “Trajectory-Based Air-Writing Recognition Using Deep Neural Network and Depth Sensor,” Sensors, vol. 20, Art. no. 376, 2020, doi: https://doi.org/10.3390/s20020376
Recent from Blogs
Database Programming in Python
This segment is a bit advanced, so those who have no knowledge of databases or are just starting with Python may skip this part. Python's database interface is very rich and supports many database systems. Some notable and commonly used database systems include: MySQL...
Python Network Programming
Python network programming refers to developing applications in Python that involve communication over computer networks. This includes a wide range of tasks from low-level socket programming to high-level frameworks for creating web...
Very basic and frequently used Japanese phrases
Today I am in Japan for a conference. It is wonderful, and the people are very nice and polite. The main problem, though, is the language barrier. The hardest thing is wanting to express your feelings and emotions and not being able to. So, I plan to learn some...
Draw/plot a line graph in Python using matplotlib
Data visualization and interpretation are very important for understanding data and its properties. Making decisions from raw data is difficult, especially in machine learning, deep learning, accuracy comparison, etc. Using Python, it is very easy to plot a graph,...
GUI Programming in Python
Previously, we touched upon GUI in the section on what can be done with Python. For the sake of discussion, let's revisit a few points. GUI stands for Graphical User Interface. There is no alternative to GUI programming to simplify the tasks of a computer used by...
File in Python
So far, all the programs we have seen are console-based, meaning we see the output on the computer screen. But what if we want to save our results in a file on the computer or work with a file stored on the computer? In that case, we need to read from or write to a...
Error Handling in Python
When programming, encountering errors or mistakes is very common, but fixing them can be a complex task. A good programmer is characterized by their ability to easily identify and debug any errors. Errors can generally be divided into three categories: Compile Time...
Polymorphism in Python
Similar to inheritance, polymorphism is an important paradigm in object-oriented programming. Polymorphism in object-oriented programming refers to the ability of different classes to be treated as instances of the same class through a common interface. Let's start...
Inheritance in Python
Inheritance is a unique feature of any object-oriented programming language. It is a simple concept. Consider this: your father owns a car, which means (according to Bangladeshi customs) it is also your car, even though you might own another car yourself. Here, your...