Air-Writing: A Detailed Look into Trajectory-Based Air-Writing Recognition Using Deep Learning

In an increasingly digital world, the interaction between humans and machines is constantly evolving. One of the most exciting frontiers in this development is the ability to interact with devices through air-writing: writing in free space with nothing more than the movement of a finger or hand. Air-writing recognition systems offer a unique, touchless input method, making them ideal for environments where physical interaction with devices is difficult or inconvenient. In my recent research, I developed a trajectory-based air-writing recognition system using deep neural networks and depth sensors. This work not only advances human-computer interaction (HCI) but also sets a new benchmark for the accuracy of air-writing systems.

Motivation

The motivation behind this research lies in the limitations of traditional and gesture-based writing systems. Writing on paper or using a stylus has its restrictions, especially in environments where physical interaction is either difficult or not possible, such as augmented reality (AR), virtual reality (VR), or touchless control systems. Similarly, gesture-based systems, though useful, often have a limited range of recognized gestures and can be cumbersome for users to learn and remember.

Air-writing systems offer a promising alternative, allowing users to write in the air without the need for physical input tools like pens or styluses. However, this method also presents challenges—variability in writing styles, non-uniform character formation, and spatiotemporal inconsistencies can affect the accuracy of recognition. Therefore, this research aimed to overcome these challenges and develop a robust system that could accurately recognize air-writing across different styles and conditions.

Research Approach

The approach to this research was to develop a system that captures the 3D trajectory of a user’s finger as they write in the air and then uses deep learning models to translate those movements into recognized characters or digits. The system is powered by a depth camera that tracks the user’s fingertip as it moves through free space. Specifically, we used the Intel RealSense SR300, a short-range coded-light depth camera known for its precision in hand and gesture tracking.

Once the trajectory data was captured, it needed to be processed and cleaned to ensure accuracy. In air-writing, the fingertip often follows non-uniform paths, creating noisy data. To address this, we employed normalization techniques such as nearest neighbor and root point translation, which helped smooth the trajectory and eliminate unnecessary movements. These techniques were critical in preparing the data for input into the neural networks.
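
To make this concrete, here is a minimal sketch of what these two steps might look like, assuming each trajectory arrives as an (N, 3) NumPy array of (x, y, z) points. The function names and the fixed output length of 100 points are illustrative choices, not the exact implementation from the paper.

    import numpy as np

    def root_point_translation(traj):
        # Shift the whole trajectory so its first (root) point sits at the
        # origin, removing dependence on where in front of the camera the
        # user happened to write.
        return traj - traj[0]

    def nearest_neighbor_resample(traj, num_points=100):
        # Resample to a fixed length: for each of num_points positions spaced
        # evenly along the path, keep the nearest recorded sample.
        seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
        arc = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
        targets = np.linspace(0.0, arc[-1], num_points)
        idx = np.abs(arc[None, :] - targets[:, None]).argmin(axis=1)
        return traj[idx]

    raw = np.random.rand(250, 3)   # stand-in for one captured digit trajectory
    normalized = nearest_neighbor_resample(root_point_translation(raw))

Fixing the length and anchoring the root point means every trajectory reaches the network in a comparable frame, regardless of how fast or where the user wrote.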

Technology Used

We leveraged two state-of-the-art deep learning architectures, Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), to recognize and interpret the air-writing trajectories.

LSTM (Long Short-Term Memory)

LSTM is a type of recurrent neural network (RNN) that is well-suited for time-series data, making it ideal for this project. Air-writing, by nature, involves sequential data since the system needs to process a series of 3D points as the user writes. The LSTM model was designed with multiple layers to handle this sequential data, taking into account both the spatial and temporal aspects of the writing. The network learned to recognize patterns in the movement of the fingertip, allowing it to identify the written characters accurately.
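
As a rough illustration, a stacked LSTM classifier over fixed-length trajectories can be sketched in Keras as follows. The layer sizes and depth are assumptions for illustration, not the exact architecture reported in the paper.

    from tensorflow import keras
    from tensorflow.keras import layers

    num_points, num_classes = 100, 10   # resampled trajectory length; digits 0-9

    lstm_model = keras.Sequential([
        keras.Input(shape=(num_points, 3)),        # sequence of (x, y, z) points
        layers.LSTM(128, return_sequences=True),   # first layer passes the full sequence on
        layers.LSTM(64),                           # second layer condenses it to one vector
        layers.Dropout(0.5),                       # dropout rate used in this work
        layers.Dense(num_classes, activation="softmax"),
    ])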

CNN (Convolutional Neural Network)

CNNs are typically used in image recognition, but in this case, we adapted a depth-wise CNN to process the 3D trajectory data. By using separable convolution layers, we were able to break down the trajectory into individual movements and pool them together for recognition. The CNN was particularly effective in identifying distinct patterns and shapes in the writing, providing a complementary approach to the LSTM.
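
A comparable Keras sketch of a separable-convolution classifier over the same fixed-length trajectories is below. Interpreting the depth-wise design as 1D separable convolutions over the point sequence is my reading of the approach, and the filter counts and kernel sizes are illustrative assumptions rather than the paper’s exact configuration.

    from tensorflow import keras
    from tensorflow.keras import layers

    num_points, num_classes = 100, 10

    cnn_model = keras.Sequential([
        keras.Input(shape=(num_points, 3)),
        layers.SeparableConv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),          # pool local movement patterns together
        layers.SeparableConv1D(128, kernel_size=5, padding="same", activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])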

Both models were tested on two datasets: a self-collected dataset of 21,000 trajectories (RTD) and a publicly available dataset known as the 6D Motion Gesture (6DMG) dataset, which includes alphanumeric air-writing data.

Results

The results of this research set a new state of the art in air-writing recognition. Using the RTD dataset, the LSTM model achieved a recognition accuracy of 99.17%, and the CNN model achieved 99.06%. On the 6DMG dataset, the LSTM outperformed previous methods with an accuracy of 99.32%, while the CNN model achieved 99.26%. These results mark a significant improvement over prior studies, where similar models often struggled with lower recognition rates.

One of the key reasons for this high accuracy was the use of normalization techniques. By normalizing the 3D trajectory data before feeding it into the deep learning models, we were able to reduce noise and account for variability in writing styles. In particular, the combination of nearest neighbor and root point translation normalization proved highly effective in improving recognition accuracy.

Additionally, the large dataset we created, containing 21,000 trajectories covering all ten digits, addressed a common problem in air-writing research: the lack of comprehensive datasets. By making this dataset publicly available, we hope to contribute to future research and encourage further advancements in the field.

Technology Stack and Implementation

The development and testing of this system required a range of technologies and tools. The Intel RealSense SR300 camera was used to capture the trajectory data. This depth camera is widely used for gesture detection and offers millimeter-level accuracy, making it ideal for air-writing recognition.

Once the data was collected, we used a custom-built user interface (UI) to manage the data. This UI allowed participants to write digits in front of the camera, which were then displayed on a virtual screen. The system captured the x, y, and z coordinates of the fingertip, translating these movements into a spatial trajectory.
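
For a sense of how such a capture loop might work, here is a hedged sketch using the pyrealsense2 library. detect_fingertip() is a hypothetical placeholder for the fingertip tracker, since the actual UI and tracking code are not reproduced here, and the fixed capture length is illustrative.

    import pyrealsense2 as rs

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    pipeline.start(config)

    trajectory = []
    try:
        while len(trajectory) < 250:            # illustrative fixed capture length
            frames = pipeline.wait_for_frames()
            depth = frames.get_depth_frame()
            if not depth:
                continue
            # detect_fingertip() is a hypothetical helper standing in for the
            # fingertip tracker; it returns pixel coordinates (u, v) or None.
            uv = detect_fingertip(depth)
            if uv is None:
                continue
            u, v = uv
            z = depth.get_distance(u, v)        # depth in meters at that pixel
            trajectory.append((u, v, z))
    finally:
        pipeline.stop()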

For processing and training the neural networks, we used Python and the Keras library, with TensorFlow as the backend. The models were trained on an NVIDIA GeForce GTX 1050 Ti GPU to speed up the training process. The Adam optimizer was used for training, and a dropout rate of 0.5 was applied to prevent overfitting.
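
Continuing the model sketches above, the training setup described here might look roughly like this in Keras. The loss, epoch count, batch size, and validation split are illustrative assumptions, and train_trajectories / train_labels stand in for the prepared dataset.

    lstm_model.compile(
        optimizer=keras.optimizers.Adam(),      # Adam optimizer, as described above
        loss="sparse_categorical_crossentropy", # integer digit labels 0-9
        metrics=["accuracy"],
    )
    lstm_model.fit(
        train_trajectories, train_labels,       # hypothetical arrays: (n, 100, 3) and (n,)
        validation_split=0.2,                   # illustrative split, not the paper's protocol
        epochs=50,
        batch_size=64,
    )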

Key Takeaways

• Touchless Interaction for Diverse Applications: One of the biggest advantages of air-writing recognition is the potential for touchless interaction. This system can be applied in various fields, including AR, VR, healthcare (where touchless control is critical), and smart home devices. The technology provides a more intuitive, efficient, and hygienic way of interacting with devices.
• High Accuracy with Deep Learning: The combination of LSTM and CNN models provided an excellent solution for recognizing 3D trajectories in free space. The LSTM, in particular, was effective in handling the sequential nature of air-writing, while the CNN excelled in identifying spatial patterns in the trajectory data.
• Normalization Is Key to Success: One of the critical challenges in air-writing recognition is the inherent variability in how people write in free space. Normalization techniques like nearest neighbor and root point translation were essential in smoothing the trajectory data, reducing noise, and improving overall accuracy.
• Dataset Contribution: By publishing a large, publicly available dataset, we are contributing to the broader research community. This dataset will help other researchers test their models and improve upon the work done in air-writing recognition, moving the field forward.
• Future Directions: While this research focused on digit recognition, the next step is to expand the system to recognize words and sentences written in free space. Another potential area for future research is integrating this system with AR/VR environments or developing real-time applications for industries like healthcare or education.

Conclusion

This research represents a significant advancement in the field of air-writing recognition. By developing a robust system based on deep learning models and using depth sensors to capture 3D trajectory data, we achieved state-of-the-art accuracy in recognizing air-written digits. With further development, this technology holds the potential to revolutionize human-computer interaction in a variety of industries, providing an intuitive, touchless way to communicate with machines.

References

Please find my research here – https://doi.org/10.3390/s20020376

and here – https://doi.org/10.1109/CRC.2019.00026

Cite this Research

@Article{s20020376,
  AUTHOR = {Alam, Md. Shahinur and Kwon, Ki-Chul and Alam, Md. Ashraful and Abbass, Mohammed Y. and Imtiaz, Shariar Md and Kim, Nam},
  TITLE = {Trajectory-Based Air-Writing Recognition Using Deep Neural Network and Depth Sensor},
  JOURNAL = {Sensors},
  VOLUME = {20},
  YEAR = {2020},
  NUMBER = {2},
  ARTICLE-NUMBER = {376},
  URL = {https://www.mdpi.com/1424-8220/20/2/376},
  PubMedID = {31936546},
  ISSN = {1424-8220},
  ABSTRACT = {Trajectory-based writing system refers to writing a linguistic character or word in free space by moving a finger, marker, or handheld device. It is widely applicable where traditional pen-up and pen-down writing systems are troublesome. Due to the simple writing style, it has a great advantage over the gesture-based system. However, it is a challenging task because of the non-uniform characters and different writing styles. In this research, we developed an air-writing recognition system using three-dimensional (3D) trajectories collected by a depth camera that tracks the fingertip. For better feature selection, the nearest neighbor and root point translation was used to normalize the trajectory. We employed the long short-term memory (LSTM) and a convolutional neural network (CNN) as a recognizer. The model was tested and verified by the self-collected dataset. To evaluate the robustness of our model, we also employed the 6D motion gesture (6DMG) alphanumeric character dataset and achieved 99.32% accuracy which is the highest to date. Hence, it verifies that the proposed model is invariant for digits and characters. Moreover, we publish a dataset containing 21,000 digits; which solves the lack of dataset in the current research.},
  DOI = {10.3390/s20020376}
}

and

@INPROCEEDINGS{9058868,
  author={Alam, Md. Shahinur and Kwon, Ki-Chul and Kim, Nam},
  booktitle={2019 4th International Conference on Control, Robotics and Cybernetics (CRC)},
  title={Trajectory-Based Air-Writing Character Recognition Using Convolutional Neural Network},
  year={2019},
  pages={86-90},
  keywords={Cameras;Hidden Markov models;Writing;Trajectory;Character recognition;Atmospheric modeling;Tracking;CNN;character recognition;gesture recognition;human-computer interaction},
  doi={10.1109/CRC.2019.00026}}