Facial Video-Based Remote Eye Movement Detection using Electrooculography and Deep Neural Network

Poster No:

726 

Submission Type:

Abstract Submission 

Authors:

Jin Soo Jeon1, Jong-Hwan Lee1

Institutions:

1Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea, Republic of

First Author:

Jin Soo Jeon  
Department of Brain and Cognitive Engineering, Korea University
Seoul, Korea, Republic of

Co-Author:

Jong-Hwan Lee  
Department of Brain and Cognitive Engineering, Korea University
Seoul, Korea, Republic of

Introduction:

Blink duration and rate are widely recognized as consistent and reliable parameters for assessing drowsiness and arousal [1]. Electrooculography (EOG) precisely measures eye movement features such as eyeblink rate, closing phase, opening phase, and blink duration [2]. Despite the established validity and reliability of EOG for capturing eye movements, the need to attach electrodes near the eyes significantly limits its utility. Our study addresses this limitation by leveraging facial videos and a deep neural network (DNN).

Methods:

We employed the facial video and EOG data available in the DEAP dataset [3], a widely used multimodal dataset for human emotion analysis. To synchronize with the video recordings at 50 frames per second, we downsampled the EOG signal to 50 Hz. We then re-referenced the vertical EOG signal to its corresponding electrode and applied band-pass filtering in the 0.01–25 Hz range. Epochs were extracted from −5 s to 60 s, with baseline correction using the −5 s to 0 s interval. We applied the discrete wavelet transform [4] to mitigate baseline drift and a moving median filter [5] to reduce residual noise.
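The preprocessing steps above can be sketched as follows with SciPy. This is a simplified stand-in, not the authors' implementation: the discrete wavelet transform step for baseline-drift removal is omitted, the band-pass filter is applied before resampling (25 Hz sits exactly at the 50 Hz Nyquist edge), and the function name, filter order, and median kernel size are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly, medfilt

def preprocess_eog(veog, fs_in=512, fs_out=50, band=(0.01, 25.0), med_kernel=5):
    """Band-pass filter, resample, and median-smooth a vertical EOG trace."""
    # Zero-phase Butterworth band-pass, 0.01-25 Hz as in the abstract,
    # applied at the original sampling rate for numerical validity.
    sos = butter(4, band, btype="band", fs=fs_in, output="sos")
    x = sosfiltfilt(sos, veog)

    # Resample to the 50 fps video frame rate with built-in anti-aliasing.
    x = resample_poly(x, fs_out, fs_in)

    # Moving median filter to suppress residual impulsive noise [5].
    return medfilt(x, kernel_size=med_kernel)
```

The DWT-based drift removal [4] would slot in between the band-pass and median steps; here the 0.01 Hz high-pass edge plays a similar role.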

For blink peak detection from the preprocessed EOG, we applied min–max scaling within subjects and a local-maxima algorithm [4]. Blink phases, i.e., the precise time points of initial eye closure, complete closure, and reopening, were obtained from the derivative of the vertical EOG signal [6]. Based on these phases, we assigned each video frame one of three eye-movement labels (normal, closing, or opening). To evaluate the validity of the extracted labels, we computed Pearson's correlations between the EOG-derived labels and the subjects' self-reports.
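A minimal sketch of this peak and phase labeling, assuming a cleaned vertical EOG trace in which blinks appear as positive deflections; the peak prominence and derivative threshold are illustrative values, not the authors' settings.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_blinks(veog, prominence=0.2, thr=0.01):
    """Label each sample as 0 = normal, 1 = closing, or 2 = opening."""
    # Min-max scale within the recording, as done per subject in the abstract.
    x = (veog - veog.min()) / (veog.max() - veog.min() + 1e-12)

    # Blink peaks are local maxima of the scaled signal [4].
    peaks, _ = find_peaks(x, prominence=prominence)

    # The derivative's sign marks the closing (rising) and
    # opening (falling) phases around each peak [6].
    dx = np.gradient(x)
    labels = np.zeros(len(x), dtype=int)
    for p in peaks:
        i = p
        while i > 0 and dx[i - 1] > thr:            # walk back over the rise
            i -= 1
        j = p
        while j < len(x) - 1 and dx[j + 1] < -thr:  # walk forward over the fall
            j += 1
        labels[i:p + 1] = 1                          # closing phase
        labels[p + 1:j + 1] = 2                      # opening phase
    return peaks, labels
```

The resulting per-sample labels map directly onto video frames at the shared 50 Hz rate.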

We trained a DNN model on the recorded facial videos for eye movement detection. The model consists of (a) an Inception-ResNetV1 module that extracts hierarchical visual features from each video frame, (b) a Long Short-Term Memory (LSTM) module that analyzes the sequence of video frames, and (c) fully connected layers that classify the target labels (Fig. 2a). Performance was evaluated on both the DEAP dataset and the DROZY database [7] using five-fold cross-validation (CV). The face region was detected and cropped with the OpenCV face detection model [8]. Six consecutive video frames served as the DNN input, and hyperparameters were optimized by random search [9].
Supporting Image: fig1_eog_processing.png
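The three-stage architecture can be sketched in PyTorch as follows. This is an illustrative skeleton, not the authors' network: the per-frame encoder is a tiny generic CNN standing in for the Inception-ResNetV1 backbone, and all layer sizes, the class name, and the input resolution are assumptions.

```python
import torch
import torch.nn as nn

class BlinkNet(nn.Module):
    """CNN encoder + LSTM + fully connected head, mirroring Fig. 2a."""
    def __init__(self, n_classes=3, feat_dim=64, hidden=128):
        super().__init__()
        # (a) Per-frame visual feature extractor (Inception-ResNetV1 stand-in).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # (b) Temporal model over the six-frame input sequence.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        # (c) Classifier over the three states: normal / closing / opening.
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, clips):                      # clips: (B, T=6, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1))  # (B*T, feat_dim)
        out, _ = self.lstm(feats.view(b, t, -1))   # (B, T, hidden)
        return self.head(out[:, -1])               # classify the last frame
```

In practice the encoder would be replaced by a pretrained Inception-ResNetV1 (e.g., from the facenet-pytorch package) fed with OpenCV-cropped face regions.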
 

Results:

Our EOG preprocessing pipeline effectively denoised the signals while preserving the important temporal features of eye movements (Fig. 1a-d). Among the 16 DEAP subjects with valid EOG signals, seven showed a meaningful negative correlation between blink duration and arousal (Fig. 1e). Across all 14 subjects in the DROZY database, blink duration correlated substantially with the Karolinska Sleepiness Scale score (r = 0.59, p < 0.01).

The trained DNN yielded a 98.2% average test accuracy across the five-fold CV on the DEAP dataset. It effectively predicted eye movements in the facial videos: the average blink-duration error relative to the EOG label was approximately 60 ms (3 frames), with a delay of only 40 ms (2 frames) (Fig. 2c).
Supporting Image: fig2_eye_movement_from_video.png
 

Conclusions:

We developed a DNN model that robustly extracts vertical eye-blink movements from facial videos. Notably, our video-based blink detection model often outperformed the EOG label in predicting observable eye movements, likely because EOG, which captures electrical signals originating from inner muscle movements, may inherently differ slightly from overt eye movements. The detected eye-blink movements could enable remote, camera-based estimation of affective states and arousal, including drowsiness, without EOG electrodes.

Emotion, Motivation and Social Neuroscience:

Emotion and Motivation Other 1

Modeling and Analysis Methods:

Classification and Predictive Modeling
Methods Development
Multivariate Approaches
Other Methods 2

Keywords:

Computing
Emotions
Machine Learning
Modeling
Other - Deep neural networks; Electrooculography; Eye movement detection; Facial video

1|2Indicates the priority used for review


[1] Damousis, I., Cester, I., Nikolaou, S., & Tzovaras, D. (2007). Physiological indicators based sleep onset prediction for the avoidance of driving accidents. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 6699–6704). IEEE.
[2] Cori, J. M., Anderson, C., Shekari Soleimanloo, S., Jackson, M. L., & Howard, M. E. (2019). Narrative review: Do spontaneous eye blink parameters provide a useful assessment of state drowsiness? Sleep Medicine Reviews, 45, 95–104.
[3] Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., & Patras, I. (2011). DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18–31.
[4] Ebrahim, P. (2016). Driver drowsiness monitoring using eye movement features derived from electrooculography.
[5] Bulling, A., Ward, J. A., Gellersen, H., & Tröster, G. (2010). Eye movement analysis for activity recognition using electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4), 741–753.
[6] Abbas, S. N., & Abo-Zahhad, M. (2017). Eye blinking EOG signals as biometrics. In R. Jiang, S. Al-maadeed, A. Bouridane, D. Crookes, & A. Beghdadi (Eds.), Biometric Security and Privacy: Opportunities & Challenges in The Big Data Era (pp. 121–140). Springer International Publishing.
[7] Massoz, Q., Langohr, T., François, C., & Verly, J. G. (2016, March). The ULg multimodality drowsiness database (called DROZY) and examples of use. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1–7). IEEE.
[8] Khan, M., Chakraborty, S., Astya, R., & Khepra, S. (2019, October). Face detection and recognition using OpenCV. In 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (pp. 116–119). IEEE.
[9] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2), 281–305.

Acknowledgment: This work was supported by the National Research Foundation (NRF) grant funded by the Korea government (MSIT) (NRF-2021M3E5D2A01022515, No. RS-2023-00218987), and in part by the Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].