  • Research note
  • Open access

Can pre-trained convolutional neural networks be directly used as a feature extractor for video-based neonatal sleep and wake classification?



In this paper, we evaluate the use of pre-trained convolutional neural networks (CNNs) as feature extractors, followed by Principal Component Analysis (PCA) to find the most discriminant features, and a support vector machine (SVM) to classify neonatal sleep and wake states from Fluke® facial video frames. Using pre-trained CNNs as feature extractors would greatly reduce the effort of collecting new neonatal data, since training a neural network from scratch is computationally expensive. The features are extracted from the fully connected layers (FCLs), and we compare several pre-trained CNNs: VGG16, VGG19, InceptionV3, GoogLeNet, ResNet, and AlexNet.


From around 2 h of Fluke® video recordings of seven neonates, we achieved a modest classification performance, with an accuracy, sensitivity, and specificity of 65.3%, 69.8%, and 61.0%, respectively, using AlexNet on Fluke® (RGB) video frames. This indicates that using a pre-trained model as a feature extractor does not suffice for highly reliable sleep and wake classification in neonates. Future work will therefore require either a dedicated neural network trained on neonatal data or a transfer learning approach.


Sleep is an essential behavior for the development of the nervous system in neonates [1,2,3]. Normally, newborn babies sleep between 16 and 18 h per day. Continuous sleep tracking and assessment could potentially provide an indicator of brain development over time [4, 5]. To achieve this, automatic sleep and wake analysis is required, which can offer valuable information on a neonate’s mental and physical growth, not only for healthcare professionals but also for parents [6].

Currently, video electroencephalography (VEEG) is considered the gold standard for neonatal sleep monitoring; it requires a number of sensors and electrodes attached to a neonate’s skin to collect multi-channel EEG signals [7,8,9]. In addition, VEEG is labor-intensive, as human effort is required to annotate sleep states [10]. A contact-free sleep monitoring system for neonates is therefore desirable. In recent years, unobtrusive or contact-free approaches have gained a lot of attention for sleep monitoring [11,12,13,14,15,16], although most of these methods have been more successful in adults [17, 18]. Video-based methods appear to be a particularly promising approach, since they are comfortable and convenient to use both at home and in the hospital [19, 20]. With the advancements in deep learning algorithms and clinical research on neonatal facial patterns [21, 22], a new, unobtrusive approach to monitoring sleep patterns has been proposed [23, 24]. However, deep learning models demand a large database to train the prediction model.

The main contributions of this work are: (a) extracting features from well-known CNNs, i.e., VGG-16, VGG-19, InceptionV3, GoogLeNet, ResNet, and AlexNet; (b) comparing different color palettes (amber, high contrast, red-blue, hot metal, and grayscale) from RGB and thermal video frames; and (c) evaluating the extracted features using Principal Component Analysis (PCA) followed by a Support Vector Machine (SVM) to classify neonatal sleep and wake states. As this was an explorative study to evaluate the feasibility of pre-trained models as feature extractors for classifying neonates’ sleep and wake states from video frames, we started with a small pilot population of neonatal video data, adopting a robust and computationally inexpensive approach to classify sleep states.

Main text

Subject database

Video and VEEG data from seven neonates were collected retrospectively by a pediatrician at the Children’s Hospital affiliated to Fudan University, Shanghai, China [25]. Detailed descriptions of the demographics and physical conditions of the neonates are given in Additional file 1: Table S1. Sleep and wake states were annotated by a professional neurologist on each 30-s VEEG epoch and the corresponding video frames, according to American Academy of Sleep Medicine (AASM) guidelines [26].

Intensity-based detection

Identifying sleep and wake states of neonates from video frames requires precise face detection in the Fluke® video [27]. A detailed description of the intensity-based detection method is given in our previous paper [25]. Figure 1 shows an input video frame in which the neonatal facial region is detected using the intensity-based method. The detected RGB facial region is then mapped onto the other color palettes (thermal) of the video frames to extract the facial region.
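As a rough illustration, intensity-based cropping can be sketched as follows; this is a hypothetical, minimal version of the idea on a synthetic frame, not the exact algorithm of [25]:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic grayscale frame: dark background with a brighter "facial" patch
frame = rng.uniform(0.0, 0.3, size=(120, 160))
frame[40:80, 60:110] += 0.6

# Keep pixels above a brightness threshold and crop their bounding box
mask = frame > 0.5
rows = np.where(mask.any(axis=1))[0]
cols = np.where(mask.any(axis=0))[0]
face = frame[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]  # 40 x 50 crop
```

In the real pipeline, the bounding box found on the RGB frame would then be reused to crop the same region from the co-registered thermal palettes.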

Fig. 1

Neonatal face detection using the intensity-based detection method

Pre-trained CNN models

We propose to classify neonatal sleep and wake states using pre-trained CNNs. Typically, the initial layers of a CNN capture basic image features such as blobs, edges, and color patterns, which the deeper hidden layers combine into complex higher-level feature patterns that form a richer image representation [28]. The output of each CNN layer acts as a set of activations for the input image. The literature shows that when pre-trained CNNs are used for feature extraction, the features are usually taken from the fully-connected layers (FCLs) right before the final classification layer [29, 30]. Motivated by this, we extracted the features from the FCLs of each pre-trained network. Detailed descriptions of all the pre-trained models are given in Additional file 2: Table S2. In the following, we briefly introduce the pre-trained models as well as the PCA and SVM used in our work.

VGG16 and VGG19 The VGG model [31] contains a stack of convolutional layers followed by three FCLs. In this work, we used both the pre-trained VGG16 and VGG19 models, and features were extracted from the last three FCLs.

AlexNet architecture The architecture of AlexNet [32] contains a total of eight layers. In this work, we extracted features from the last two FCLs of the pre-trained AlexNet.

ResNet-18 The baseline structure of the residual network (ResNet) [33] is the same as the other CNNs, except that a shortcut link is added to each pair of 3 × 3 filters. To classify a neonate's sleep and wake states, we extract 1000 features from the last FCL of the pre-trained ResNet-18 model.

GoogLeNet GoogLeNet [34] has unique design features that helped it achieve state-of-the-art results and outperform previous networks; e.g., 1 × 1 convolutions are used for dimension reduction to limit computation. In this work, we used the pre-trained GoogLeNet network, with features extracted from the last “FC1000” layer.

InceptionV3 Inception-V3, the third iteration of the GoogLeNet architecture, builds on the idea of factorized convolutions [35]. The last FCL of the pre-trained Inception-V3 model is used to extract features for neonatal sleep and wake classification.

Principal component analysis (PCA) PCA projects data onto the directions of greatest variance, which helps isolate the most discriminant features while suppressing redundant variation [36]. In this paper, once the features are extracted from the FCLs of the CNNs, we apply PCA to find the most discriminant features, which helps the SVM classify neonatal sleep and wake states in the next stage.
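The dimensionality reduction step can be sketched in a few lines (an illustrative SVD-based PCA on random stand-in features, not the exact implementation used in the study):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 256))   # 100 stand-in FCL feature vectors

# Centre the features, then obtain principal directions from the SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 20
Z = Xc @ Vt[:k].T                 # reduced (100, 20) features for the SVM

# Fraction of total variance retained by the first k components
retained = (S[:k] ** 2).sum() / (S ** 2).sum()
```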

Support vector machine (SVM) Based on the features extracted from the pre-trained CNNs, we employed an SVM classifier to classify neonatal sleep and wake states [37, 38]. We used the “classificationLearner” app in Matlab R2018b with the default SVM settings (kernel function = ‘linear’, box constraint = 1) to perform the classification.
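For illustration, a minimal linear soft-margin SVM can be trained by subgradient descent on the hinge loss; this is a sketch of the idea on two toy clusters, not the Matlab implementation used in the study:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 2-D clusters standing in for "sleep"/"wake" features
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Full-batch subgradient descent on the regularized hinge loss
w, b = np.zeros(2), 0.0
lr, lam = 0.1, 0.01
for _ in range(200):
    viol = y * (X @ w + b) < 1                 # samples inside the margin
    grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / len(y)
    grad_b = -y[viol].sum() / len(y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)         # training accuracy
```

On separable data like this the decision boundary quickly aligns with the direction between the two cluster means.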

Results and discussion

Twenty-two experiments were conducted on the RGB and thermal videos. For evaluation purposes, all results are expressed in terms of sensitivity (Se), specificity (Sp), precision (P), and accuracy (Ac), obtained using five-fold cross-validation and validated against the VEEG annotations.
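These metrics follow the usual confusion-matrix definitions; as a quick illustration on a toy prediction/annotation pair (hypothetical labels, 1 = wake, 0 = sleep):

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # VEEG-style annotations
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # classifier output

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

se = tp / (tp + fn)            # sensitivity: wake epochs correctly detected
sp = tn / (tn + fp)            # specificity: sleep epochs correctly detected
p = tp / (tp + fp)             # precision
ac = (tp + tn) / len(y_true)   # accuracy; all four equal 0.75 here
```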

Table 1 shows the sleep and wake classification results obtained by the SVM classifier after feature extraction using different pre-trained CNNs. The overall performance of FCL6-7-8 in VGG-16 and VGG-19, FCL8 in AlexNet, and the final FCL in InceptionV3, ResNet-18, and GoogLeNet was low for classifying neonatal sleep and wake states, and the statistical results show an inconsistent pattern across networks. RGB-InceptionV3 (FCL) achieves the best Se of 97.4%, but its Ac drops to 55.1%; similarly, RGB-VGG16 (FCL8) and RGB-VGG19 (FCL8) show an Se of 90.0%, but overall accuracies of only 66.2% and 65.2%, respectively. Features extracted from AlexNet (FCL7) and classified via SVM give the best overall results, with an Ac of 65.3%, Se of 69.8%, and Sp of 61.0%. In contrast to the features extracted from the other pre-trained networks, the AlexNet (FCL7) features contain discriminant information that helps the SVM separate a neonate’s sleep and wake states. One possible reason for the relatively better results of pre-trained AlexNet is that it was originally trained on just over a million images, whereas the other CNNs were trained on more than 15 million images, yielding more complex feature representations at their different FCLs [31, 39]. In AlexNet, the first layer has 11 × 11 filters, the second layer 5 × 5 filters, and so on; there is no fixed rule for filter sizes and max pooling, as the convolutions for each layer were chosen purely experimentally. In contrast, the other CNNs follow a standard design: in VGG-Net, all convolution kernels are 3 × 3 and max pooling is applied after every 2 or 3 convolution layers, while GoogLeNet uses a parallel combination of 1 × 1, 3 × 3, and 5 × 5 convolutional filters.
The more complex nature of the other pre-trained CNNs thus distinguished AlexNet, which obtained better performance in classifying neonatal sleep and wake states. Figure 2a shows the standard deviation (STD) of all the sleep and wake features extracted from AlexNet FCL7. Most of the sleep and wake features extracted from FCL7 lie in almost the same region. Nevertheless, AlexNet shows slightly better performance than the other feature sets when used with the SVM, mainly because its corresponding features are better separated from each other. Figure 2b depicts the STD of the discriminant features extracted by PCA from the pre-trained AlexNet (FCL7). These discriminant AlexNet (FCL7) features help achieve better neonatal sleep and wake classification accuracy compared with the other pre-trained CNNs.

Table 1 Neonatal sleep and wake classification results (five-fold cross-validation) using different pre-trained CNNs combined with an SVM classifier
Fig. 2

a STD of all features extracted from the pre-trained AlexNet (FCL7). b STD of discriminant features after PCA extracted from the pre-trained AlexNet (FCL7). The central red line (inside the blue boxes) indicates the median, and the bottom and top edges of each blue box indicate the 25th and 75th percentiles of the data points, respectively. The whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually using the ‘+’ symbol

As a further analysis, we examined other neonatal facial color palettes extracted from Fluke® SmartView. Additional file 3: Table S3 shows the statistical results achieved using multiple color palettes, namely amber, high contrast, red-blue, hot metal, and grayscale. In contrast to the results shown in Table 1, the Fluke® color palettes also yield inconsistent results: high contrast-AlexNet (FCL8), hot metal-InceptionV3 (FCL), grayscale-GoogLeNet (FCL), and red-blue-VGG19 (FCL6) achieve the best Se values of 84.8%, 76.3%, 73.0%, and 81.1%, respectively, and amber-VGG19 (FCL) the best Sp of 87.8%. However, the overall Ac obtained from these color palettes is quite low; high contrast-VGG19 (FCL7) shows the best Ac of 65.6%. One of the main reasons is that the dynamic range of these Fluke® color palettes is quite narrow, as shown in Additional file 4: Figure S1.

In general, the statistical results of using pre-trained CNN models as feature extractors to classify neonatal sleep and wake states are quite modest [20]. One of the main reasons is that the existing pre-trained CNNs were trained on natural images such as animals, flowers, scenery, and automobiles. The feature patterns of these classes are quite different from our neonatal database, which makes it difficult for existing CNNs to separate neonates’ sleep and wake states [40, 41]. The motivation for using pre-trained CNNs for feature extraction is that it does not demand much computational capacity and is quite robust, since the network does not need to be retrained; these attributes led us to start with a feature extraction approach. However, the experimental analysis shows that this approach does not offer results promising enough to serve as an aid for clinicians in classifying neonates’ sleep and wake states unobtrusively. Nevertheless, to our knowledge no previous study has analyzed neonatal facial videos to classify sleep and wake states using CNNs as feature extractors, and this research could help future studies adopt other techniques (e.g., transfer learning or dedicated CNNs) to classify neonatal sleep and wake states from video frames with better accuracy.


This work experimentally assessed the feasibility of unobtrusive, automatic classification of neonatal sleep and wake states using video frames from a Fluke® camera. Five-fold cross-validation yields a modest accuracy of 65.3% using features from the pre-trained AlexNet at FCL7, validated against sleep and wake states annotated from VEEG by a neurologist. In the future, transfer learning or a dedicated CNN, together with the collection of larger datasets covering different ethnic groups, will be the next step of our research.

Limitations of the study

It is important to note that this is a preliminary study, in which video data collection took place in a controlled environment with fixed camera placement, stable lighting conditions, and the supervision of nurses and pediatricians. Furthermore, in this proof-of-concept study we analyzed variations in the facial patterns of neonates with no clear sleep-related issues (albeit admitted to hospital for various reasons); the accuracy for neonates with sleep disorders remains unclear. This article focuses only on two-state (sleep and wake) classification, and the design of a dedicated deep learning architecture to classify neonatal sleep stages is the foremost next step of this research.

Availability of data and materials

At this point the data used in this paper cannot be shared or released to a third party.



Abbreviations

FCLs: Fully-connected layers

VEEG: Video electroencephalogram

CNNs: Convolutional neural networks

PCA: Principal Component Analysis

SVM: Support vector machine


  1. S. M. Ludington-Hoe, M. W. Johnson, K. Morgan, T. Lewis, J. Gutman, P. D. Wilson, and M. S. Scher, “Neurophysiologic assessment of neonatal sleep organization: Preliminary results of a randomized, controlled trial of skin contact with preterm infants,” Pediatrics, vol. 117, no. 5, 2006.

  2. M. M, Insomnia in the elderly. J Clin Psychiatry, vol. 53, 1992.

  3. Roffwarg HP, Muzio JN, Dement WC. Ontogenetic development of the human sleep-dream cycle. Science. 1966;152(3722):604–19.

  4. Shaffery JP. Sleep and brain development. In: Handbook of Behavioral Neuroscience, vol. 30. 2019. p. 413–24.

  5. Bayer JK, Hiscock H, Hampton A, Wake M. Sleep problems in young infants and maternal mental and physical health. J Paediatr Child Health. 2007;43(1–2):66–73.


  6. H. L. Ball, Reasons to bed-share: Why parents sleep with their infants, vol. 20, no. 4. 2002.

  7. Grigg-Damberger M, Gozal D, Marcus CL, et al. The visual scoring of sleep and arousal in infants and children. J Clin Sleep Med. 2007;3(2):201–40.


  8. Atallah L, Serteyn A, Meftah M, et al. Unobtrusive ECG monitoring in the NICU using a capacitive sensing array. Physiol Meas. 2014;35(5):895–913.


  9. Gruetzmann A, Hansen S, Müller J. Novel dry electrodes for ECG monitoring. Physiol Meas. 2007;28(11):1375–90.


  10. Sadeh A, Lavie P, Scher A, Tirosh E, Epstein R. Actigraphic home-monitoring sleep-disturbed and control infants and young children: A new method for pediatric assessment of sleep-wake patterns. Pediatrics. 1991;87(4):494–9.


  11. M. A. Lopez-Gordo, D. Sanchez Morillo, and F. Pelayo Valle, “Dry EEG electrodes,” Sensors, vol. 14, no. 7, pp. 12847–12870, 2014.

  12. Ruffini G, Dunne S, Fuentemilla L, Grau C, Farrés E, Marco-Pallarés J, Watts PCP, Silva SRP. First human trials of a dry electrophysiology sensor using a carbon nanotube array interface. Sensors Actuators. 2008;144(2):275–9.


  13. Feng W. Development of a PVDF Piezopolymer Sensor for Unconstrained In-sleep Cardiorespiratory Monitoring. J Intell Mater Syst Struct. pp. 1–7, 2003.

  14. Brink M, Müller CH, Schierz C. Contact-free measurement of heart rate, respiration rate, and body movements during sleep. Behav Res Methods. 2006;38(3):511–21.

  15. Remote cardiac monitoring using radar. Massachusetts Institute of Technology; 2009.

  16. M. Sekine and K. Maeno. Non-contact heart rate detection using periodic variation in Doppler frequency. IEEE Sensors Appl. Symp. Proc. pp. 318–322, 2011.

  17. Nukaya S, Sugie M, Kurihara Y. A noninvasive heartbeat, respiration, and body movement monitoring system for neonates. Artif Life Robot. 2014;19:414–9.


  18. Werth J, Atallah L, Andriessen P, Long X, Zwartkruis-Pelgrim E, Aarts RM. Unobtrusive sleep state measurements in preterm infants—A review. Sleep Med Rev. 2017;32:109–22.


  19. Meltzer LJ, Montgomery-Downs HE, Insana SP, Walsh CM. Use of actigraphy for assessment in pediatric sleep research. Sleep Med Rev. 2012;16(5):463–75.


  20. Long X, Otte R, Sanden EV, Werth J, Tan T. Video-based actigraphy for monitoring wake and sleep in healthy infants: A Laboratory Study. Sensors. 2019;19(5):1075.


  21. G. Zamzmi, R. Kasturi, D. Goldgof, R. Zhi, T. Ashmeade, and Y. Sun. A Review of Automated Pain Assessment in Infants: Features, Classification Tasks, and Databases. IEEE Rev Biomed Eng. vol. 11, no. c, pp. 77–96, 2018.

  22. X. Lu, X. Duan, X. Mao, Y. Li, and X. Zhang. Feature Extraction and Fusion Using Deep Convolutional Neural Networks for Face Detection. Math. Probl. Eng. vol. 2017, 2017.

  23. A. Heinrich, X. Aubert, and G. De Haan, Body movement analysis during sleep based on video motion estimation. IEEE 15th Int. Conf. e-Health Networking, Appl. Serv. Heal. 2013, no. Healthcom, pp. 539–543, 2013.

  24. Y. Zhang, Y. Chen, L. Hu, X. Jiang, and J. Shen. An effective deep learning approach for unobtrusive sleep stage detection using microphone sensor. Proc. Int. Conf. Tools with Artif. Intell. ICTAI, vol. 2017-Nov, pp. 37–44, 2018.

  25. Awais M, Chen C, Long X, Yin B, Nawaz A, Abbasi SF, Akbarzadeh S, Tao L, Lu C, Wang L, Aarts RM, Chen W. Novel framework: face feature selection algorithm for neonatal facial and related attributes recognition. IEEE Access. 2020;8:59100–13.


  26. Grigg-Damberger MM. The visual scoring of sleep in infants 0 to 2 months of age. J Clin Sleep Med. 2016;12(3).

  27. F. TiX580, “Expert Series Thermal Imagers,” Fluke Corp, 2016.

  28. MD Zeiler, R Fergus. Visualizing and understanding convolutional networks. Comput Vis Pattern Recognit. pp. 818–833, 2014.

  29. S. Rajaraman, SK Antani, M Poostchi, K Silamut, A Hossain, RJ Maude, S Jaeger, GR Thoma. Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ. pp. 1–17, 2018.

  30. AS Razavian, H Azizpour, J Sullivan, S Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. IEEE Comput Soc Conf Comput Vis. Pattern Recognit. Work. pp. 512–519, 2014.

  31. K Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition, 3rd Int. Conf. Learn. Represent. ICLR 2015. Conf. Track Proc., pp. 1–14, 2015.

  32. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proc 25th Int Conf Neural Inf Process Syst (NIPS’12), vol. 1. 2012. p. 1097–105.

  33. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proc IEEE Conf Comput Vis Pattern Recognit (CVPR). 2016. p. 770–8.

  34. C. Szegedy, S. Reed, P. Sermanet, V. Vanhoucke, A. Rabinovich. Going deeper with convolutions. Comput Vis Pattern Recognit. pp. 1–12, 2014.

  35. Szegedy C, Vanhoucke V, Shlens J. Rethinking the inception architecture for computer vision. In: Proc IEEE Conf Comput Vis Pattern Recognit (CVPR). 2016. p. 2818–26.

  36. H. Abdi, L. J. Williams. Principal component analysis. Wiley Interdisplinary Rev. Comput. Stat. pp. 1–47. 2010.

  37. T Evgeniou, M Pontil. Support Vector Machines : Theory and Applications. Mach. Learn. Its Appl., 1999.

  38. Jun Q. An SVM face recognition method based on Gabor-featured key points. Int Conf Mach Learn Cybern Guangzhou, China. vol. 8, pp. 5144–5149, 2005.

  39. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–52.


  40. L Hulstaert. Transfer Learning: Leverage Insights from Big Data. 2018. [Online]. [Accessed: 17-Jun-2020].

  41. Olga R, Jia D, Hao S, Jonathan K, Sanjeev S, Sean M, Zhiheng H, Andrej K, Aditya K, Michael B. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis. 2015.



The authors would like to thank the pediatricians and nurses at the Children’s Hospital affiliated to Fudan University for their insightful discussions.


This work is supported by National Key R&D Program of China (Grant No. 2017YFE0112000), Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01), and China Postdoctoral Science Foundation Grant (Grant No. 2018T110346 and No. 2018M632019).

Author information

Authors and Affiliations



Conceptualization, methodology, software, validation, formal analysis, visualization, writing—original draft preparation, MA; investigation, resources, data curation, writing—original draft preparation, review and editing, CC and CW; writing—original draft preparation, review and editing, XL; writing—original draft preparation, review and editing, BY; VEEG data collection and subject management, CL; data annotation, XW; clinical database and subject health condition, LW. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Chunmei Lu or Wei Chen.

Ethics declarations

Ethics approval and consent to participate

The study protocol was designed according to the hospital’s clinical study regulations and approved by the Internal Ethics Committee for Neonatal Experiments of Children’s Hospital of Fudan University, Shanghai, China. All participants’ parents consented to the data collection process, and written informed consent for publication of this research article and any accompanying images and videos was obtained from those involved. A copy of the written consent is available for review by the Editor of this journal.

Consent for publication

Written informed consent for publication was obtained from the parents of all subjects by the pediatrician at Children’s Hospital of Fudan University, Shanghai, China.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Neonatal body condition before the collection of video and VEEG data: detailed descriptions of the demographics and physical conditions of the neonates.

Additional file 2: Table S2.

Overall ConvNet architectures: detailed descriptions of all the pre-trained models.

Additional file 3: Table S3.

Neonatal sleep and wake classification results using multiple Fluke® color palettes: statistical results achieved using the amber, high contrast, red-blue, hot metal, and grayscale palettes.

Additional file 4: Figure S1.

Fluke® color palettes range.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article


Cite this article

Awais, M., Long, X., Yin, B. et al. Can pre-trained convolutional neural networks be directly used as a feature extractor for video-based neonatal sleep and wake classification?. BMC Res Notes 13, 507 (2020).
