Hynek Boril's Homepage

Hynek Boril, Ph.D.

Selected Publications

Journal Articles

Hansen, J. H. L., Boril, H. (2018): “On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks,” Speech Communication, Elsevier, 101, July, 94-108. [pdf] [bib]

Ghaffarzadegan, S., Boril, H., Hansen, J. H. L. (2017): “Deep Neural Network Training for Whispered Speech Recognition Using Small Databases and Generative Model Sampling,” International Journal of Speech Technology, Springer, 20(4), Dec., 1063-1075. [pdf] [bib]

Ghaffarzadegan, S., Boril, H., Hansen, J. H. L. (2016): “Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 24(10), Oct., 1705-1720. [pdf] [bib]

Hansen, J. H. L., Williams, K., Boril, H. (2015): “Speaker height estimation from speech: fusing spectral regression and statistical acoustic models,” Journal of the Acoustical Society of America (JASA), 138(2), Aug., 1052-1067. [pdf] [cited] [bib]

Amuda, S., Boril, H., Sangwan, A., Hansen, J. H. L., Ibiyemi, T. S. (2014): “Engineering Analysis and Recognition of Nigerian English: An Insight into Low Resource Languages,” Transactions on Machine Learning and Artificial Intelligence, 2(4), Aug., 115-126. [pdf] [bib]

Hasan, T., Boril, H., Sangwan, A., Hansen, J. H. L. (2013): “Multi-Modal Highlight Generation for Sports Videos using an Information-Theoretic Excitability Measure,” EURASIP Journal on Advances in Signal Processing, 2013:173. [pdf] [cited] [bib]

Hansen, J. H. L., Ruzanski, E., Boril, H., Meyerhoff, J. (2012): “TEO-Based Speaker Stress Assessment Using Hybrid Classification and Tracking Schemes,” International Journal of Speech Technology, Springer, June 2012, DOI 10.1007/s10772-012-9165-1. [pdf] [cited] [bib]

Boril, H., Hansen, J. H. L. (2010). “Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments,” IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1379-1393. [pdf] [cited] [bib]

Boril, H. and Fousek, P. (2006). “Influence of Different Speech Representations and HMM Training Strategies on ASR Performance,” Acta Polytechnica, Journal on Advanced Engineering 46, 32-35. [cited] [pdf] [bib]

Book Reviews in Journals

Boril, H. (2011): “Pavel Machac and Radek Skarnitzl (2009). Foneticka segmentace hlasek. Prague: Epocha Publishing House,” Nase Rec (Our Speech), The Institute of Czech Language, Academy of Sciences of the Czech Republic, in Czech, Prague.

Boril, H. (2010): “Pavel Machac and Radek Skarnitzl (2009). Principles of Phonetic Segmentation. Prague: Epocha Publishing House” in: R. Skarnitzl (Ed.), Acta Universitatis Carolinae (AUC) Philologica 1/2009, Phonetica Pragensia XII, Karolinum Publishing House, Prague, pp. 63-64. [pdf] [bib]

Book Chapters

Boril, H., Boyraz, P., Hansen, J. H. L. (2012): Digital Signal Processing for In-Vehicle Systems and Safety, chapter “Towards Multimodal Driver's Stress Detection,” J. H. L. Hansen, P. Boyraz, K. Takeda, H. Abut (Eds.), Springer, New York, pp. 3-19. [pdf] [cited] [bib]

Conference/Workshop Proceedings

Boril, H., Horn, S. (2022). “GAN-Based Augmentation for Gender Classification from Speech Spectrograms,” in Proc. of International Conference on Electrical, Computer and Energy Technologies (IEEE ICECET 2022), 6 pages, July 20-22, (Prague, Czech Republic). [bib] [pdf]

Hansen, J. H. L., Boril, H. (2021). “Gunshot Detection Systems: Methods, Challenges, and Can they be Trusted?,” in Proc. of 151st Audio Engineering Society (AES) Convention, 10 pages, October 11-13, (Las Vegas, NV). [bib] [pdf]

Hansen, J. H. L., Boril, H. (2016). “Robustness in Speech, Speaker, and Language Recognition: You've Got to Know Your Limitations,” in Proc. of Interspeech'16, 2766-2770, September 8-12, (San Francisco, CA). [bib] [pdf]

Nandwana, M. K., Boril, H., Hansen, J. H. L. (2015). “A New Front-End for Classification of Non-Speech Sounds: A Study on Human Whistle,” in Proc. of Interspeech'15, 1982-1986, September 6-10 (Dresden, Germany). [bib] [pdf]

Ghaffarzadegan, S., Boril, H., Hansen, J. H. L. (2015). “Generative Modeling of Pseudo-Target Domain Adaptation Samples for Whispered Speech Recognition,” IEEE ICASSP'15, 5024-5028, April 19-24 (Brisbane, Australia). [bib] [pdf]

Boril, H., Zhang, Q., Ziaei, A., Hansen, J. H. L., Xu, D., Gilkerson, J., Richards, J. A., Zhang, Y., Xu, X., Mao, H., Xiao, L., Jiang, F. (2014). “Automatic Assessment of Language Background in Toddlers Through Phonotactic and Pitch Pattern Modeling of Short Vocalizations,” Workshop on Child Computer Interaction (WOCCI), September 19 (Singapore). [pdf] [cited] [bib]

Ghaffarzadegan, S., Boril, H., Hansen, J. H. L. (2014). “Model and Feature Based Compensation for Whispered Speech Recognition,” Interspeech'14, 2420-2424, September 14-18 (Singapore). [pdf] [cited] [bib]

Ghaffarzadegan, S., Boril, H., Hansen, J. H. L. (2014). “UT-VOCAL EFFORT II: Analysis and Constrained-Lexicon Recognition of Whispered Speech,” IEEE ICASSP'14, 2563-2567, May 4-9 (Florence, Italy). [pdf] [cited] [bib]

Hahm, S.-J., Boril, H., Angkititrakul, P., Hansen, J. H. L. (2013). “Advanced Feature Normalization and Rapid Model Adaptation For Robust In-Vehicle Speech Recognition,” 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems (DSP in Vehicles), September 29-October 2 (Seoul, Korea). [pdf] [bib]

Boril, H., Zhang, Q., Angkititrakul, P., Hansen, J. H. L., Xu, D., Gilkerson, J., Richards, J. A. (2013). “A Preliminary Study of Child Vocalization on a Parallel Corpus of US and Shanghainese Toddlers,” in Proc. of Interspeech'13, 2405-2409, August 25-29 (Lyon, France). [pdf] [cited] [bib]

Hautamaki, V., Lee, K. A., van Leeuwen, D., Saeidi, R., Larcher, A., Kinnunen, T., Hasan, T., Sadjadi, S. O., Liu, G., Boril, H., Hansen, J. H. L., Fauve, B. (2013). “Automatic Regularization of Cross-Entropy Cost for Speaker Recognition Fusion,” in Proc. of Interspeech'13, 1609-1613, August 25-29 (Lyon, France). [pdf] [cited] [bib]

Saeidi, R., Lee, K. A., Kinnunen, T., Hasan, T., Fauve, B., Bousquet, P.-M., Khoury, E., Sordo Martinez, P. L., Kua, K., You, C., Sun, H., Larcher, A., Rajan, P., Hautamaki, V., Hanilci, C., Braithwaite, B., Gonzales-Hautamaki, R., Sadjadi, S. O., Liu, G., Boril, H., Shokouhi, N., Matrouf, D., El Shafey, L., Mowlaee, P., Epps, J., Thiruvaran, T., van Leeuwen, D. A., Ma, B., Li, H., Hansen, J. H. L., Bonastre, J.-F., Marcel, S., Mason, J., Ambikairajah, E. (2013). “I4U Submission to NIST SRE 2012: A Large-Scale Collaborative Effort for Noise-Robust Speaker Verification,” in Proc. of Interspeech'13, 1986-1990, August 25-29 (Lyon, France). [pdf] [cited] [bib]

Zhang, Q., Boril, H., Hansen, J. H. L. (2013). “Supervector Pre-Processing for PRSVM-based Chinese and Arabic Dialect Identification,” IEEE ICASSP'13, 7363-7367, May 26-31 (Vancouver, Canada). [pdf] [cited] [bib]

Liu, G., Hasan, T., Boril, H., Hansen, J. H. L. (2013). “An Investigation on Back-End for Speaker Recognition in Multi-Session Enrollment,” IEEE ICASSP'13, 7755-7759, May 26-31 (Vancouver, Canada). [pdf] [cited] [bib]

Hasan, T., Sadjadi, O., Liu, G., Shokouhi, N., Boril, H., Hansen, J. H. L. (2013). “CRSS systems for 2012 NIST Speaker Recognition Evaluation,” IEEE ICASSP'13, 6783-6787, May 26-31 (Vancouver, Canada). [pdf] [cited] [bib]

Hasan, T., Liu, G., Sadjadi, S. O., Shokouhi, N., Boril, H., Ziaei, A., Misra, A., Godin, K. W., Hansen, J. H. L. (2012). “UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation,” NIST 2012 Speaker Recognition Evaluation Workshop, Dec. 11-12, 2012 (Orlando, Florida). [pdf] [cited] [bib]

Boril, H., Sangwan, A., Hansen, J. H. L. (2012). “Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations,” in Proc. of Interspeech'12, 30-33, September 9-13 (Portland, Oregon). [pdf] [cited] [bib]

Boril, H., Sadjadi, O., Hansen, J. H. L. (2012). “A Study on Combined Effects of Reverberation and Increased Vocal Effort on ASR,” LISTA'12 Workshop, 16-19, May 2-3 (Edinburgh, UK). [pdf] [cited] [bib]

Hasan, T., Boril, H., Sangwan, A., Hansen, J. H. L. (2012). “A Multi-modal Highlight Extraction Scheme for Sports Videos Using an Information-Theoretic Excitability Measure,” IEEE ICASSP'12, 2381-2384, March 25-30 (Kyoto, Japan). [pdf] [cited] [bib]

Sadjadi, O., Boril, H., Hansen, J. H. L. (2012). “A Comparison of Front-end Compensation Strategies for Robust LVCSR Under Room Reverberation and Increased Vocal Effort,” IEEE ICASSP'12, 4701-4704, March 25-30 (Kyoto, Japan). [pdf] [cited] [bib]

Liu, G., Sadjadi, S. O., Hasan, T., Suh, J.-W., Zhang, C., Mehrabani, M., Boril, H., Sangwan, A., Hansen, J. H. L. (2011). “UTD-CRSS Systems for NIST Language Recognition Evaluation 2011,” NIST 2011 Language Recognition Evaluation Workshop, December, 2011 (Atlanta, Georgia, US). [pdf] [cited] [bib]

Boril, H., Grezl, F., Hansen, J. H. L. (2011). “Front-End Compensation Methods for LVCSR Under Lombard Effect,” in Proc. of Interspeech'11, 1257-1260, August 28-31 (Florence, Italy). [pdf] [cited] [bib]

Boril, H., Sadjadi, O., Hansen, J. H. L. (2011). “UTDrive: Emotion and Cognitive Load Classification for In-Vehicle Scenarios,” The 5th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems, September 4-7 (Kiel, Germany). [pdf] [cited] [bib]

Boril, H., Hansen, J. H. L. (2011). “UT-Scope: Towards LVCSR under Lombard Effect Induced by Varying Types and Levels of Noisy Background,” IEEE ICASSP'11, 4472-4475, Prague, Czech Republic, May 2011. [pdf] [cited] [bib]

Boril, H., Hansen, J. H. L., et al. (2011). “A Longitudinal Study of Infant Speech Production Parameters: A Case Study,” LENA Users Conference, April 2011 (Denver, CO). [pdf] [bib]

Boril, H., Sangwan, A., Hasan, T., Hansen, J. H. L. (2010). “Automatic Excitement-Level Detection for Sports Highlights Generation,” in Proc. of Interspeech'10, 2202-2205 (Makuhari, Chiba, Japan). [pdf] [cited] [bib]

Boril, H., Sadjadi, O., Kleinschmidt, T., Hansen, J. H. L. (2010). “Analysis and Detection of Cognitive Load and Frustration in Drivers' Speech,” in Proc. of Interspeech'10, 502-505 (Makuhari, Chiba, Japan). [pdf] [cited] [bib]

Lei, Y., Hasan, T., Suh, J.-W., Sangwan, A., Boril, H., Gang, L., Godin, K., Zhang, C., Hansen, J. H. L. (2010). “The CRSS Systems for the 2010 NIST Speaker Recognition Evaluation,” NIST 2010 Speaker Recognition Evaluation Workshop, Brno, Czech Republic, 24-25 June 2010. [pdf] [cited] [bib]

Amuda, S., Boril, H., Sangwan, A., Hansen, J. H. L. (2010). “Limited Resource Speech Recognition for Nigerian English,” in Proc. of IEEE ICASSP'10, 5090-5093 (Dallas, TX). [pdf] [cited] [bib]

Mehrabani, M., Boril, H., Hansen, J. H. L. (2010). “Dialect Distance Assessment Method Based on Comparison of Pitch Pattern Statistical Models,” in Proc. of IEEE ICASSP'10, 5158-5161 (Dallas, TX). [pdf] [cited] [bib]

Kleinschmidt, T., Boyraz, P., Boril, H., Sridharan, S., Hansen, J. H. L. (2009). “Assessment of Speech Dialog Systems using Multi-Modal Cognitive Load Analysis and Driving Performance Metrics,” IEEE International Conference on Vehicular Electronics and Safety ICVES'09, 162-167 (Pune, India). [pdf] [cited] [bib]

Boril, H., Hansen, J. H. L. (2009). “Reduced Complexity Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments,” in Proc. of Interspeech'09, 1243-1246 (Brighton, UK). [pdf] [cited] [bib]

Boril, H., Boyraz, P., Hansen, J. H. L. (2009). “Towards Multi-Modal Driver's Stress Detection,” in Proc. of 4th Biennial Workshop on DSP for In-Vehicle Systems and Safety (Dallas, Texas). [pdf] [cited] [bib]

Boril, H., Krishnamurthy, N., Hansen, J. H. L. (2009). “Online Noise and Lombard Effect Compensation for In-Vehicle Automatic Speech Recognition,” in Proc. of 4th Biennial Workshop on DSP for In-Vehicle Systems and Safety (Dallas, Texas). [pdf] [cited] [bib]

Boril, H., Hansen, J. H. L. (2009). “Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environment,” in Proc. of IEEE ICASSP'09, 3937-3940 (Taipei, Taiwan). [pdf] [cited] [bib]

Boril, H., Fousek, P., and Höge, H. (2007). “Two-Stage System for Robust Neutral/Lombard Speech Recognition”, in Proc. of Interspeech'07, 1074-1077 (Antwerp, Belgium). [pdf] [cited] [bib]

Boril, H., Boril, T., and Pollák, P. (2006). “Methodology of Lombard Speech Database Acquisition: Experiences with CLSD”, in Proc. of LREC 2006 - 5th Conference on Language Resources and Evaluation, 1644-1647 (Genova, Italy). [pdf] [cited] [bib]

Boril, H., Fousek, P., and Pollák, P. (2006). “Data-Driven Design of Front-End Filter Bank for Lombard Speech Recognition”, in Proc. of Interspeech - ICSLP'06, 381-384 (Pittsburgh, Pennsylvania). [pdf] [cited] [bib]

Boril, H., Fousek, P., Sündermann, D., Cerva, P., and Zdansky, J. (2006). “Lombard Speech Recognition: A Comparative Study”, in Proc. 16th Czech-German Workshop on Speech Processing, 141-148 (Prague, Czech Republic). [pdf] [cited] [bib]

Boril, H. (2005). “Automatic Reconstruction of Utterance Boundaries Time Marks in Speech Database Re-grabbed from DAT Recorder,” in Proc. of International Workshop on Digital Technologies 2005, vol. 1, 13-16 (Zilina, Slovakia). [pdf] [bib]

Boril, H. and Pollák, P. (2005). “Comparison of Three Czech Speech Databases from the Standpoint of Lombard Effect Appearance”, in COST278 and ISCA Tutorial and Research Workshop (ITRW) on Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), Aalborg, Denmark. [pdf] [cited] [bib]

Boril, H. and Pollák, P. (2005). “Design and Collection of Czech Lombard Speech Database”, in Proc. of Interspeech'05, 1577-1580 (Lisboa, Portugal). [pdf] [cited] [bib]

Boril, H. and Pollák, P. (2004). “Direct Time Domain Fundamental Frequency Estimation of Speech in Noisy Conditions”, in Proc. EUSIPCO 2004, volume 2, 1003-1006 (Vienna, Austria). [pdf] [cited] [bib]

Lectures/Abstracts/Reports

Horn, S., Boril, H. (2021). “Gender Classification from Speech Using Convolutional Networks Augmented with Synthetic Spectrograms,” J. Acoust. Soc. Am., Volume 150, A358, November. Presented in 181st ASA Meeting, Seattle, WA, November 29-December 3, 2021. [Abstract] [bib]

Boril, H. (2018). “Equalizing Speaker, Speaking Style and Environment-Induced Variability in Audio Streams for Robust Speech Engines,” invited talk, SpeechLab at Shanghai Jiao Tong University, June 12, 2018. (Shanghai, China).

Boril, H. (2017). “From Acoustic Waveforms to Text: Digital Signal Processing and Machine Learning for Speech Recognition,” in International Technology, Engineering and Innovation Congress (CITII), keynote talk, Universidad de San Buenaventura, Sede Bogota, October 18-20, 2017. (Bogota, Colombia).

Boril, H. (2017). “Signal Processing and Machine Learning for Voice Recognition and Beyond,” in Data Science and Digital Information Services: The Second Youth Talent Forum in Software Engineering Science, invited lecture, Dalian University of Technology, School of Software Technology, July 26-28, (Dalian, China).

Sangwan, A., Boril, H., Hasan, T., Hansen, J. H. L. (2014). “A Multimodal System for Automatic Sports Highlights Generation,” in 2014 IEEE Spoken Language Technology Workshop (SLT 2014), demo exhibition, December 7-10 (South Lake Tahoe, Nevada, USA). [pdf]

Boril, H., Ziaei, A., Hansen, J. H. L. (2013). “Prof-Life-Log: Production of Conversational Speech as a Function of Varying Environment,” in LENA International Conference, Poster Presentation, April 28-30 (Denver, Colorado).

Hansen, J. H. L., Boril, H., Sathyanarayana, A. (2012). “Speech Communications-Driving Behavior-and Safety: Can they co-exist?” CREST Symposium on Human-Harmonized Information Technology, Invited Lecture, April 2, 2012 (Kyoto, Japan).

Boril, H., Sangwan, A., Hasan, T., Hansen, J. H. L. (2010). “Automatic Excitement-Level Detection for Sports Highlights Generation,” in Wireless Long Term Evolution - The Connected World, UT Dallas Research and New Venture Showcase, Poster Presentation (Dallas, TX).

Boril, H., Sadjadi, O., Kleinschmidt, T., Hansen, J. H. L. (2010). “Analysis and Detection of Cognitive Load and Frustration in Drivers' Speech,” Wireless Long Term Evolution - The Connected World, UT Dallas Research and New Venture Showcase, Poster Presentation (Dallas, TX).

Boril, H., Kleinschmidt, T., Boyraz, P., Hansen, J. H. L. (2010). “Impact of Cognitive Load and Frustration on Drivers' Speech,” J. Acoust. Soc. Am., Volume 127, Issue 3, pp. 1996-1996 (March 2010). Presented in Joint 159th ASA Meeting and Noise-Con 2010, Baltimore, Maryland, 19-23 April 2010. Invited Lecture [pdf] [cited] [bib]

Boril, H. (2008). “Attributes and Recognition of Lombard Speech,” Invited Lecture, Sound to Sense (S2S) Workshop - Speech in Adverse Conditions (Prague, Czech Republic). [ppt] [bib]

Boril, H. (2007). “Normalization of Lombard effect”, Research Report No. R07-2, 52 pages, Czech Technical University in Prague & Siemens Corporate Technology (Munich, Germany). [bib]

Boril, H. and Pollák, P. (2006). “Czech Lombard Speech Database (CLSD'05)”, Technical Report No. R07-1, 24 pages, Czech Technical University in Prague. [pdf] [bib]

Boril, H. and Pollák, P. (2006). “Pitch-marking Based on the DFE Algorithm.” Lecture, 6th ECESS and TC-STAR WP3 Meeting (Berlin, Germany). [pdf] [bib]

Theses

Boril, H. (2008). “Robust speech recognition: Analysis and equalization of Lombard effect in Czech corpora,” Ph.D. dissertation, Czech Technical University in Prague, Czech Republic. [pdf] [cited] [bib]

Boril, H. (2003). “Guitar MIDI converter”, Master's thesis, Czech Technical University in Prague, in Czech. [pdf] [cited] [bib]

Other Publications

Proceedings

Boril, H. (2006). “Design of Speech Feedback; Comparison of Features for Lombard speech recognition,” Analysis and Processing of Speech and Biological Signals, CTU Publishing House, Prague, pp. 24-30, in Czech.

Boril, H., Fousek, P. (2006). “Influence of Different Speech Representations and HMM Training Strategies on ASR Performance,” In Proc. Intl. Student Conf. POSTER 2006, Prague.

Boril, H. and Pollák, P. (2005). “Analysis of Lombard Effect in Several Czech Databases,” In Proceedings of the Joint 16th Conference on Electronic Speech Signal Processing ESSP 2005 and 15th Czech-German Workshop on Speech Processing. vol. 1, pp. 253-259, Prague.

Boril, H., Boril, T., and Pollák, P. (2005). “Design of Lombard Effect Speech Database,” In Proceedings of RADIOELEKTRONIKA 2005, pp. 144-147, Brno, Czech Republic. [cited]

Boril, H. (2004). “Recognition of Speech under Lombard Effect,” In Proc. 14th Czech-German Workshop on Speech Processing, pp. 110-113, Prague 2004. [cited]

Boril, H. (2004). “Parameter Changes and Recognition of Speech under Stress,” Survey, Signal Analysis and Processing V, CTU Publishing House, Prague, pp. 54-65, in Czech.

Boril, H. (2003). “Direct Time Fundamental Frequency Estimation,” In Proc. Polish-Hungarian-Czech Workshop on Circuit Theory, Signal Processing and Applications, pp. 59-64, Prague.

Boril, H. (2003). “Pitch Detector for Guitar MIDI Converter,” In Proc. Intl. Student Conf. POSTER 2003, Prague. [cited]

Last Updated 3-28-2022
