He received his master's degree in computer science at the University of Twente in 1992 and completed his PhD on "Formal operation definition in object-oriented databases" in 1997. His research targets robustness in data science, focusing on two main threats to data science reliability: data quality and undesirable machine learning behaviour. On data quality, he works on data integration, semi-structured data, natural language processing, and the data quality issues that arise in these areas. He co-developed one of the most scalable XML database systems of its time: MonetDB/XQuery. He also proposed a data integration approach, called Probabilistic Data Integration, which fundamentally incorporates the handling of uncertain and lower-quality data. He developed a probabilistic database system, called DuBio, which allows the scalable storage, manipulation and management of such uncertain data. On the threat of undesirable machine learning behaviour, he focuses on Explainable AI, with the intrinsically explainable deep learning approach ProtoTree as one of the notable results. He is secretary of the executive board of the EDBT Association (Extending Database Technology). He is the (co-)author of about 200 publications, which have accumulated about 2000 citations.
Kippers, R., Koeva, M. N., van Keulen, M., & Oude Elberink, S. J. (2021). Automatic 3D building model generation using deep learning methods based on CityJSON and 2D floor plans. In L. Truong-Hong, E. Che, F. Jia, S. Emamgholian, D. Laefer, & A. V. Vo (Eds.), The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Vol. XLVI-4-W4, pp. 49-54). (International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences). Copernicus. https://doi.org/10.5194/isprs-archives-XLVI-4-W4-2021-49-2021
Sohail, S. A., Bukhsh, F. A., & van Keulen, M. (2021). Multilevel Privacy Assurance Evaluation of Healthcare Metadata. Proceedings (MDPI), 11(22). https://doi.org/10.3390/app112210686
Mauritz, R. R., Nijweide, F. P. J., Goseling, J., & van Keulen, M. (2021). Autoencoder-based cleaning in probabilistic databases. arXiv.org. https://arxiv.org/abs/2106.09764
Sohail, S. A., Bukhsh, F. A., van Keulen, M., & Krabbe, J. G. (2021). Identifying Materialized Privacy Claims of Clinical-Care Metadata Share using Process-Mining and REA ontology. 111-120. Paper presented at 15th International Workshop on Value Modelling and Business Ontologies, VMBO 2021, Virtual Workshop. http://ceur-ws.org/Vol-2835/paper12.pdf
Nguyen, E., Theodorakopoulos, D., Pathak, S., Geerdink, J., Vijlbrief, O., van Keulen, M., & Seifert, C. (2021). A Hybrid Text Classification and Language Generation Model for Automated Summarization of Dutch Breast Cancer Radiology Reports. In 2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI) (pp. 72-81). IEEE. https://doi.org/10.1109/CogMI50398.2020.00019
Provoost, J. C., Kamilaris, A., Wismans, L. J. J., van der Drift, S. J., & van Keulen, M. (2020). Predicting parking occupancy via machine learning in the web of things. Internet of Things, 12, 100301. https://doi.org/10.1016/j.iot.2020.100301
Bellatreche, L., Bentayeb, F., Bieliková, M., Boussaid, O., Catania, B., Ceravolo, P., Demidova, E., Halfeld Ferrari, M., Lopez, M. T. G., Hara, C. S., Kordić, S., Luković, I., Mannocci, A., Manghi, P., Osborne, F., Papatheodorou, C., Ristić, S., Sacharidis, D., Romero, O., ... Zumer, M. (2020). Databases and Information Systems in the AI Era: Contributions from ADBIS, TPDL and EDA 2020 Workshops and Doctoral Consortium. In L. Bellatreche, M. Bieliková, O. Boussaïd, J. Darmont, B. Catania, E. Demidova, F. Duchateau, M. Hall, T. Mercun, M. Žumer, B. Novikov, C. Papatheodorou, T. Risse, O. Romero, L. Sautot, G. Talens, & R. Wrembel (Eds.), ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium - International Workshops: DOING, MADEISD, SKG, BBIGAP, SIMPDA, AIMinScience 2020 and Doctoral Consortium, Proceedings (pp. 3-20). (Communications in Computer and Information Science; Vol. 1260). Springer. https://doi.org/10.1007/978-3-030-55814-7_1
Ruis, F., Pathak, S., Geerdink, J., Hegeman, J. H., Seifert, C., & van Keulen, M. (2020). Human-in-the-loop Language-agnostic Extraction of Medication Data from Highly Unstructured Electronic Health Records. In 20th International Conference on Data Mining Workshops 2020 IEEE EDS.
Nauta, M., van Putten, M. J. A. M., Tjepkema-Cloostermans, M. C., Bos, J. P., van Keulen, M., & Seifert, C. (2020). Interactive Explanations of Internal Representations of Neural Network Layers: An Exploratory Study on Outcome Prediction of Comatose Patients. In K. Bach, R. Bunescu, C. Marling, & N. Wiratunga (Eds.), KDH 2020: 5th International Workshop on Knowledge Discovery in Healthcare Data (Vol. 2675, pp. 5-11). (CEUR Workshop Proceedings; Vol. 2675). CEUR. http://ceur-ws.org/Vol-2675/paper1.pdf
Marazza, F., Bukhsh, F. A., Geerdink, J., Vijlbrief, O., Pathak, S., van Keulen, M., & Seifert, C. (2020). Automatic Process Comparison for Subpopulations: Application in Cancer Care. International Journal of Environmental Research and Public Health, 17(16), 1-23. https://doi.org/10.3390/ijerph17165707