I am an Assistant Professor at the Kahlert School of Computing at the University of Utah. Previously, I was a research scientist and a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of MIT where I worked with the A.M. Turing Award recipient Michael Stonebraker.
I earned my Ph.D. in Computer Science from Purdue University under the supervision of Walid Aref and Mourad Ouzzani.
My research interests revolve around data management in general and data quality in particular.
Data scientists spend most of their time doing "grunt" work, i.e., discovering, preparing, and cleaning the data. The goal of my research is to build systems that target the main pain points in data science development: data discovery, data preparation, and data debugging.
My systems are motivated by real-world use cases drawn from collaborations with industrial and clinical partners (e.g., Intel, Massachusetts General Hospital).
Recent News
- [September 2024] NSF proposal "NSF-Simons AI Institute for Cosmic Origins" funded (PI: Stella Offner, UT Austin).
- [June 2024] Proposal (as a sole PI) funded by the University of Utah Office of the Vice President for Research
- [August 2023] I joined the University of Utah as an Assistant Professor in the Kahlert School of Computing.
- [February 2023] Workshop proposal accepted at VLDB 2023.
- [December 2022] I will be a PC chair at Northeast Database Day 2023.
- [October 2022] Paper accepted at SIGCSE TS 2023.
- [September 2022] Chaired the Poly workshop at VLDB in Sydney.
- [November 2021] Extended abstract accepted to CIDR 2022.
- [July 2021] Research paper accepted to VLDB 2021.
- [May 2021] Demonstration paper accepted to VLDB 2021.
- [April 2021] Reviewer of the VLDB Journal.
- [April 2021] Gave a talk on DICE at the AI Accelerator annual meeting.
- [March 2021] Co-chairing the Poly@VLDB workshop.
- [March 2021] Gave a talk on Dagger at Intel Labs.
- [January 2021] Invited to the PC of SIGKDD 2021.
- [January 2021] Gave a talk on Dagger at the DSAIL convention.
- [January 2021] Gave a talk at CIDR 2021.
- [November 2020] Extended abstract accepted to CIDR 2021.
Research
Conference, demonstration and workshop papers
- Rowan Hart, Brian Hays, Connor McMillin, El Kindi Rezig, Gustavo Rodriguez-Rivera, Jeffrey A. Turkstra:
Eastwood-Tidy: C Linting for Automated Code Style Assessment in Programming Courses [PDF]
SIGCSE 2023
- El Kindi Rezig, Anshul Bhandari, Anna Fariha, Benjamin Price, Allan Vanterpool, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally:
Examples are All You Need: Iterative Data Discovery by Example in Data Lakes [Extended Abstract] [PDF]
CIDR 2022
- El Kindi Rezig, Mourad Ouzzani, Walid Aref, Ahmed Elmagarmid, Ahmed R. Mahmood, Michael Stonebraker:
Horizon: Scalable Dependency-driven Data Cleaning [PDF]
VLDB 2021
- El Kindi Rezig, Anshul Bhandari, Anna Fariha, Benjamin Price, Allan Vanterpool, Vijay Gadepally, Michael Stonebraker:
DICE: Data Discovery by Example [Demo] [PDF]
VLDB 2021
- El Kindi Rezig:
Data Cleaning in the Era of Data Science: Challenges and Opportunities [Extended Abstract] [PDF]
CIDR 2021
- El Kindi Rezig, Allan Vanterpool, Vijay Gadepally, Benjamin Price, Michael J. Cafarella, Michael Stonebraker:
Towards Data Discovery by Example [PDF]
Poly@VLDB 2020
- El Kindi Rezig, Ashrita Brahmaroutu, Nesime Tatbul, Mourad Ouzzani, Nan Tang, Timothy G. Mattson, Samuel Madden, Michael Stonebraker:
Debugging Large-Scale Data Science Pipelines using Dagger [Demo] [PDF]
VLDB 2020
- El Kindi Rezig, Lei Cao, Giovanni Simonini, Maxime Schoemans, Samuel Madden, Nan Tang, Mourad Ouzzani, Michael Stonebraker:
Dagger: A Data (not code) Debugger [PDF]
CIDR 2020
- Michael Stonebraker, El Kindi Rezig:
Machine Learning and Big Data: What is Important? [PDF]
IEEE Data Engineering Bulletin 2019
- El Kindi Rezig, Lei Cao, Michael Stonebraker, Giovanni Simonini, Wenbo Tao, Samuel Madden, Mourad Ouzzani, Nan Tang, Ahmed Elmagarmid:
Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics [Demo] [PDF]
VLDB 2019
- El Kindi Rezig, Mourad Ouzzani, Ahmed Elmagarmid, Walid Aref, Michael Stonebraker:
Towards an End-to-End Human-Centric Data Cleaning Framework [PDF]
HILDA@SIGMOD 2019
- El Kindi Rezig, Eduard Dragut, Mourad Ouzzani, Ahmed Elmagarmid, Walid Aref:
ORLF: A Flexible Framework for Online Record Linkage and Fusion [Demo] [PDF]
ICDE 2016
- Ahmed R. Mahmood, Ahmed M. Aly, Thamir Qadah, El Kindi Rezig, Anas Daghistani, Amgad Madkour, Ahmed S. Abdelhamid, Mohamed S. Hassan, Walid G. Aref, Saleh Basalamah:
Tornado: A Distributed Spatio-Textual Stream Processing System [Demo] [PDF]
VLDB 2015
- El Kindi Rezig, Eduard C. Dragut, Mourad Ouzzani, Ahmed K. Elmagarmid:
Query-Time Record Linkage and Fusion over Web Databases [PDF]
ICDE 2015
- Hazem Elmeleegy, Jaewoo Lee, El Kindi Rezig, Mourad Ouzzani, Ahmed Elmagarmid:
UMAP: A System for Usage-Based Schema Matching and Mapping [Demo] [PDF]
SIGMOD 2011
Edited books
- El Kindi Rezig, Vijay Gadepally, Timothy G. Mattson, Michael Stonebraker, Tim Kraska, Fusheng Wang, Gang Luo, Jun Kong, Alevtina Dubovitskaya:
Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB Workshops, Poly 2021 and DMAH 2021, Revised Selected Papers. ISSN 0302-9743. To appear.
Springer Lecture Notes in Computer Science
Technical Reports
- El Kindi Rezig, Michael J. Cafarella, Vijay Gadepally:
Technical Report on Data Integration and Preparation
CoRR abs/2103.01986 (2021)
- El Kindi Rezig, Mourad Ouzzani, Walid Aref, Ahmed Elmagarmid, Ahmed R. Mahmood:
Pattern-Driven Data Cleaning
CoRR abs/1712.09437 (2017)
Undergraduate Research
Teaching
Teaching Awards
- Raymond Boyce Teaching Excellence Award (highest teaching award in the Purdue CS department), 2017.
- Purdue Computer Science Teaching Fellowship (a highly selective award program that offers distinguished TAs the opportunity to become instructors), 2015 - 2016.
Classes (The University of Utah)
- CS4964/CS6949: Data Management for & with ML (Spring 2024)
- CS3190: Foundations of Data Analysis (Fall 2023)
Classes (Purdue University)
- [Summer 2017, 2018] CS50011: Introduction to Systems for Information Security (30+ students). Role: Co-developer and instructor (2 terms).
- [8 terms] CS252: Systems Programming (200+ students). Role: TA (6 terms) and instructor (2 terms).
- [Spring 2018] CS251: Algorithms and Data Structures (200+ students). Role: TA (1 term).
- [Spring 2017] CS240: Programming in C (200+ students). Role: TA (2 terms).
Recent talks
- DICE: Data Discovery by Example. AI Accelerator annual meeting. April 2021
- Dagger: a Provenance-Based Data Debugging System. Data Systems and Artificial Intelligence (DSAIL) convention at MIT. April 2021
- Dagger: a Provenance-Based Data Debugging System. Intel Labs. March 2021
- Data Cleaning in the Era of Data Science: Challenges and Opportunities. CIDR. January 2021
- Debugging Large-Scale Data Science Pipelines using Dagger. VLDB. August 2020
- Dagger: a Data (not code) Debugger. CIDR. January 2020
Service
- PC member of VLDB 2025, SIGMOD 2025, ICDE 2024, VLDB 2024, KDD 2024, KDD 2023, SDM 2024, TKDE 2024, and HILDA 2023
- PC member of SIGMOD 2023
- External reviewer for CHI 2023
- PC member of SIGKDD 2022
- PC member of SIGMOD 2022 (demo track)
- PC member of VLDB 2022 (demo track)
- PC member of HILDA@SIGMOD 2022
- Reviewer of the SIGMOD Record 2021
- Session chair at SIGKDD 2021 (research track)
- PC member of SIAM International Conference on Data Mining 2022
- Co-chair of the POLY'21 workshop (co-located with VLDB 2021)
- PC member of SIGKDD 2021
- PC member of DASFAA 2021
- Reviewer of the VLDB Journal. 2021
- Reviewer of the ACM Journal of Data and Information Quality. 2021
- Demo session chair at VLDB 2020
- PC member of SIAM International Conference on Data Mining 2020
- Reviewer of the IEEE Transactions on Knowledge and Data Engineering. 2018 - 2020
- External reviewer for various publications: SIGMOD, ICDE, CIKM, SSTD, EDBT, IEEE Transactions on Services Computing, WISE, SSDBM.