Episode 64: Improving Access to High-Quality Data with Fabiana Clemente
- (02:06) Fabiana talked about her Bachelor’s degree in Applied Mathematics from the University of Lisbon in the early 2010s.
- (04:18) Fabiana shared lessons learned from her first job out of college as a Siebel and BI Developer at Novabase.
- (05:13) Fabiana discussed unique challenges while working as an IoT Solutions Architect at Vodafone.
- (09:56) Fabiana mentioned projects she contributed to as a Data Scientist at startups such as ODYSAI and Habit Analytics.
- (12:44) Fabiana talked about the two Master’s degrees she earned while working in the industry (Applied Econometrics from the Lisbon School of Economics and Management and Business Intelligence from NOVA IMS Information Management School).
- (14:41) Fabiana explained the difference between data science and business intelligence.
- (18:01) Fabiana shared the founding story of YData, the first data-centric platform with synthetic data, where she is currently the Chief Data Officer.
- (21:32) Fabiana discussed different techniques to generate synthetic data, including oversampling, Bayesian Networks, and generative models.
- (24:01) Fabiana unpacked the key insights in her blog series on generating synthetic tabular data.
- (29:40) Fabiana summarized novel design and optimization techniques to cope with the challenges of training GAN models.
- (33:44) Fabiana brought up the benefits of using Differential Privacy as a complement to synthetic data generation.
- (38:07) Fabiana unpacked her post “The Cost of Poor Data Quality,” where she defined data quality as a measure of data based on factors such as accuracy, completeness, consistency, reliability, and above all, whether it is up to date.
- (42:11) Fabiana explained the important role that data quality plays in ensuring model explainability.
- (44:57) Fabiana reasoned about YData’s decision to pursue the open-source strategy.
- (47:47) Fabiana discussed her podcast called “When Machine Learning Meets Privacy” in collaboration with the MLOps Slack community.
- (49:14) Fabiana briefly shared the challenges encountered in getting the first cohort of customers for YData.
- (50:12) Fabiana went over valuable lessons on attracting the right people who are excited about YData’s mission.
- (51:52) Fabiana shared her take on the data community in Lisbon and her effort to inspire more women to join the tech industry.
- (53:47) Closing segment.
Fabiana’s Contact Info
- Jean-Francois Rajotte (Resident Data Scientist at the University of British Columbia)
- Sumit Mukherjee (Associate Professor of Statistics at Columbia University)
- Andrew Trask (Leader at OpenMined, Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)
- Théo Ryffel (Co-Founder of Arkhn, Ph.D. Student at ENS and INRIA, Leader at OpenMined)