Why do you want to volunteer for this opportunity?
I love hearing about data science projects that do good, enjoy giving feedback when I can, and feel this is a great way for me to finally get involved!
What’s your educational background?
I graduated from UCLA with a neuroscience BS and Asian American Studies BA. I then attended UCLA’s School of Public Health for a degree in Biostatistics.
Describe your data science experience, such as a project you completed or managed.
During college and grad school, I managed data pipelines, performed statistical analyses, and contributed to grant proposals and paper submissions at an academic neuroimaging lab.
Later, I joined a health tech startup in San Francisco as one of its first data scientists. While helping with customer-facing data science projects, I led the development of record linkage algorithms and later joined its data engineering team.
Currently, I’m a senior machine learning engineer at Salesforce, where I work on deep learning-based language products for the service sector. I also serve as a technical advisor to aspiring AI practitioners at Insight Data Science, a technical training program catering to postdoctoral scientists transitioning into data science.
Briefly explain how you would evaluate a project, as related to, for example, its impact, scalability, replicability, and practicality.
I would first assess needs, identify requirements, and evaluate the project’s team. For example, how aligned are the project’s deliverables with the target community’s short-term and long-term needs? Are there any gaps in the project’s requirements gathering? How experienced is the team, and how diverse is it, both in personal backgrounds and in technical roles?
I would then examine data sources. For example, how readily available is the data? How is data provenance governed? How often is the data updated? Are there any possible biases? How well does the data generalize? How are data security and privacy handled?
Next, I would review processes and methods. For example, what processes are in place to ensure reproducibility (e.g., documentation, version control, tests)? What model and software architectures are being considered? How well do these architectures align with model performance and system scalability requirements?
Finally, I would look at project deliverables. Are they user-friendly? If machine learning models are included, are they interpretable? And what do the maintenance plans look like for models, software, and documentation?