Data Scientist & Operations Leader

Raw data.
Refined insight.

I build machine learning models and applied AI systems that translate complex data into decisions people can actually use, from unsupervised clustering to LLM-powered analytics, end to end.

Current Role: Operations Manager
Education: MS Data Science, Northwestern · BA, Colby College
Stack: Python · R · SQL
Location: Chicago, IL

Selected work

Applied machine learning across real business problems. Each project is built end to end, from messy data through to a result a decision-maker can use.

02
Real Estate
Melbourne Housing Price Benchmark

Clustering 8,887 properties by physical characteristics to identify distinct property types for price benchmarking, independent of location labels.

K-Means · Hierarchical · t-SNE · R
03
AI & NLP
Natural Language to SQL Query Engine

An AI-powered query engine that translates plain English questions into executable SQL using GPT-3.5, applied to Amazon product and review data hosted in PostgreSQL.

GPT-3.5 · PostgreSQL · NLP · Python
04
Finance
Loan Default Risk Prediction

End-to-end credit risk pipeline predicting the likelihood of loan default using borrower and loan-level features, with SHAP-based explainability to surface the drivers behind each risk score.

XGBoost · Classification · SHAP · Python
05
Sports Analytics
WNBA Player Analytics & Outcome Prediction

Clustering WNBA players by playing style, predicting game outcomes, and identifying potentially mispriced betting lines using neural network models.

Neural Network · Clustering · Association Rules · Python
06
E-Commerce
E-Commerce Return Fraud Analysis

Clustering return transactions to detect patterns indicative of return fraud, surfacing high-risk segments and behavioral signals to support loss prevention teams.

Clustering · Anomaly Detection · Python · Streamlit
07
Political Science
Political Distrust Analysis

Applying PCA to survey and behavioral data to identify latent dimensions of political distrust, revealing distinct population segments and their underlying attitudes toward institutions.

PCA · Clustering · Survey Analysis · Python

Technical toolkit

Python is my primary working language, with additional depth in R and SQL. Every tool is chosen for the problem, not for novelty.

Languages
Python · R · SQL · PostgreSQL · Excel VBA
Machine Learning
K-Means Clustering · Hierarchical Clustering · PCA · t-SNE · Association Rule Learning · Logistic Regression · Classification · Feature Engineering · Cross-validation · Anomaly Detection
Deep Learning
Neural Networks · LSTM · Time Series · Autoencoders · TensorFlow · scikit-learn
Applied AI & NLP
LLM Integration · Transcript Analytics · GPT API · Prompt Engineering · NL-to-SQL · Local LLMs · Text Classification
Analytics & Visualization
Exploratory Data Analysis · matplotlib · ggplot2 · Financial Analysis · Performance Metrics · Reporting
Operations & Strategy
Process Improvement · Project Management · Change Management · Stakeholder Management · Cross-functional Leadership

Let's connect