FEATURED PROJECTS
Sentiment Analysis of Airbnb Reviews
This project is part of a larger effort to create a predictive model for the value of short-term real estate investments, such as properties purchased to operate as Airbnb rentals.
The core of the project is deep-learning-based sentiment analysis of Airbnb reviews, used to score the overall positivity of a city or venue from unstructured user input.
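As a minimal sketch of the aggregation step described above: assuming a deep learning sentiment model has already scored each review in [0, 1] (the cities and scores below are purely illustrative), the per-review scores can be rolled up into a positivity score per city.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-review sentiment scores, as a deep learning sentiment
# model might emit for each Airbnb review (values are illustrative).
scored_reviews = [
    {"city": "Austin", "score": 0.91},
    {"city": "Austin", "score": 0.78},
    {"city": "Denver", "score": 0.42},
    {"city": "Denver", "score": 0.55},
]

def city_positivity(reviews):
    """Aggregate review-level sentiment into a mean positivity score per city."""
    by_city = defaultdict(list)
    for review in reviews:
        by_city[review["city"]].append(review["score"])
    return {city: mean(scores) for city, scores in by_city.items()}

print(city_positivity(scored_reviews))
```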
WikiSimilar: A Full Stack Approach to Learning History
This is a full-stack educational application that lets users learn random historical facts on any theme; creativity is encouraged.
The app relies on a Dockerized custom Wikipedia web-scraping program that runs on an AWS EC2 instance and loads data into a MySQL database hosted on AWS RDS.
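The scrape-and-load pattern behind the app can be sketched as follows. This is not the app's actual code: the HTML is a hardcoded stand-in for a fetched Wikipedia page, and an in-memory sqlite3 database stands in for the MySQL database on RDS.

```python
import sqlite3
from html.parser import HTMLParser

# Hardcoded stand-in for a Wikipedia page fetched over HTTP; the real
# scraper runs inside a Docker container on EC2 and writes to MySQL on RDS.
SAMPLE_HTML = '<h1 id="firstHeading">Battle of Hastings</h1><p>Fought in 1066.</p>'

class TitleParser(HTMLParser):
    """Extract the text of the page's <h1> heading."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.title += data

parser = TitleParser()
parser.feed(SAMPLE_HTML)

# Load the scraped fact into a database (sqlite3 here as a local stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (title TEXT)")
conn.execute("INSERT INTO facts (title) VALUES (?)", (parser.title,))
row = conn.execute("SELECT title FROM facts").fetchone()
print(row[0])
```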
Global Terrorism Preparedness Model
This is an end-to-end project based on real-world data from the Global Terrorism Database. The deliverable is an interactive dashboard that lets the user quickly see the most probable targets and forms of a terrorist attack given a certain motive or societal circumstance as unstructured input.
The model relies on Natural Language Processing and binary classification using Microsoft's LightGBM algorithm.
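A toy sketch of the NLP step that precedes the classifier: turning an unstructured motive description into a numeric bag-of-words vector that a gradient-boosting model such as LightGBM could consume. The vocabulary and text below are illustrative, not drawn from the Global Terrorism Database.

```python
# Illustrative vocabulary; a real pipeline would learn this from the corpus.
VOCAB = ["political", "economic", "religious", "unrest", "protest"]

def bag_of_words(text, vocab=VOCAB):
    """Count occurrences of each vocabulary term in lowercased text."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocab]

features = bag_of_words("Political unrest following a political protest")
print(features)  # one numeric feature per vocabulary term
```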
Areas of Expertise
Bringing data to life is a process. How do I do it?
Extraction and Wrangling
Data is everywhere, and it's rarely ready for analysis. I'm accustomed to gathering data from multiple sources, cleaning and formatting it for the use case, compiling it in DataFrames, and loading it into SQL databases for easy access.
Toolkit: Pandas, MySQL DB, PySpark.
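A small sketch of that wrangling step with Pandas, using made-up records: messy rows from two sources are combined, normalized, and typed in a DataFrame before loading to SQL (e.g. via `DataFrame.to_sql`).

```python
import pandas as pd

# Illustrative messy records from two hypothetical sources.
source_a = pd.DataFrame({"city": ["austin ", "DENVER"], "price": ["120", "95"]})
source_b = pd.DataFrame({"city": ["Boston"], "price": ["210"]})

df = pd.concat([source_a, source_b], ignore_index=True)
df["city"] = df["city"].str.strip().str.title()  # normalize city names
df["price"] = df["price"].astype(int)            # enforce numeric type
print(df)
```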
Predictive Analytics
It's one thing to understand the past, but how does history inform the future? I build predictive models using both classical machine learning methods and deep learning techniques to forecast variables of interest.
Toolkit: scikit-learn, PyTorch, Transformers.
Statistical Transformation
Once data is clean, how can you tell what matters and what's just noise? I'm proficient in descriptive and inferential statistical analysis to reveal complex patterns, associations, and features of primary importance.
Toolkit: Statsmodels, Matplotlib.
Cloud Deployment
The best code has no business value in a Jupyter notebook. Once I create apps or programs, I deploy them through cloud-based platforms that make them easy to access, update, and extend with new features.
Toolkit: Docker, AWS, GCP.
Data Visualization
No one wants to look at rows and columns to make decisions. I create key visualizations using an array of formats and interactive dashboards to help users understand the most important information right up front.
Toolkit: Tableau, Seaborn.
Stakeholder Presentation
At the end of the day, stakeholders need stories. I'm an expert storyteller with excellent presentation and reporting skills based on years of experience teaching, conducting research, and mentoring others.
Toolkit: Google Slides, MS Office.