CREATOR.
PROGRAMMER.
ENGINEER.

Hi there,

My name is Sam Castillo.

I'm a data engineer with a passion for helping others thrive.

My background lies in machine learning, but I currently work in cybersecurity.

Thanks for stopping in.

FEATURED PROJECTS

Sentiment Analysis of Airbnb Reviews

This project is part of a larger effort to create a predictive model for the value of short-term real estate investments, such as properties purchased to serve as Airbnb rentals.

At its core, the project applies deep-learning-based sentiment analysis to Airbnb reviews to gauge the overall positivity of a city or venue from unstructured user input.

WikiSimilar: A Full Stack Approach to Learning History

This is a full-stack educational application that lets users learn random historical facts on any theme they choose; creativity is encouraged.

The app relies on a custom Wikipedia web-scraping program packaged as a Docker image, which ran on an AWS EC2 instance and loaded data into a MySQL database hosted on AWS RDS.

Global Terrorism Preparedness Model

This is an end-to-end project based on real-world data from the Global Terrorism Database. The deliverable is an interactive dashboard that lets the user quickly see the most probable targets and forms of a terrorist attack, given a motive or societal circumstance entered as unstructured input.

The model relies on Natural Language Processing and binary classification using Microsoft's LightGBM algorithm.

Areas of Expertise

Bringing data to life is a process. How do I do it? 

Extraction and Wrangling

Data is everywhere, and it's rarely ready for analysis. I'm accustomed to gathering data from multiple sources, cleaning and formatting it for the use case, compiling it into DataFrames, and loading it into SQL databases for easy access.

Toolkit: Pandas, MySQL DB, PySpark.

Predictive Analytics

It's one thing to understand the past, but how does history inform the future? I build predictive models using both classical machine learning methods and deep learning techniques to forecast variables of interest.

Toolkit: scikit-learn, PyTorch, Transformers.

Statistical Transformation

Once data is clean, how can you tell what matters and what's just noise? I'm proficient in descriptive and inferential statistical analysis to reveal complex patterns, associations, and features of primary importance.

Toolkit: Statsmodels, Matplotlib.

Cloud Deployment

Even the best code has no business value while it sits in a Jupyter notebook. Once I create apps or programs, I deploy them through cloud-based platforms that make them easy to access, update, and extend with new features.

Toolkit: Docker, AWS, GCP.

Data Visualization

No one wants to look at rows and columns to make decisions. I create key visualizations in a range of formats, including interactive dashboards, to help users understand the most important information right up front.

Toolkit: Tableau, Seaborn.

Stakeholder Presentation

At the end of the day, stakeholders need stories. I'm an expert storyteller with excellent presentation and reporting skills based on years of experience teaching, conducting research, and mentoring others. 

Toolkit: Google Slides, MS Office.
