Staff Data Engineer (Commercial Products)
Airbnb · Full-time
Feb 2021 - Present · 1 yr 7 mos
San Francisco Bay Area
- Decreased the landing time of Airbnb's unit economics dataset from 3 days to 1.5 days. Built a long-term roadmap to further increase the quality of this critical dataset.
- Improved the quality of Airbnb's online systems so that their data can be easily consumed by Spark pipelines that compute metrics offline (see the sketch below).
- Led a team of 6 engineers to deliver on the MIDAS initiative, increasing data quality across the Commercial Products org.
- Upleveled smart pricing at Airbnb by improving the feature engineering and reducing the latency of the data used to train the smart pricing model.
- Skills: Big Data · Scala · Apache Spark · SQL · Machine Learning · Apache Superset · Apache Airflow · Mentoring · Team Leadership · Data Visualization · Data Analysis · Java · Python · Linux · Git · Googling
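A minimal sketch of the kind of Spark batch job the offline-metrics bullet above describes; the table and column names (events.bookings, metrics.daily_unit_economics, gross_booking_value) are hypothetical, not Airbnb's actual schema:

```scala
// Hypothetical sketch, not Airbnb's code: roll up online booking events
// into a daily offline metrics table that downstream pipelines can consume.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyUnitEconomics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("daily-unit-economics").getOrCreate()
    val ds = args(0) // partition date passed in by the scheduler (e.g. Airflow)

    val bookings = spark.table("events.bookings")   // hypothetical online event log
      .where(col("ds") === ds)

    val daily = bookings
      .groupBy(col("ds"), col("listing_id"))
      .agg(
        count("*").as("n_bookings"),
        sum("gross_booking_value").as("gbv")
      )

    // Sketch only: a production job would overwrite just the ds partition.
    daily.write.mode("overwrite")
      .saveAsTable("metrics.daily_unit_economics")  // hypothetical offline metrics table
  }
}
```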
- Built a machine learning feedback system that allowed security engineers to label corporate user behavior as risky or not risky (see the sketch below).
- Built Asset Inventory, a graph database solution that maps all of Netflix's cloud infrastructure.
- Skills: Big Data · Scala · Apache Spark · Cybersecurity · SQL · Machine Learning · Apache Airflow · Team Leadership · Data Visualization · Data Analysis · Java · Python · JavaScript · HTML · Cascading Style Sheets (CSS) · Node.js · React.js · D3.js · Linux · Git · PostgreSQL · REST APIs · Spring Framework · Googling
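To illustrate the feedback loop in the first bullet, here is a hypothetical sketch (table names invented, not Netflix's actual code) of the batch step that joins engineer-supplied risky/not-risky labels back onto behavioral feature rows to produce the next training set:

```scala
// Hypothetical sketch: merge analyst labels with behavioral features
// to build a training set for the next iteration of the risk model.
import org.apache.spark.sql.SparkSession

object RiskLabelFeedback {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("risk-label-feedback").getOrCreate()

    val features = spark.table("security.user_behavior_features") // hypothetical: one row of model features per event_id
    val labels   = spark.table("security.analyst_labels")         // hypothetical: (event_id, is_risky) labels from security engineers

    val trainingSet = features.join(labels, Seq("event_id"))      // keep only events an engineer has labeled

    trainingSet.write.mode("overwrite")
      .saveAsTable("security.risk_training_set")                  // consumed by the model-training job
  }
}
```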
- Built a pipeline that measures the cloud infrastructure impact of A/B tests, saving Netflix millions by enabling smarter A/B test rollout decisions (see the sketch below).
- Skills: Big Data · Apache Spark · Cybersecurity · SQL · Machine Learning · Apache Airflow · Data Visualization · Data Analysis · Java · Python · Linux · Git · Googling
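A hypothetical sketch of the core join and aggregation such a pipeline might perform (schemas invented for illustration): attribute daily cloud cost to A/B test cells so treatment vs. control spend can be compared before a rollout decision.

```scala
// Hypothetical sketch: attribute infrastructure cost to A/B test cells.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AbTestInfraCost {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ab-test-infra-cost").getOrCreate()

    val allocations = spark.table("abtest.allocations") // hypothetical: (account_id, test_id, cell)
    val costs       = spark.table("infra.daily_cost")   // hypothetical: (account_id, ds, usd_cost)

    val costByCell = costs
      .join(allocations, Seq("account_id"))
      .groupBy(col("test_id"), col("cell"))
      .agg(sum("usd_cost").as("total_usd_cost"))        // compare treatment vs. control spend per test

    costByCell.write.mode("overwrite")
      .saveAsTable("abtest.infra_cost_by_cell")         // feeds rollout-decision dashboards
  }
}
```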
Data Engineer
Facebook
Aug 2016 - May 2018 · 1 yr 10 mos
Menlo Park, California
- Managed a 10 PB+ Hive data warehouse
- Consolidated and conformed growth metrics across WhatsApp, Instagram, Messenger, and Facebook into a single, company-wide view.
- Optimized machine learning feature set generation pipelines (200+ TB/day), cutting latency from 4 days to 1 day while also dropping their compute costs 4x.
- Reduced core notification data set latencies from 36 hours to < 8 hours.
- Migrated 50% of notifications pipelines from Hive to Spark, Presto, or real-time streaming (see the migration sketch below).
- Cut compute costs for notifications pipelines by 40% over the course of 9 months.
- Skills: Big Data · Apache Spark · SQL · Machine Learning · Data Visualization · Data Analysis · Java · Python · JavaScript · React.js · Hadoop · MapReduce · Linux · Git · Googling
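A hypothetical sketch of what one such Hive-to-Spark migration looks like (invented notifications schema, not Facebook's actual pipeline): the same daily aggregation the old Hive query produced, expressed as a Spark job against the existing Hive warehouse.

```scala
// Hypothetical sketch: a daily notifications rollup rewritten as a Spark job.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object NotificationSendsDaily {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("notification-sends-daily")
      .enableHiveSupport()                         // read/write the existing Hive warehouse
      .getOrCreate()

    val sends = spark.table("notifications.sends") // hypothetical raw sends table with a boolean `clicked` column
      .where(col("ds") === args(0))                // process one date partition per run

    val daily = sends
      .groupBy(col("ds"), col("channel"))
      .agg(
        count("*").as("n_sent"),
        sum(when(col("clicked"), 1).otherwise(0)).as("n_clicked")
      )

    daily.write.mode("overwrite")
      .insertInto("notifications.sends_daily")     // same downstream table the old Hive job wrote
  }
}
```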