- Building the most secure and durable Hardware Security Module (HSM) in the cloud and on the edge: available everywhere, anytime, and infinitely scalable.
- 2020-2021
We continue to make KMS more secure, faster, easier to use and more scalable. We built a new team dedicated to Data, Analytics and Observability that leverages AWS big data and machine learning technologies to raise the bar on KMS' business and operational excellence. We introduced real-time Online Event Processing (OLEP) and Online Analytical Processing (OLAP) into the KMS architecture. We improved KMS' public-facing availability SLA from 99.9% to 99.999%. We cut the P99 latency of KMS' data plane APIs by 50%! We fully automated KMS' host provisioning and secret management processes, saving hundreds of engineer-hours per year previously spent on host and fleet management. We deployed real-time formal verification mechanisms to assert the correctness and completeness of the KMS key lifecycle. On April 2nd, 2021, KMS removed the Grants Per Grantee Principal (GPGP) quota for all customers in all regions, eliminating a common barrier to adoption for customers who want to encrypt a large number of resources under a single KMS key.
2019
What a year! In June 2019, KMS launched in China. In November 2019, KMS launched asymmetric customer master keys (CMKs) and data key pairs. We raised the KMS read-only control plane API limits from 5/15 to 100 requests per second. We further increased the KMS data plane API limits in large regions to 30,000 requests per second, almost 30 times higher than when I joined KMS three years earlier! We continued to improve the availability and latency of KMS API calls, in some cases cutting latency by a factor of 5 to 6. We shipped the KMS API usage metric so customers can monitor their KMS API usage in CloudWatch in near real time. Four team members were promoted to the next level, two of them to senior SDE. If you know AWS, a senior SDE promotion is a big deal, a major milestone in an engineer's career. We also recruited 5 new team members. I can't wait to get started in 2020.
- 2018 -
I am leading a team of talented engineers building AWS KMS, the most secure and most durable service in the cloud. My team focuses on scalability, security automation, large-scale distributed datastores and performance engineering. KMS is the root of trust for the AWS ecosystem and for the millions of enterprises that rely on AWS for their core businesses. As a foundational service, KMS has to meet extremely high security, durability, availability and latency SLAs. In 2018, my team raised KMS' default data plane limits globally by 8 to 10 times. We manage KMS fleets that process tens of billions of transactions per day.
2017-2018
I built a new team to focus on security, infrastructure automation and compliance. We completed a long list of features to improve our security posture and operational excellence:
1. Fully automated certificate rotation; at AWS scale, this saved hundreds of hours of manual labor.
2. Through process optimization and automation, made KMS region builds 10 times faster than the previous year.
3. Deployed FIPS 140-2 compliant endpoints globally; we were the first AWS service to support FIPS-compliant endpoints in all commercial regions.
4. A sleek new dashboard framework based on React
5. A fully automated pipeline to deploy alarms
6. Continuous deployments in multiple key pipelines
- Software Development Manager, Amazon, Jul 2015 - Dec 2016 · 1 yr 6 mos, Greater Seattle Area
- I built a two-pizza team from scratch to focus on customer-facing security controls. Security primitives are hard to build and easy to break, so it is essential that we build them correctly and reuse them as much as possible. On top of that, building security primitives that can serve hundreds of millions of customers is a daunting task in itself. In this period I focused on two things: a composable authentication framework that abstracts security controls into reusable widgets, and hooking security controls into machine learning engines to adapt authentication strength to the risk of each transaction. We followed a pure functional design philosophy and proved that functional design can be applied at this scale. By leveraging feedback data from the machine learning engines, we can protect high-risk transactions without introducing friction for most customers, keeping the delicate balance between security and user experience.
I launched a long list of features in this period; here are the ones I am most proud of:
1. The composable authentication framework, which turns customer-facing security primitives into reusable widgets that can be composed dynamically across different contexts and devices.
2. OTP-based password reset for all Amazon customers, significantly improving the security and usability of the password reset workflow.
3. Risk-driven OTP for sign-in, protecting high-risk sign-in transactions with second-factor authentication without introducing friction for the majority of customers.
4. Risk-driven CAPTCHA, which eliminated friction for millions of customers every day.
5. Mobile number gating in China and India for real-name validation, a critical enabler for meeting regional security regulations.
- Engineering Manager, SaaS Service Development / Principal Engineer / Architect, Ping Identity, Nov 2011 - Jun 2015 · 3 yrs 8 mos, Greater Denver Area
- Highlights of my recent projects:
1. Multi-region data routing and replication infrastructure to support PII control in a multi-tenant cloud environment.
2. Unified User Management based on a Virtual Directory architecture that accommodates millions of users and plugin services.
3. Hierarchical Multi-tenant Management and Multi-tenant Role Based Access Control with Feature Flagging
4. Multi-tenant API Management Framework based on OpenID Connect