Wednesday, October 13, 2021

Garth Gibson | Professor at CMU | My GFS study

Summary Biography - from the webpage http://www.cs.cmu.edu/~garth/

    I joined the faculty of CMU's Computer Science Department in 1991. Previously I received a Ph.D. and an M.Sc. in Computer Science, in 1991 and 1987 respectively, from the University of California at Berkeley. Prior to Berkeley, I received a Bachelor of Mathematics in Computer Science and Applied Mathematics in 1983 from the University of Waterloo in Ontario, Canada.

    In 1993 I founded CMU's Parallel Data Laboratory (PDL) and led it until April 1999. Today the PDL is led by Greg Ganger. The PDL is a community that typically comprises 6 to 9 faculty, 2 to 3 dozen students, and 4 to 10 staff. It receives support and guidance from a consortium of 15 to 25 companies with interests in parallel data systems, the Parallel Data Consortium. This community holds biannual retreats and workshops to exchange technology ideas, analyses, and future directions. The publications of the PDL are available for your inspection.

    The principal contributions of my first twenty years of research are Redundant Arrays of Inexpensive Disks (RAID), Informed Prefetching and Caching (TIP), and Network-Attached Secure Disks (NASD), whose architectural basis shapes the Google File System and its descendants such as the Hadoop Distributed File System (HDFS) and the Parallel Network File System (pNFS) features of NFS v4.1. All three have stimulated derivative research and development in academia and industry. RAID, in particular, is now the organizing concept of a 10+ billion-dollar marketplace (more on RAID in my 1995 RAID tutorial).
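To make the RAID idea concrete for these notes, here is a minimal Python sketch (my own illustration, not from Gibson's page) of the XOR-parity recovery that RAID-4/5 style arrays rely on; the block sizes and the one-parity-disk layout are simplified assumptions.

# Minimal, illustrative sketch of RAID-style XOR parity (simplified; not production code).
# Assumes equal-size blocks striped across several data disks plus one parity disk.

def parity(blocks):
    """XOR all blocks together; the result is the parity block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def reconstruct(surviving_blocks, parity_block):
    """Recover one lost data block from the surviving blocks plus parity."""
    return parity(list(surviving_blocks) + [parity_block])

if __name__ == "__main__":
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data blocks on three disks
    p = parity([d0, d1, d2])                 # parity block on a fourth disk
    assert reconstruct([d0, d2], p) == d1    # disk 1 fails; its block is rebuilt

The same XOR relationship that builds the parity block also rebuilds any single lost block, which is why one extra disk of redundancy suffices to survive one failure.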

    In 1999 I started Panasas Inc., a scalable storage cluster company using an object storage architecture and providing 100s of TB of high-performance storage in a single management domain for national laboratory, energy sector, auto/aero-design, life sciences, financial modeling, digital animation, and engineering design markets (USENIX FAST08, PDSW07, SC04).

    In 2006 I founded a Petascale Data Storage Institute (PDSI) for the Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program. Led by CMU, with partners at Los Alamos, Sandia, Oak Ridge, Pacific Northwest and Lawrence Berkeley National Labs, the University of California, Santa Cruz, and the University of Michigan, Ann Arbor, this institute brings together leading experts in leadership-class supercomputing storage systems to address the challenges of moving from today's terascale computers to the petascale computers of the next decade. PDSI has run its course, leaving ongoing collaboration: the annual Parallel Data Storage Workshop (PDSW), a continuing partnership between Los Alamos National Laboratory and CMU (IRHPIT), and an open-source parallel checkpoint middleware file system (PLFS).

    In 2008 I turned to Data Intensive Scalable Computing, Clouds, and Scalable Analytics, participating in the design and installation of 2 TF, 2 TB, and 1/2 PB of computing in an OpenCirrus and an OpenCloud cluster. We installed and operate a Hadoop cluster for any and all researchers at CMU and have published observations on their use of this facility and benchmarking tools for it. Astrophysics was a strong early user, with computational biology and geophysics filling out a natural science slate, but the heaviest users were doing variants of machine learning and big data; the major collaboration has been the Intel Science and Technology Center for Cloud Computing (ISTC-CC).

    In 2011 I helped the New Mexico Consortium recycle retired Los Alamos National Laboratory supercomputing clusters into an NSF-funded open platform for scalable systems development and testing (PRObE). PRObE offers multiple clusters with thousands of cores, configured as either low-core-count, high-node-count or high-core-count, low-node-count machines. Researchers at universities and labs from all around the country are using PRObE to demonstrate the scalability of their systems research ideas.

    In 2012 I rallied a team of Machine Learning and Distributed Systems researchers to form a Big Learning research group. Our premise is that Machine Learning on Big Data presents both theoretical challenges (exploiting the inherent search-iness of machine learning and ensuring convergence given concurrency-induced error) and practical challenges (distributed-systems latency hiding and load balancing given an unusually flexible tolerance for bounded error).
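To make the "bounded error" premise concrete, here is a minimal Python sketch (my illustration, not the group's code) of a stale-synchronous coordination rule: workers are allowed to run ahead of the slowest worker, which hides straggler latency, but only up to an assumed staleness bound, which is what keeps the error seen by any worker bounded and convergence analyzable.

# Minimal sketch of a bounded-staleness ("stale synchronous") coordination rule,
# assuming data-parallel workers that bump an iteration counter after each update.
# Only the rule is modeled here, not real gradient computation.

import random

STALENESS = 3  # assumed bound: no worker may lead the slowest by more than this

class Clock:
    def __init__(self, num_workers):
        self.iteration = [0] * num_workers

    def can_proceed(self, worker):
        # Proceed only if this worker is within STALENESS iterations of the slowest.
        return self.iteration[worker] - min(self.iteration) <= STALENESS

    def finish_iteration(self, worker):
        self.iteration[worker] += 1

clock = Clock(num_workers=4)
for _ in range(200):
    w = random.randrange(4)        # workers advance at uneven, unpredictable speeds
    if clock.can_proceed(w):
        # ... compute a gradient on w's data shard against possibly stale parameters ...
        clock.finish_iteration(w)

# Fast workers hide stragglers' latency, yet the staleness any worker can observe
# stays bounded, which is what preserves the convergence argument.
print(clock.iteration)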

    Also in 2012 I created the curriculum for, and welcomed the first class of, Big Data Systems master's students, now known as the Systems Major in the Master of Computational Data Sciences (MCDS). MCDS graduates are typically employed in the U.S. tech industry, earning an average salary of over $115,000 in their first post-MCDS job.
