2.1 WAREHOUSE DATA CENTER SYSTEMS STACK
The applications that run on warehouse-scale computers (WSCs) dominate many system design
trade-off decisions. This chapter outlines some of the distinguishing characteristics of software
that runs in large internet services and the system software and tools needed for a complete computing
platform. The following terms describe the different software layers in a typical
WSC deployment:
• Platform-level software: The common firmware, kernel, operating system distribution,
and libraries expected to be present on every individual server. Together they abstract
the hardware of a single machine behind a basic, uniform machine interface.
• Cluster-level infrastructure: The collection of distributed systems software that manages
resources and provides services at the cluster level. Ultimately, we view these
services as an operating system for a data center. Examples are distributed file systems,
schedulers, and remote procedure call (RPC) libraries, as well as programming models
that simplify the usage of resources at the scale of data centers, such as MapReduce
[DG08], Dryad [Isa+07], Hadoop [Hadoo], Sawzall [Pik+05], BigTable [Cha+06],
Dynamo [DeC+07], Dremel [Mel+10], Spanner [Cor+12], and Chubby [Bur06].
• Application-level software: Software that implements a specific service. It is often useful
to further divide application-level software into online services and offline computations,
since they tend to have different requirements. Examples of online services are
Google Search, Gmail, and Google Maps. Offline computations are typically used in
large-scale data analysis or as part of the pipeline that generates the data used in online
services, for example, building an index of the web or processing satellite images to
create map tiles for the online service.
• Monitoring and development software: Software that keeps track of system health and
availability by monitoring application performance, identifying system bottlenecks,
and measuring cluster health.
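
To make the idea of a cluster-level programming model more concrete, here is a minimal single-process sketch of the MapReduce style mentioned above, applied to a toy word count. It only illustrates the shape of the model (a user-supplied map function, a grouping step, and a user-supplied reduce function); it is not the implementation described in [DG08], and the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical. A real cluster-level system would also handle partitioning, scheduling, and fault tolerance across many machines, which this sketch leaves out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical name): emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """Reduce step (hypothetical name): sum all counts emitted for one word."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Run map, group-by-key, and reduce in one process.

    Assumption: everything fits in memory on one machine. In a cluster,
    the grouping step would be a distributed shuffle.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Toy input: two tiny "documents" keyed by an illustrative document id.
docs = {"d1": "the web index", "d2": "the map tiles"}
counts = run_mapreduce(docs, map_words, reduce_counts)
```

The application author writes only the two small functions; the work of moving data between them belongs to the framework, which is the kind of simplification the cluster-level infrastructure layer is meant to provide.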