Developer | |
---|---|
First appeared | 2003 |
License | Apache License 2.0 |
Website | code |
Sawzall is a procedural domain-specific programming language, used by Google to process large numbers of individual log records. Sawzall was first described in 2003,[1] and the szl runtime was open-sourced in August 2010.[2] However, since the MapReduce table aggregators have not been released,[3] the open-sourced runtime is not useful for large-scale data analysis of multiple log files off the shelf. Sawzall has been replaced by Lingo (logs in Go) for most purposes within Google.[4]
Google's server logs are stored as large collections of records (Protocol Buffers) that are partitioned over many disks within GFS. In order to perform calculations involving the logs, engineers can write MapReduce programs in C++ or Java. MapReduce programs need to be compiled and may be more verbose than necessary, so writing a program to analyze the logs can be time-consuming. To make it easier to write quick scripts, Rob Pike et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce and "emits" values to tables. Then the Reduce phase (which the script writer does not have to be concerned about) aggregates the tables from multiple runs into a single set of tables.
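The split described above can be sketched in a few lines of Python. This is only an illustrative model, not Google's implementation: the record layout, the `script` function, and the table names `requests` and `bytes` are all hypothetical, and every table is assumed to be a simple sum table.

```python
from collections import Counter

def script(record, emit):
    """Hypothetical per-record logic: each call sees exactly one record."""
    emit("requests", 1)
    emit("bytes", record["size"])

def map_phase(shard):
    """Run the script over one shard and return that shard's partial sum tables."""
    tables = Counter()

    def emit(table, value):
        tables[table] += value

    for record in shard:
        script(record, emit)
    return tables

def reduce_phase(partials):
    """Merge the partial tables from every shard into one final set of tables."""
    merged = Counter()
    for tables in partials:
        merged.update(tables)  # Counter.update adds values key by key
    return merged

# Two made-up shards standing in for log files on different disks.
shards = [
    [{"size": 120}, {"size": 80}],
    [{"size": 300}],
]
result = reduce_phase(map_phase(shard) for shard in shards)
print(dict(result))  # {'requests': 3, 'bytes': 500}
```

The script author only writes the equivalent of `script`; the merging step is supplied by the framework, which is why it can be ignored when writing a quick analysis.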
Currently, only the language runtime (which runs a Sawzall script once over a single input) has been open-sourced; the supporting program built on MapReduce has not been released.[3]
Some interesting features include the following table aggregators:
- collection: saves every value emitted.
- sum: saves the sum of every emitted value.
- maximum(n): saves only the highest n values on a given weight.
- sample(n): gives a random sample of n values from all the emitted values.
- quantile(n): calculates a cumulative probability distribution of the given numbers.
- top(n): gives n values that are probably the most frequent of the emitted values.
- unique(n): estimates the number of unique values emitted.

Sawzall's design favors efficiency and engine simplicity over power.
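To make the behavior of the exact aggregators concrete, here is a minimal Python sketch of `collection`, `sum`, and `maximum(n)`. The class names and the `emit`/`result` methods are hypothetical and do not reflect the szl runtime's API; the statistical aggregators (`sample`, `quantile`, `top`, `unique`) are omitted because their approximation schemes are not described here.

```python
import heapq

class Collection:
    """collection: saves every value emitted."""
    def __init__(self):
        self.values = []

    def emit(self, value):
        self.values.append(value)

class Sum:
    """sum: saves the sum of every emitted value."""
    def __init__(self):
        self.total = 0

    def emit(self, value):
        self.total += value

class Maximum:
    """maximum(n): saves only the n values with the highest weight."""
    def __init__(self, n):
        self.n = n
        self.heap = []  # min-heap of (weight, value); smallest weight on top

    def emit(self, value, weight):
        item = (weight, value)
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, item)
        else:
            # Push the new item, then drop whichever item is now the smallest.
            heapq.heappushpop(self.heap, item)

    def result(self):
        """Return the kept values, highest weight first."""
        return [value for weight, value in sorted(self.heap, reverse=True)]

# Made-up data: keep the two URLs with the highest latency.
slowest = Maximum(2)
for url, latency_ms in [("/a", 30), ("/b", 95), ("/c", 12), ("/d", 70)]:
    slowest.emit(url, weight=latency_ms)
print(slowest.result())  # ['/b', '/d']
```

Each aggregator keeps only a small, bounded amount of state per table (except `collection`), which is what makes it practical to combine partial results from many machines.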
This complete Sawzall program will read the input and produce three results: the number of records, the sum of the values, and the sum of the squares of the values.
    count: table sum of int;
    total: table sum of float;
    sum_of_squares: table sum of float;
    x: float = input;
    emit count <- 1;
    emit total <- x;
    emit sum_of_squares <- x * x;
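For readers unfamiliar with the syntax, the following Python sketch mimics what the program above computes when run over a single input. It is an approximation for illustration only: the `run` function and the dictionary of tables are hypothetical, and the loop stands in for the runtime invoking the program once per record.

```python
def run(inputs):
    """Emulate the three sum tables for a sequence of float records."""
    tables = {"count": 0, "total": 0.0, "sum_of_squares": 0.0}
    for x in inputs:  # the Sawzall program body runs once per record
        tables["count"] += 1               # emit count <- 1;
        tables["total"] += x               # emit total <- x;
        tables["sum_of_squares"] += x * x  # emit sum_of_squares <- x * x;
    return tables

print(run([1.0, 2.0, 3.0]))
# {'count': 3, 'total': 6.0, 'sum_of_squares': 14.0}
```

Note that the Sawzall version contains no loop at all: iteration over records and accumulation into the tables are handled outside the script.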