Statistical patch to gnuplot
The complete documentation of the new stats command
The `stats` command calculates basic summary statistics for a data set,
displays them in human-readable form and (optionally) makes them available
as gnuplot variables.
Syntax:
stats {<ranges>}
      {"<datafile>" {datafile-modifiers}}
      {[no]output} {variables[=prefix]}
Permissible data file modifiers are `index`, `every`, and `using`, all of
which behave exactly as for the `plot` command. Up to two columns can be
specified with `using`, and inline transformations are available (same as
for `plot`).
The `stats` command will only consider data points which fall into the
plot range as defined inline or using `set xrange` and `set yrange`. If a
value in either column falls outside of its corresponding range, the entire
record is skipped and does not contribute to any of the summary statistics.
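The skipping rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the patch's code: the function name `filter_records`, the use of `None` for an open bound, and the treatment of range limits as inclusive are all assumptions.

```python
def filter_records(records, x_range=(None, None), y_range=(None, None)):
    """Keep only records whose every column lies inside its range.

    `records` is a list of (x, y) tuples; a bound of None means "unbounded".
    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    def inside(value, bounds):
        lo, hi = bounds
        return (lo is None or value >= lo) and (hi is None or value <= hi)

    # A record is dropped as a whole if either column is out of range.
    return [(x, y) for (x, y) in records
            if inside(x, x_range) and inside(y, y_range)]


records = [(1, 10), (2, 50), (12, 20), (3, 30)]
kept = filter_records(records, x_range=(0, 10), y_range=(0, 40))
print(kept)  # (2, 50) fails the y range, (12, 20) fails the x range
```

Note that the x value of a record rejected for its y value is lost as well, so the x statistics can change when only the y range is narrowed.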
By default, the results are printed to the screen, or to the destination
specified using the `set print` option. Output can be suppressed using the
`nooutput` directive. (This is useful when only assignment to variables
is desired; see below.) If gnuplot detects output to a non-interactive
terminal, the output is formatted as name/value pairs so that it is easy
to parse by another program.
The results of the calculation can be assigned to user-defined variables in
the current gnuplot session using the `variables` directive. The directive
accepts an optional prefix after an equals sign; if a prefix is given, it is
prepended to the name of every variable created in the current session.
Unless the `variables` keyword is present (with or without a prefix
specification), no assignment to variables is made.
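As a rough illustration of the naming rule, the following Python sketch prepends a prefix to a set of result names. The function name `assign_variables` and the sample values are made up for this example; only the rule "prefix + variable name" comes from the description above.

```python
def assign_variables(results, prefix=""):
    """Return a new mapping with `prefix` prepended to every result name."""
    return {prefix + name: value for name, value in results.items()}


# Made-up values, used only to show the resulting names.
results = {"mean_x": 2.5, "stddev_x": 1.1}
session = assign_variables(results, prefix="d_")
print(sorted(session))
```

The prefix is joined to the name without any separator, so a prefix that should be visually separated from the name needs its own trailing underscore.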
Quantities calculated (and their variable names, without prefix):
records       : number of valid records found
invalid       : number of invalid records found
blank         : number of blank lines found
blocks        : number of data blocks in the file (separated by double blank lines)
mean_*        : mean
stddev_*      : standard deviation
sumx_*        : sum of all values
sumx2_*       : sum of the squares of all values
min_*         : minimum value
min_pos_*     : the position of the minimum value
lo_quartile_* : lower quartile
median_*      : median
up_quartile_* : upper quartile
max_*         : maximum value
max_pos_*     : the position of the maximum value
In the variable names, the `*` is replaced by `x` (for the first or sole
column) or `y` (for the second column).
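For readers who want to check the reported numbers by hand, the per-column quantities can be approximated in Python as below. This is a sketch under stated assumptions, not the patch's algorithm: the text does not say whether the standard deviation divides by n or n - 1 (n is used here), which interpolation rule the quartiles use (Python's "inclusive" method is used here), or whether positions count from 0 or 1 (0 is used here). The function name `summarize` is hypothetical; the dictionary keys follow the list above with `*` replaced by the column suffix.

```python
import math
import statistics


def summarize(values, suffix="x"):
    """Compute the per-column quantities listed above for one column."""
    n = len(values)
    mean = sum(values) / n
    # Population form (divide by n); the patch may divide by n - 1 instead.
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Inclusive quartile interpolation; gnuplot's rule may differ.
    lo_q, median, up_q = statistics.quantiles(values, n=4, method="inclusive")
    # Zero-based record index of the first occurrence of each extreme.
    min_pos = min(range(n), key=lambda i: values[i])
    max_pos = max(range(n), key=lambda i: values[i])
    return {
        "records": n,
        f"mean_{suffix}": mean,
        f"stddev_{suffix}": stddev,
        f"sumx_{suffix}": sum(values),
        f"sumx2_{suffix}": sum(v * v for v in values),
        f"min_{suffix}": values[min_pos],
        f"min_pos_{suffix}": min_pos,
        f"lo_quartile_{suffix}": lo_q,
        f"median_{suffix}": median,
        f"up_quartile_{suffix}": up_q,
        f"max_{suffix}": values[max_pos],
        f"max_pos_{suffix}": max_pos,
    }


stats = summarize([4.0, 1.0, 3.0, 5.0, 2.0])
print(stats["mean_x"], stats["median_x"], stats["min_pos_x"])
```

Small differences from gnuplot's output for the standard deviation or the quartiles would most likely come from the assumptions named above rather than from the data.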
For min, max, median, and quartiles, the `stats` command also reports on
the position in the file at which the value was found. In the corresponding
variables, the `*` is replaced by `pos_x` or `pos_y`. Note: the value
reported in this way is the number of the record in the data set. This is
not necessarily the same as the line number in the data file if the file
contains comment lines, blank lines, or invalid or unreadable records!
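The difference between record number and line number can be made concrete with a short Python sketch. It assumes `#` as the comment character (the gnuplot default), zero-based record numbers, and one-based line numbers; invalid or unreadable records are not modeled. The function name `record_positions` is hypothetical.

```python
def record_positions(lines):
    """Map record number -> line number, skipping comments and blank lines."""
    positions = {}
    record = 0
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank and comment lines do not count as records
        positions[record] = line_no
        record += 1
    return positions


lines = ["# header", "1 10", "", "2 20", "# note", "3 30"]
positions = record_positions(lines)
print(positions)  # the third record sits on the sixth line of the file
```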
Furthermore, gnuplot silently skips invalid records, unless an explicit
`using` directive with parenthesized columns has been issued like this:
`using ($1):($2)`. With a using directive such as: `using 1:2`, the number
of invalid records reported by the `stats` command will always be zero.
(See the section on `plot using` for more details.)
If two columns have been specified with the `using` directive, then the
following additional quantities are calculated:
slope       : slope in a linear regression model
intercept   : intercept in a linear regression model
correlation : linear correlation coefficient
sumxy       : sum of the products x*y over all records ('dot product')
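The two-column quantities can be reproduced with the textbook least-squares formulas, sketched here in Python. That the patch uses exactly these formulas (ordinary least squares of y on x, Pearson correlation) is an assumption, and the function name `regress` is hypothetical.

```python
import math


def regress(xs, ys):
    """Least-squares fit of y = slope*x + intercept, plus correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sumxy = sum(x * y for x, y in zip(xs, ys))   # the 'dot product'
    sxx = sum((x - mean_x) ** 2 for x in xs)     # spread of x
    syy = sum((y - mean_y) ** 2 for y in ys)     # spread of y
    sxy = sumxy - n * mean_x * mean_y            # co-variation of x and y
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "correlation": sxy / math.sqrt(sxx * syy),
        "sumxy": sumxy,
    }


# Points lying exactly on y = 2x + 1.
fit = regress([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(fit)
```

The sketch divides by the spread of x and y, so it fails for a constant column; how the patch reports that case is not stated in the text.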
All variables and their values can be seen using the `show variables all`
command.
The `stats` command is not available in polar or parametric mode, or when
logarithmic axes are in effect.
Examples:
stats 'data' out
stats 'data' noout var
stats 'data' index 0 using 1:2 out
stats [1:10] 'data' using 1 every ::1::12
stats [0:10] 'data1' using ($1*$1) noout var=dat1
If the results have been assigned to variables, then these variables can
be used in subsequent `plot` or other commands:
stats 'data' using 1:2 noout var
plot 'data' using 1:( ($2-mean_y)/stddev_y ) w lp
or (showing the original data together with its linear regression):
stats 'data' using 1:2 noout var=d_
plot 'data' using 1:2, d_slope*x + d_intercept
See: `plot` for details on the `index`, `every`, and `using` directives.