openGPMP
Open Source Mathematics Package
gpmp::stats::Describe Class Reference

A class providing methods for descriptive statistics.

#include <describe.hpp>

Public Member Functions

double mean_geo (const std::vector< double > &data)
 Calculates the geometric mean of a given dataset.

double mean_cubic (const std::vector< double > &data, double p)
 Calculates the generalized (power) mean of a given dataset with exponent p; the cubic mean is the case p = 3.

double mean_geo_pow (const std::vector< double > &data, double p)
 Calculates the geometric mean of the p-th powers of a given dataset.

double mean_harmonic (const std::vector< double > &data)
 Calculates the harmonic mean of a given dataset.

double mean_heronian (const std::vector< double > &data)
 Calculates the Heronian mean of a given dataset.

double mean_heinz (const std::vector< double > &data)
 Calculates the Heinz mean of a given dataset.

double mean_lehmer (const std::vector< double > &data, double p)
 Calculates the Lehmer mean of a given dataset with a specified power.

double Median (std::vector< double > data)
 Calculates the median of a given dataset.

double avg_abs_dev (const std::vector< double > &data)
 Calculates the average absolute deviation of a given dataset.

double var_coeff (const std::vector< double > &data)
 Calculates the coefficient of variation of a given dataset.

double iq_range (const std::vector< double > &data)
 Calculates the interquartile range of a given dataset.

double percentile (const std::vector< double > &data, double percentile)
 Calculates the specified percentile of a given dataset.

double range (const std::vector< double > &data)
 Calculates the range of a given dataset.

double clt (const std::vector< double > &data, int numSamples)
 Calculates the standard error of the mean using the Central Limit Theorem.

double kurtosis (const std::vector< double > &data, double mean)
 Calculates the excess kurtosis of a given dataset.

double lmoment1 (const std::vector< double > &data, double mean)
 Calculates the first L-moment of a given dataset.

double lmoment2 (const std::vector< double > &data, double mean)
 Calculates the second L-moment of a given dataset.

double skewness (const std::vector< double > &data, double mean, double stddev)
 Calculates the skewness of a given dataset.

std::vector< size_t > rank_data (const std::vector< double > &data)
 Ranks the data in ascending order.

double partial_corr (const std::vector< double > &x, const std::vector< double > &y, const std::vector< double > &z)
 Calculates the partial correlation coefficient between two variables, controlling for a third variable.

double ppmc (const std::vector< double > &x, const std::vector< double > &y)
 Calculates the Pearson Product-Moment Correlation between two variables.

double kendalls_tau (const std::vector< double > &x, const std::vector< double > &y)
 Calculates Kendall's Tau Rank Correlation between two variables.

double spearmans_rho (const std::vector< double > &x, const std::vector< double > &y)
 Calculates Spearman's Rank Correlation between two variables.

Static Public Member Functions

static double u_stat (const std::vector< double > &sample1, const std::vector< double > &sample2)
 Calculates the Mann-Whitney U statistic for two samples.

static double mean_arith (const std::vector< double > &data)
 Calculates the arithmetic mean of a given dataset.

static double stdev (const std::vector< double > &data, double mean)
 Calculates the population standard deviation of a given dataset, given the mean.

static double variance (const std::vector< double > &data, double mean)
 Calculates the population variance of a given dataset, given the mean.

Detailed Description

A class providing methods for descriptive statistics.

Definition at line 44 of file describe.hpp.

Member Function Documentation

◆ avg_abs_dev()

double gpmp::stats::Describe::avg_abs_dev ( const std::vector< double > &  data)

Calculates the average absolute deviation of a given dataset, that is, the mean of |x_i - mean| around the arithmetic mean.

Parameters
data: The input dataset
Returns
The average absolute deviation

Definition at line 138 of file describe.cpp.

{
    double mean = mean_arith(data);
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::abs(value - mean);
    }
    return sum / static_cast<double>(data.size());
}

◆ clt()

double gpmp::stats::Describe::clt ( const std::vector< double > &  data,
int  numSamples 
)

Calculates the standard error of the mean using the Central Limit Theorem.

This method estimates the standard error of the mean as stdev(data, mean) / sqrt(numSamples), using the population standard deviation of data.

Parameters
data: The input dataset
numSamples: The sample size n in the denominator sqrt(n); it is not taken from data.size()
Returns
The estimated standard error of the mean

Definition at line 204 of file describe.cpp.

{
    double mean = mean_arith(data);
    double stddev = stdev(data, mean);
    return stddev / std::sqrt(static_cast<double>(numSamples));
}

◆ iq_range()

double gpmp::stats::Describe::iq_range ( const std::vector< double > &  data)

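Calculates the interquartile range of a given dataset.

The quartiles are read directly from a sorted copy of the data at indices size / 4 and 3 * size / 4, without interpolation between neighboring values. The dataset must not be empty.

Parameters
data: The input dataset
Returns
The interquartile range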

Definition at line 155 of file describe.cpp.

{
    std::vector<double> sortedData = data;
    std::sort(sortedData.begin(), sortedData.end());

    size_t size = sortedData.size();
    size_t lowerIndex = size / 4;
    size_t upperIndex = 3 * size / 4;

    return sortedData[upperIndex] - sortedData[lowerIndex];
}

◆ kendalls_tau()

double gpmp::stats::Describe::kendalls_tau ( const std::vector< double > &  x,
const std::vector< double > &  y 
)

Calculates Kendall's Tau Rank Correlation between two variables.

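This method measures the strength and direction of the monotonic relationship between variables X and Y by counting concordant and discordant pairs. When neither variable has ties, concordant + discordant equals n(n - 1) / 2 and the result is Kendall's tau-a. Because both counters are size_t, however, concordant - discordant wraps around instead of going negative, so as written a negative association produces a large positive number rather than a value in [-1, 1]. x and y must have the same length of at least 2.

Parameters
x: The values of variable X
y: The values of variable Y
Returns
Kendall's Tau Rank Correlation coefficient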

Definition at line 300 of file describe.cpp.

{
    size_t concordant = 0;
    size_t discordant = 0;

    for (size_t i = 0; i < x.size() - 1; ++i) {
        for (size_t j = i + 1; j < x.size(); ++j) {
            if ((x[i] < x[j] && y[i] < y[j]) || (x[i] > x[j] && y[i] > y[j])) {
                concordant++;
            } else if ((x[i] < x[j] && y[i] > y[j]) ||
                       (x[i] > x[j] && y[i] < y[j])) {
                discordant++;
            }
        }
    }

    return static_cast<double>(concordant - discordant) /
           std::sqrt(static_cast<double>((concordant + discordant) *
                                         (x.size() * (x.size() - 1)) / 2));
}

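Because the listing subtracts two size_t counters, one way to avoid the wraparound is to keep the pair counts signed. The following is a minimal sketch, not openGPMP code, of Kendall's tau-a; the name kendall_tau_a is hypothetical, and the denominator is always n(n - 1) / 2, so with ties it differs from the library's formula (and from tau-b).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of openGPMP: Kendall's tau-a with signed
// pair counts. Pairs tied in x or in y count as neither concordant nor
// discordant; the denominator is the total number of pairs n(n - 1) / 2.
double kendall_tau_a(const std::vector<double> &x, const std::vector<double> &y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const std::size_t n = x.size();
    long long concordant = 0;  // signed, so the difference below can be negative
    long long discordant = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if ((x[i] < x[j] && y[i] < y[j]) || (x[i] > x[j] && y[i] > y[j])) {
                ++concordant;
            } else if ((x[i] < x[j] && y[i] > y[j]) || (x[i] > x[j] && y[i] < y[j])) {
                ++discordant;
            }
        }
    }
    const double pairs = static_cast<double>(n) * static_cast<double>(n - 1) / 2.0;
    return static_cast<double>(concordant - discordant) / pairs;
}
```

For example, kendall_tau_a({1, 2, 3, 4}, {4, 3, 2, 1}) returns -1.0, a value the unsigned subtraction in the listing cannot produce.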
◆ kurtosis()

double gpmp::stats::Describe::kurtosis ( const std::vector< double > &  data,
double  mean 
)

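Calculates the excess kurtosis of a given dataset.

This method measures the "tailedness" of the distribution. It returns the population excess kurtosis m4 / m2^2 - 3, where m4 is the mean of (x_i - mean)^4 and m2 is the population variance computed by variance(), so a large sample from a normal distribution gives a value near 0.

Parameters
data: The input dataset
mean: The mean of the dataset
Returns
The excess kurtosis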

Definition at line 212 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value - mean, 4.0);
    }
    double var = variance(data, mean);
    return sum / (data.size() * std::pow(var, 2.0)) - 3.0;
}

◆ lmoment1()

double gpmp::stats::Describe::lmoment1 ( const std::vector< double > &  data,
double  mean 
)

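Calculates the first L-moment of a given dataset.

L-moments are linear combinations of order statistics used to summarize and estimate the parameters of a probability distribution. As implemented, however, this method returns the third central moment, (1/n) * sum((x_i - mean)^3), not an L-moment; the first sample L-moment is the arithmetic mean.

Parameters
data: The input dataset
mean: The mean of the dataset
Returns
The third central moment of data about mean, as implemented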

Definition at line 223 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value - mean, 3.0);
    }
    return sum / data.size();
}

◆ lmoment2()

double gpmp::stats::Describe::lmoment2 ( const std::vector< double > &  data,
double  mean 
)

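Calculates the second L-moment of a given dataset.

L-moments are linear combinations of order statistics used to summarize and estimate the parameters of a probability distribution. As implemented, however, this method returns the fourth central moment, (1/n) * sum((x_i - mean)^4), not an L-moment; the second L-moment is half the expected absolute difference between two independent draws from the distribution.

Parameters
data: The input dataset
mean: The mean of the dataset
Returns
The fourth central moment of data about mean, as implemented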

Definition at line 232 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value - mean, 4.0);
    }
    return sum / data.size();
}

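For comparison, the first two sample L-moments can be estimated from the sorted data through probability-weighted moments: l1 = b0 (the mean) and l2 = 2 * b1 - b0, where b1 averages the sorted values after weighting the j-th smallest (1-based) by (j - 1) / (n - 1). A minimal sketch under those definitions; the names sample_lmoments and LMoments are hypothetical and not part of openGPMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names, not part of openGPMP.
struct LMoments {
    double l1;  // first sample L-moment (equals the mean)
    double l2;  // second sample L-moment (half the mean absolute pairwise difference)
};

// Estimates l1 and l2 from probability-weighted moments of the sorted data:
//   b0 = mean,  b1 = (1/n) * sum_j ((j - 1) / (n - 1)) * x_(j)  (1-based j),
//   l1 = b0,    l2 = 2 * b1 - b0.
LMoments sample_lmoments(std::vector<double> data) {
    assert(data.size() >= 2);
    std::sort(data.begin(), data.end());
    const double n = static_cast<double>(data.size());
    double b0 = 0.0;
    double b1 = 0.0;
    for (std::size_t j = 0; j < data.size(); ++j) {
        b0 += data[j];
        // With 0-based j, the weight (j - 1) / (n - 1) becomes j / (n - 1).
        b1 += static_cast<double>(j) / (n - 1.0) * data[j];
    }
    b0 /= n;
    b1 /= n;
    return {b0, 2.0 * b1 - b0};
}
```

For {1, 2, 3, 4} this gives l1 = 2.5 and l2 ≈ 0.833, which is half of the mean absolute difference over all six pairs (10 / 6).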
◆ mean_arith()

double gpmp::stats::Describe::mean_arith ( const std::vector< double > &  data)
static

Calculates the arithmetic mean of a given dataset.

Parameters
data: The input dataset
Returns
The arithmetic mean

Definition at line 52 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += value;
    }
    return sum / static_cast<double>(data.size());
}

Referenced by gpmp::stats::HypothesisTest::ANOVA(), gpmp::stats::ProbDist::ConfidenceInterval(), gpmp::stats::ProbDist::mom(), gpmp::stats::HypothesisTest::one_sample_ttest(), gpmp::stats::ProbDist::PivotFunctionForConfidenceInterval(), and gpmp::stats::HypothesisTest::two_sample_ttest().

◆ mean_cubic()

double gpmp::stats::Describe::mean_cubic ( const std::vector< double > &  data,
double  p 
)

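Calculates the generalized (power) mean of a given dataset with exponent p, ((1/n) * sum(x_i^p))^(1/p). The cubic mean is the case p = 3; p = 1 gives the arithmetic mean and p = 2 the root mean square.

Parameters
data: The input dataset
p: The exponent of the mean (must be non-zero)
Returns
The generalized mean of order p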

Definition at line 70 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value, p);
    }
    return std::pow(sum / static_cast<double>(data.size()), 1.0 / p);
}

◆ mean_geo()

double gpmp::stats::Describe::mean_geo ( const std::vector< double > &  data)

Calculates the geometric mean of a given dataset.

Parameters
data: The input dataset
Returns
The geometric mean

Definition at line 61 of file describe.cpp.

{
    double product = 1.0;
    for (const auto &value : data) {
        product *= value;
    }
    return std::pow(product, 1.0 / static_cast<double>(data.size()));
}

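The listing multiplies all values before taking the n-th root, so the running product can overflow to infinity or underflow to zero for long or extreme datasets even when the geometric mean itself is representable. A common alternative averages logarithms instead. This sketch assumes strictly positive data; geometric_mean_log is a hypothetical name, not an openGPMP function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of openGPMP: geometric mean computed as
// exp(mean(log x)), which avoids overflow/underflow of a running product.
// Assumes every value is strictly positive.
double geometric_mean_log(const std::vector<double> &data) {
    assert(!data.empty());
    double log_sum = 0.0;
    for (double value : data) {
        assert(value > 0.0);
        log_sum += std::log(value);
    }
    return std::exp(log_sum / static_cast<double>(data.size()));
}
```

With 400 copies of 1e300, the product form overflows to infinity, while this version returns approximately 1e300.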
◆ mean_geo_pow()

double gpmp::stats::Describe::mean_geo_pow ( const std::vector< double > &  data,
double  p 
)

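Calculates the geometric mean of the p-th powers of a given dataset, (prod(x_i^p))^(1/n), which equals the geometric mean of the data raised to the power p.

Parameters
data: The input dataset
p: The power applied to each value
Returns
The geometric mean of the values x_i^p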

Definition at line 80 of file describe.cpp.

{
    double product = 1.0;
    for (const auto &value : data) {
        product *= std::pow(value, p);
    }
    return std::pow(product, 1.0 / static_cast<double>(data.size()));
}

◆ mean_harmonic()

double gpmp::stats::Describe::mean_harmonic ( const std::vector< double > &  data)

Calculates the harmonic mean of a given dataset.

Parameters
data: The input dataset
Returns
The harmonic mean

Definition at line 90 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += 1.0 / value;
    }
    return static_cast<double>(data.size()) / sum;
}

◆ mean_heinz()

double gpmp::stats::Describe::mean_heinz ( const std::vector< double > &  data)

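Calculates the Heinz mean of a given dataset.

As implemented, this method returns exp((1/n) * sum(x_i * ln(x_i))), which is not the Heinz mean; the Heinz mean is a two-argument mean with a parameter alpha that interpolates between the geometric mean (alpha = 1/2) and the arithmetic mean (alpha = 0). All values must be positive for ln(x_i) to be defined.

Parameters
data: The input dataset
Returns
exp of the mean of x_i * ln(x_i), as implemented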

Definition at line 108 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += value * std::log(value);
    }
    return std::exp(sum / static_cast<double>(data.size()));
}

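For reference, the standard Heinz mean of two non-negative values x and y with 0 <= alpha <= 1/2 is (x^alpha * y^(1 - alpha) + x^(1 - alpha) * y^alpha) / 2. The sketch below implements only that two-argument form; heinz_mean is a hypothetical name, and how a dataset of more than two values should be handled is left open here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of openGPMP: the two-argument Heinz mean
//   H_alpha(x, y) = (x^alpha * y^(1 - alpha) + x^(1 - alpha) * y^alpha) / 2,
// defined for x, y >= 0 and 0 <= alpha <= 1/2.
double heinz_mean(double x, double y, double alpha) {
    assert(x >= 0.0 && y >= 0.0);
    assert(alpha >= 0.0 && alpha <= 0.5);
    return (std::pow(x, alpha) * std::pow(y, 1.0 - alpha) +
            std::pow(x, 1.0 - alpha) * std::pow(y, alpha)) / 2.0;
}
```

heinz_mean(2.0, 8.0, 0.0) gives the arithmetic mean 5, and alpha = 0.5 gives the geometric mean 4 (up to rounding).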
◆ mean_heronian()

double gpmp::stats::Describe::mean_heronian ( const std::vector< double > &  data)

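Calculates the Heronian mean of a given dataset.

As implemented, the product of sqrt(x_i) raised to the power 2 / n simplifies to (prod x_i)^(1/n), so this method returns the geometric mean rather than the Heronian mean. For two values a and b, the Heronian mean is (a + sqrt(a * b) + b) / 3.

Parameters
data: The input dataset
Returns
The geometric mean of the data, as implemented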

Definition at line 99 of file describe.cpp.

{
    double product = 1.0;
    for (const auto &value : data) {
        product *= std::sqrt(value);
    }
    return std::pow(product, 2.0 / static_cast<double>(data.size()));
}

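One common generalization of the Heronian mean to n values is H = 2 / (n(n + 1)) * sum over i <= j of sqrt(x_i * x_j), which reduces to (a + sqrt(a * b) + b) / 3 when n = 2. Assuming that definition and non-negative data, a minimal sketch follows; heronian_mean is a hypothetical name, not part of openGPMP.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of openGPMP: generalized Heronian mean
//   H = 2 / (n (n + 1)) * sum_{i <= j} sqrt(x_i * x_j).
double heronian_mean(const std::vector<double> &data) {
    assert(!data.empty());
    for (double value : data) {
        assert(value >= 0.0);  // sqrt(x_i * x_j) needs non-negative data
    }
    const std::size_t n = data.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {  // j == i contributes sqrt(x_i^2) = x_i
            sum += std::sqrt(data[i] * data[j]);
        }
    }
    const double nd = static_cast<double>(n);
    return 2.0 * sum / (nd * (nd + 1.0));
}
```

For {1, 4} the terms are sqrt(1), sqrt(4) and sqrt(16), so the result is 2 * 7 / 6 = 7/3, whereas the listing above returns the geometric mean 2.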
◆ mean_lehmer()

double gpmp::stats::Describe::mean_lehmer ( const std::vector< double > &  data,
double  p 
)

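Calculates the Lehmer mean of a given dataset with a specified power.

As implemented, this method returns (1/n) * sum(x_i^p), the p-th raw moment of the data. The Lehmer mean of order p is instead sum(x_i^p) / sum(x_i^(p - 1)).

Parameters
data: The input dataset
p: The power parameter
Returns
The p-th raw moment of the data, as implemented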

Definition at line 117 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value, p);
    }
    return sum / static_cast<double>(data.size());
}

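The Lehmer mean of order p divides the sum of p-th powers by the sum of (p - 1)-th powers. A minimal sketch assuming positive data; lehmer_mean is a hypothetical name, not an openGPMP function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of openGPMP: Lehmer mean
//   L_p = sum(x_i^p) / sum(x_i^(p - 1)),
// assuming positive data. L_0 is the harmonic mean, L_1 the arithmetic
// mean and L_2 the contraharmonic mean.
double lehmer_mean(const std::vector<double> &data, double p) {
    assert(!data.empty());
    double numerator = 0.0;
    double denominator = 0.0;
    for (double value : data) {
        assert(value > 0.0);
        numerator += std::pow(value, p);
        denominator += std::pow(value, p - 1.0);
    }
    return numerator / denominator;
}
```

For {1, 2, 3}, p = 0 gives the harmonic mean 18/11, p = 1 the arithmetic mean 2, and p = 2 the contraharmonic mean 14/6.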
◆ Median()

double gpmp::stats::Describe::Median ( std::vector< double >  data)

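Calculates the median of a given dataset.

The dataset is taken by value and sorted, so the caller's vector is not modified. For an even number of elements, the two middle values are averaged. The dataset must not be empty.

Parameters
data: The input dataset
Returns
The median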

Definition at line 127 of file describe.cpp.

{
    std::sort(data.begin(), data.end());
    size_t size = data.size();
    if (size % 2 == 0) {
        return (data[size / 2 - 1] + data[size / 2]) / 2.0;
    } else {
        return data[size / 2];
    }
}

◆ partial_corr()

double gpmp::stats::Describe::partial_corr ( const std::vector< double > &  x,
const std::vector< double > &  y,
const std::vector< double > &  z 
)

Calculates the partial correlation coefficient between two variables, controlling for a third variable.

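This method computes the partial correlation between variables X and Y, controlling for variable Z, from the pairwise Pearson coefficients returned by ppmc(): (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).

Parameters
x: The values of variable X
y: The values of variable Y
z: The values of control variable Z
Returns
The partial correlation coefficient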

Definition at line 269 of file describe.cpp.

{
    double r_xy = ppmc(x, y);
    double r_xz = ppmc(x, z);
    double r_yz = ppmc(y, z);

    return (r_xy - (r_xz * r_yz)) /
           std::sqrt((1.0 - std::pow(r_xz, 2.0)) * (1.0 - std::pow(r_yz, 2.0)));
}

◆ percentile()

double gpmp::stats::Describe::percentile ( const std::vector< double > &  data,
double  percentile 
)

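Calculates the specified percentile of a given dataset.

The value is taken from a sorted copy of the data at index floor(percentile * (size - 1)), with no interpolation between neighboring values.

Parameters
data: The input dataset
percentile: The desired percentile as a fraction from 0.0 to 1.0 (for example, 0.9 for the 90th percentile)
Returns
The value at the specified percentile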

Definition at line 167 of file describe.cpp.

{
    std::vector<double> sortedData = data;
    std::sort(sortedData.begin(), sortedData.end());

    size_t size = sortedData.size();
    size_t index = static_cast<size_t>(percentile * (size - 1));
    return sortedData[index];
}

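The listing truncates percentile * (size - 1) to an index, so the result jumps from one order statistic to the next. A frequently used alternative interpolates linearly between the two neighboring sorted values, which is the scheme R's quantile() uses by default (type 7). A minimal sketch under the same 0.0 to 1.0 convention; percentile_interp is a hypothetical name, not an openGPMP function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of openGPMP: percentile with linear
// interpolation between the two closest order statistics. `fraction` is
// in [0, 1] (0.5 = median).
double percentile_interp(std::vector<double> data, double fraction) {
    assert(!data.empty() && fraction >= 0.0 && fraction <= 1.0);
    std::sort(data.begin(), data.end());
    const double pos = fraction * static_cast<double>(data.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, data.size() - 1);
    const double weight = pos - static_cast<double>(lo);  // distance past data[lo]
    return data[lo] + weight * (data[hi] - data[lo]);
}
```

For {1, 2, 3, 4} and a fraction of 0.5, the listing returns 2 (index 1), while this sketch returns 2.5.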
◆ ppmc()

double gpmp::stats::Describe::ppmc ( const std::vector< double > &  x,
const std::vector< double > &  y 
)

Calculates the Pearson Product-Moment Correlation between two variables.

This method measures the linear relationship between variables X and Y, which must have the same length.

Parameters
x: The values of variable X
y: The values of variable Y
Returns
The Pearson Product-Moment Correlation coefficient

Definition at line 281 of file describe.cpp.

{
    double mean_x = mean_arith(x);
    double mean_y = mean_arith(y);

    double numerator = 0.0;
    double denominator_x = 0.0;
    double denominator_y = 0.0;

    for (size_t i = 0; i < x.size(); ++i) {
        numerator += (x[i] - mean_x) * (y[i] - mean_y);
        denominator_x += std::pow(x[i] - mean_x, 2.0);
        denominator_y += std::pow(y[i] - mean_y, 2.0);
    }

    return numerator / std::sqrt(denominator_x * denominator_y);
}

◆ range()

double gpmp::stats::Describe::range ( const std::vector< double > &  data)

Calculates the range of a given dataset.

Parameters
data: The input dataset
Returns
The range

Definition at line 178 of file describe.cpp.

{
    auto result = std::minmax_element(data.begin(), data.end());
    return *result.second - *result.first;
}

◆ rank_data()

std::vector< size_t > gpmp::stats::Describe::rank_data ( const std::vector< double > &  data)

Ranks the data in ascending order.

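This method assigns ranks to the data, where the smallest value gets rank 1, the second smallest gets rank 2, and so on. Tied values share the lowest rank of their group (for example, {10, 20, 20, 30} is ranked {1, 2, 2, 4}), and the nested loop makes the method O(n^2).

Parameters
data: The input dataset
Returns
A vector containing the rank of each input value, in the original order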

Definition at line 253 of file describe.cpp.

{
    std::vector<size_t> ranks(data.size());

    for (size_t i = 0; i < data.size(); ++i) {
        size_t rank = 1;
        for (size_t j = 0; j < data.size(); ++j) {
            if (j != i && data[j] < data[i]) {
                rank++;
            }
        }
        ranks[i] = rank;
    }

    return ranks;
}

◆ skewness()

double gpmp::stats::Describe::skewness ( const std::vector< double > &  data,
double  mean,
double  stddev 
)

Calculates the skewness of a given dataset.

This method measures the asymmetry of the dataset's distribution by returning the population skewness (1/n) * sum(((x_i - mean) / stddev)^3).

Parameters
data: The input dataset
mean: The mean of the dataset
stddev: The standard deviation of the dataset
Returns
The skewness value

Definition at line 242 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow((value - mean) / stddev, 3.0);
    }
    return sum / static_cast<double>(data.size());
}

◆ spearmans_rho()

double gpmp::stats::Describe::spearmans_rho ( const std::vector< double > &  x,
const std::vector< double > &  y 
)

Calculates Spearman's Rank Correlation between two variables.

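This method measures the strength and direction of the monotonic relationship between variables X and Y with the formula 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the ranks of the i-th pair. As written, ranks_x[i] - ranks_y[i] is computed in size_t and wraps around whenever Y's rank exceeds X's, so any such pair corrupts the sum. The shortcut formula is also exact only when neither variable has ties.

Parameters
x: The values of variable X
y: The values of variable Y
Returns
Spearman's Rank Correlation coefficient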

Definition at line 322 of file describe.cpp.

{
    std::vector<size_t> ranks_x = rank_data(x);
    std::vector<size_t> ranks_y = rank_data(y);

    double d_squared = 0.0;
    for (size_t i = 0; i < x.size(); ++i) {
        d_squared += std::pow(ranks_x[i] - ranks_y[i], 2.0);
    }

    return 1.0 -
           (6.0 * d_squared) / (x.size() * (std::pow(x.size(), 2.0) - 1.0));
}

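The listing squares rank differences computed in size_t and relies on the shortcut formula, which assumes no ties. An alternative that needs no unsigned differences and handles ties is to give tied values their average rank and take the Pearson correlation of the ranks. A minimal sketch; average_ranks and spearman_rho are hypothetical names, not openGPMP functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of openGPMP: 1-based ranks in which tied
// values share the average of the positions they occupy.
std::vector<double> average_ranks(const std::vector<double> &data) {
    std::vector<std::size_t> order(data.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&data](std::size_t a, std::size_t b) { return data[a] < data[b]; });
    std::vector<double> ranks(data.size());
    std::size_t i = 0;
    while (i < order.size()) {
        std::size_t j = i;  // [i, j] becomes the run of values equal to data[order[i]]
        while (j + 1 < order.size() && data[order[j + 1]] == data[order[i]]) {
            ++j;
        }
        // Positions i..j (0-based) hold ranks i+1..j+1; their average is (i+j)/2 + 1.
        const double average = (static_cast<double>(i) + static_cast<double>(j)) / 2.0 + 1.0;
        for (std::size_t k = i; k <= j; ++k) {
            ranks[order[k]] = average;
        }
        i = j + 1;
    }
    return ranks;
}

// Hypothetical helper: Spearman's rho as the Pearson correlation of the
// average ranks.
double spearman_rho(const std::vector<double> &x, const std::vector<double> &y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const std::vector<double> rx = average_ranks(x);
    const std::vector<double> ry = average_ranks(y);
    // Average ranks always sum to n(n + 1) / 2, so both rank means are (n + 1) / 2.
    const double mean = (static_cast<double>(rx.size()) + 1.0) / 2.0;
    double sxy = 0.0;
    double sxx = 0.0;
    double syy = 0.0;
    for (std::size_t i = 0; i < rx.size(); ++i) {
        sxy += (rx[i] - mean) * (ry[i] - mean);
        sxx += (rx[i] - mean) * (rx[i] - mean);
        syy += (ry[i] - mean) * (ry[i] - mean);
    }
    return sxy / std::sqrt(sxx * syy);  // undefined if either variable is constant
}
```

average_ranks({10, 20, 20, 30}) yields {1, 2.5, 2.5, 4}, and perfectly reversed data gives -1.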
◆ stdev()

double gpmp::stats::Describe::stdev ( const std::vector< double > &  data,
double  mean 
)
static

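Calculates the population standard deviation of a given dataset, given the mean.

The sum of squared deviations is divided by n, not n - 1, so this is not the Bessel-corrected sample standard deviation.

Parameters
data: The input dataset
mean: The mean of the dataset
Returns
The population standard deviation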

Definition at line 184 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value - mean, 2.0);
    }
    return std::sqrt(sum / static_cast<double>(data.size()));
}

Referenced by gpmp::stats::ProbDist::ConfidenceInterval(), gpmp::stats::HypothesisTest::one_sample_ttest(), and gpmp::stats::ProbDist::PivotFunctionForConfidenceInterval().

◆ u_stat()

double gpmp::stats::Describe::u_stat ( const std::vector< double > &  sample1,
const std::vector< double > &  sample2 
)
static

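Calculates the Mann-Whitney U statistic for two samples.

The statistic counts the pairs (x1, x2), with x1 from sample1 and x2 from sample2, for which x1 < x2. Tied pairs contribute nothing, whereas the textbook statistic counts each tie as 0.5.

Parameters
sample1: The first sample
sample2: The second sample
Returns
The U statistic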

Definition at line 38 of file describe.cpp.

{
    double U = 0;
    for (double x1 : sample1) {
        for (double x2 : sample2) {
            if (x1 < x2) {
                U++;
            }
        }
    }
    return U;
}

Referenced by gpmp::stats::HypothesisTest::mann_whitney_test().

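The listing counts only strict pairs with x1 < x2. The usual Mann-Whitney U also credits each tied pair with 0.5, so that the statistics for the two samples sum to n1 * n2. A minimal sketch with that tie handling, keeping the listing's x1 < x2 direction; mann_whitney_u is a hypothetical name, not an openGPMP function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of openGPMP: Mann-Whitney U for sample1,
// counting pairs with x1 < x2 as 1 and tied pairs as 0.5.
double mann_whitney_u(const std::vector<double> &sample1,
                      const std::vector<double> &sample2) {
    assert(!sample1.empty() && !sample2.empty());  // U is uninformative otherwise
    double u = 0.0;
    for (double x1 : sample1) {
        for (double x2 : sample2) {
            if (x1 < x2) {
                u += 1.0;
            } else if (x1 == x2) {
                u += 0.5;  // ties are split evenly between the two samples
            }
        }
    }
    return u;
}
```

For sample1 = {1, 2} and sample2 = {2, 3}, this returns 3.5 (the listing returns 3), and swapping the samples returns 0.5, so the two sum to n1 * n2 = 4.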
◆ var_coeff()

double gpmp::stats::Describe::var_coeff ( const std::vector< double > &  data)

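Calculates the coefficient of variation of a given dataset.

The coefficient is the population standard deviation divided by the arithmetic mean, expressed as a percentage; it is undefined when the mean is zero.

Parameters
data: The input dataset
Returns
The coefficient of variation, in percent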

Definition at line 148 of file describe.cpp.

{
    double mean = mean_arith(data);
    double stddev = stdev(data, mean);
    return (stddev / mean) * 100.0; // Multiply by 100 for percentage
}

◆ variance()

double gpmp::stats::Describe::variance ( const std::vector< double > &  data,
double  mean 
)
static

Calculates the population variance of a given dataset, given the mean.

The sum of squared deviations is divided by n, not n - 1.

Parameters
data: The input dataset
mean: The mean of the dataset
Returns
The population variance

Definition at line 194 of file describe.cpp.

{
    double sum = 0.0;
    for (const auto &value : data) {
        sum += std::pow(value - mean, 2.0);
    }
    return sum / static_cast<double>(data.size());
}

Referenced by gpmp::stats::ProbDist::mom(), and gpmp::stats::HypothesisTest::two_sample_ttest().

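Both variance() and stdev() divide by n, which gives the population statistics. When the data are a sample and an unbiased variance estimate is wanted, the usual choice is to divide by n - 1 (Bessel's correction). A minimal sketch; sample_variance and sample_stdev are hypothetical names, not openGPMP functions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of openGPMP: unbiased sample variance,
// dividing by n - 1 (Bessel's correction) instead of n.
double sample_variance(const std::vector<double> &data) {
    assert(data.size() >= 2);
    double mean = 0.0;
    for (double value : data) {
        mean += value;
    }
    mean /= static_cast<double>(data.size());
    double sum_sq = 0.0;
    for (double value : data) {
        sum_sq += (value - mean) * (value - mean);
    }
    return sum_sq / static_cast<double>(data.size() - 1);
}

// Sample standard deviation: the square root of the sample variance.
double sample_stdev(const std::vector<double> &data) {
    return std::sqrt(sample_variance(data));
}
```

For {2, 4, 4, 4, 5, 5, 7, 9}, variance() with mean 5 returns 4, while sample_variance returns 32/7 ≈ 4.571.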

The documentation for this class was generated from the following files: