openGPMP
Open Source Mathematics Package
gpmp::ml::BayesGauss Class Reference

#include <bayes_clf.hpp>

Public Member Functions

 BayesGauss ()=default
 Constructor for BayesGauss class.
 
 ~BayesGauss ()=default
 Destructor for BayesGauss class.
 
void train (const std::vector< std::vector< double >> &data, const std::vector< std::string > &labels)
 Train the classifier with a set of labeled data.
 
std::string predict (const std::vector< double > &newData) const
 Predict the class of a new data point.
 
void display () const
 Display the learned probabilities.
 

Public Attributes

std::unordered_map< std::string, double > class_probs
 Map storing the probabilities of each class.
 
std::unordered_map< std::string, std::vector< double > > mean
 Map storing the mean values for each feature in each class.
 
std::unordered_map< std::string, std::vector< double > > variance
 Map storing the variance values for each feature in each class.
 

Private Member Functions

void mean_and_var (const std::vector< std::vector< double >> &data, const std::vector< std::string > &labels)
 Calculate the mean and variance for each class.
 

Detailed Description

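Gaussian naive Bayes classifier. The technique assumes that each feature (also called a parameter or predictor) contributes independently to predicting the output variable. Each feature is modeled per class by a normal distribution whose parameters are stored in the mean and variance members.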

Definition at line 172 of file bayes_clf.hpp.
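
The independence assumption can be written as a decision rule. The block below is a sketch of the standard Gaussian naive Bayes formulation, not a transcription of the library source. The symbols are notation introduced here: $x_j$ is the j-th entry of the vector passed to predict(), $P(c)$ corresponds to class_probs, and $\mu_{c,j}$ and $\sigma^2_{c,j}$ correspond to the mean and variance members.

```latex
\hat{c} = \arg\max_{c} \left[ \log P(c)
          + \sum_{j=1}^{d} \log \mathcal{N}\!\left(x_j \mid \mu_{c,j}, \sigma^2_{c,j}\right) \right],
\qquad
\log \mathcal{N}\!\left(x \mid \mu, \sigma^2\right)
  = -\tfrac{1}{2}\log\!\left(2\pi\sigma^2\right) - \frac{(x-\mu)^2}{2\sigma^2}
```

Working with log-probabilities turns the product over features into a sum, which avoids floating-point underflow when the number of features $d$ is large.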

Constructor & Destructor Documentation

◆ BayesGauss()

gpmp::ml::BayesGauss::BayesGauss ( )
default

Constructor for BayesGauss class.

◆ ~BayesGauss()

gpmp::ml::BayesGauss::~BayesGauss ( )
default

Destructor for BayesGauss class.

Member Function Documentation

◆ display()

void gpmp::ml::BayesGauss::display ( ) const

Display the learned probabilities.

Note
This method is intended for debugging. It prints class_probs, mean, and variance to std::cout.

Definition at line 323 of file bayes_clf.cpp.

void gpmp::ml::BayesGauss::display() const {
    std::cout << "Class Probabilities:\n";
    for (const auto &entry : class_probs) {
        std::cout << entry.first << ": " << entry.second << "\n";
    }

    std::cout << "\nMean and Variance:\n";
    for (const auto &class_entry : mean) {
        std::cout << class_entry.first << ":\n";
        std::cout << "  Mean: ";
        for (size_t j = 0; j < class_entry.second.size(); ++j) {
            std::cout << class_entry.second[j] << " ";
        }
        std::cout << "\n  Variance: ";
        for (size_t j = 0; j < variance.at(class_entry.first).size(); ++j) {
            std::cout << variance.at(class_entry.first).at(j) << " ";
        }
        std::cout << "\n";
    }
}

Referenced by main().

◆ mean_and_var()

void gpmp::ml::BayesGauss::mean_and_var ( const std::vector< std::vector< double >> &  data,
const std::vector< std::string > &  labels 
)
private

Calculate the mean and variance for each class.

Parameters
data: A vector of vectors representing the training instances
labels: A vector of strings representing the corresponding class labels

Definition at line 245 of file bayes_clf.cpp.

void gpmp::ml::BayesGauss::mean_and_var(
    const std::vector<std::vector<double>> &data,
    const std::vector<std::string> &labels) {
    // assumes data is non-empty and every row has data[0].size() features
    size_t num_features = data[0].size();

    for (size_t i = 0; i < data.size(); ++i) {
        const std::string &label = labels[i];
        const std::vector<double> &features = data[i];

        // NOTE: train() has already added 1.0 per label before calling this
        // function, so this second increment doubles the per-class counts
        // that are read back as class_count below
        class_probs[label] += 1.0;

        // initialize mean[label] and variance[label] if not present
        if (mean.find(label) == mean.end()) {
            mean[label] = std::vector<double>(num_features, 0.0);
            variance[label] = std::vector<double>(num_features, 0.0);
        }

        // update mean (accumulate per-class feature sums)
        for (size_t j = 0; j < num_features; ++j) {
            mean[label][j] += features[j];
        }
    }

    // calculate mean (divide each sum by the class count)
    for (auto &entry : mean) {
        const std::string &label = entry.first;
        double class_count = class_probs[label];

        for (size_t j = 0; j < num_features; ++j) {
            entry.second[j] /= class_count;
        }
    }

    // calculate variance (accumulate squared deviations from the class mean)
    for (size_t i = 0; i < data.size(); ++i) {
        const std::string &label = labels[i];
        const std::vector<double> &features = data[i];

        for (size_t j = 0; j < num_features; ++j) {
            variance[label][j] += std::pow(features[j] - mean[label][j], 2);
        }
    }

    // divide by the class count (population variance, no n - 1 correction)
    for (auto &entry : variance) {
        const std::string &label = entry.first;
        double class_count = class_probs[label];

        for (size_t j = 0; j < num_features; ++j) {
            entry.second[j] /= class_count;
        }
    }
}
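
The two-pass computation above can be sketched as a standalone function that counts each training row exactly once. This is an illustrative sketch under stated assumptions, not the library's implementation: `ClassStats` and `fit_class_stats` are hypothetical names introduced here, and the function asserts that every row has the same number of features.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for per-class statistics (not part of openGPMP).
struct ClassStats {
    std::size_t count = 0;        // number of training rows in the class
    std::vector<double> mean;     // per-feature mean
    std::vector<double> variance; // per-feature population variance
};

// Hypothetical helper: two-pass per-class mean and population variance.
std::unordered_map<std::string, ClassStats>
fit_class_stats(const std::vector<std::vector<double>> &data,
                const std::vector<std::string> &labels) {
    assert(data.size() == labels.size());
    std::unordered_map<std::string, ClassStats> stats;
    if (data.empty()) {
        return stats;
    }
    const std::size_t num_features = data[0].size();

    // pass 1: count rows and sum features per class
    for (std::size_t i = 0; i < data.size(); ++i) {
        assert(data[i].size() == num_features);
        ClassStats &s = stats[labels[i]];
        if (s.mean.empty()) {
            s.mean.assign(num_features, 0.0);
            s.variance.assign(num_features, 0.0);
        }
        ++s.count;
        for (std::size_t j = 0; j < num_features; ++j) {
            s.mean[j] += data[i][j];
        }
    }
    for (auto &entry : stats) {
        for (double &m : entry.second.mean) {
            m /= static_cast<double>(entry.second.count);
        }
    }

    // pass 2: sum squared deviations from the class mean, then normalize
    for (std::size_t i = 0; i < data.size(); ++i) {
        ClassStats &s = stats[labels[i]];
        for (std::size_t j = 0; j < num_features; ++j) {
            const double d = data[i][j] - s.mean[j];
            s.variance[j] += d * d;
        }
    }
    for (auto &entry : stats) {
        for (double &v : entry.second.variance) {
            v /= static_cast<double>(entry.second.count);
        }
    }
    return stats;
}
```

Keeping the row count in its own field, separate from any probability map, means the divisor for the mean and variance does not depend on what other functions have written to shared state.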

◆ predict()

std::string gpmp::ml::BayesGauss::predict ( const std::vector< double > &  newData) const

Predict the class of a new data point.

Parameters
newData: A vector of doubles representing the features of the new data point
Returns
The predicted class label as a string

Definition at line 299 of file bayes_clf.cpp.

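std::string gpmp::ml::BayesGauss::predict(const std::vector<double> &newData) const {
    double max_prob = -std::numeric_limits<double>::infinity();
    std::string predicted_class;

    for (const auto &entry : class_probs) {
        const std::string &label = entry.first;
        // start from the log of the class probability
        double probability = log(entry.second);

        // add the per-feature log-likelihood terms
        // NOTE: the standard Gaussian log-density is
        //   -0.5 * log(2 * pi * var) - (x - mean)^2 / (2 * var);
        // as written, the outer 0.5 also scales the squared-deviation term,
        // which is therefore weighted by 1 / (4 * var)
        for (size_t j = 0; j < newData.size(); ++j) {
            probability -=
                0.5 * (std::log(2 * M_PI * variance.at(label).at(j)) +
                       std::pow(newData[j] - mean.at(label).at(j), 2) /
                           (2 * variance.at(label).at(j)));
        }

        // keep the class with the highest log-probability
        if (probability > max_prob) {
            max_prob = probability;
            predicted_class = label;
        }
    }

    return predicted_class;
}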

Referenced by main().
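
For comparison, the scoring step can be sketched with the textbook Gaussian log-density. This is a minimal sketch under stated assumptions, not the library's code: `gaussian_log_pdf`, `GaussianClass`, and `predict_label` are hypothetical names introduced here, every variance is assumed to be strictly positive, and pi is computed with `std::acos(-1.0)` because `M_PI` is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of openGPMP): log of the normal density
// N(x | mean, variance) = -0.5 * log(2*pi*var) - (x - mean)^2 / (2*var).
double gaussian_log_pdf(double x, double mean, double variance) {
    assert(variance > 0.0); // a zero variance would give log(0) and x/0
    const double two_pi = 2.0 * std::acos(-1.0);
    const double diff = x - mean;
    return -0.5 * std::log(two_pi * variance) -
           (diff * diff) / (2.0 * variance);
}

// Hypothetical per-class model: prior plus per-feature mean and variance.
struct GaussianClass {
    double prior;
    std::vector<double> mean;
    std::vector<double> variance;
};

// Hypothetical helper: return the label with the highest log-posterior score.
std::string
predict_label(const std::unordered_map<std::string, GaussianClass> &model,
              const std::vector<double> &x) {
    double best_score = -std::numeric_limits<double>::infinity();
    std::string best_label;
    for (const auto &entry : model) {
        const GaussianClass &c = entry.second;
        assert(c.mean.size() == x.size() && c.variance.size() == x.size());
        double score = std::log(c.prior);
        for (std::size_t j = 0; j < x.size(); ++j) {
            score += gaussian_log_pdf(x[j], c.mean[j], c.variance[j]);
        }
        if (score > best_score) {
            best_score = score;
            best_label = entry.first;
        }
    }
    return best_label;
}
```

Isolating the density in its own function makes it easy to check against a known value, such as the standard normal at zero, before using it inside the argmax loop.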

◆ train()

void gpmp::ml::BayesGauss::train ( const std::vector< std::vector< double >> &  data,
const std::vector< std::string > &  labels 
)

Train the classifier with a set of labeled data.

Parameters
data: A vector of vectors representing the training instances
labels: A vector of strings representing the corresponding class labels

Definition at line 228 of file bayes_clf.cpp.

void gpmp::ml::BayesGauss::train(const std::vector<std::vector<double>> &data,
                                 const std::vector<std::string> &labels) {
    // calculate class occurrences
    for (const auto &label : labels) {
        class_probs[label] += 1.0;
    }

    // calculate mean and variance for each feature in each class
    mean_and_var(data, labels);

    // calculate class probabilities
    double total_instances = static_cast<double>(data.size());
    for (auto &entry : class_probs) {
        entry.second /= total_instances;
    }
}

Referenced by main().
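
The class-probability step amounts to dividing each label's occurrence count by the total number of training instances. A minimal standalone sketch follows, assuming a non-empty label list; `class_priors` is a hypothetical name introduced here and is not part of openGPMP.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of openGPMP): estimate class priors as
// relative label frequencies.
std::unordered_map<std::string, double>
class_priors(const std::vector<std::string> &labels) {
    assert(!labels.empty()); // avoid dividing by zero below
    std::unordered_map<std::string, double> priors;

    // count how often each label occurs
    for (const auto &label : labels) {
        priors[label] += 1.0;
    }

    // normalize the counts so that the priors sum to 1
    const double total = static_cast<double>(labels.size());
    for (auto &entry : priors) {
        entry.second /= total;
    }
    return priors;
}
```

Because the function builds a fresh map on every call, repeated calls do not accumulate counts from earlier training runs.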

Member Data Documentation

◆ class_probs

gpmp::ml::BayesGauss::class_probs

Map storing the probabilities of each class.

The key is the class label (string), and the value is that class's probability (double) as computed by train().

Definition at line 180 of file bayes_clf.hpp.

◆ mean

gpmp::ml::BayesGauss::mean

Map storing the mean values for each feature in each class.

The key is the class label (string), and the value is a vector of mean values, one per feature.

Definition at line 187 of file bayes_clf.hpp.

◆ variance

gpmp::ml::BayesGauss::variance

Map storing the variance values for each feature in each class.

The key is the class label (string), and the value is a vector of variance values, one per feature.

Definition at line 194 of file bayes_clf.hpp.


The documentation for this class was generated from the following files: