openGPMP
Open Source Mathematics Package
gpmp::ml::BayesMultiNom Class Reference

#include <bayes_clf.hpp>

Public Member Functions

 BayesMultiNom (double alpha_param=1.0, bool fit_prior_param=true, const std::vector< double > &class_prior={})
 Constructor for BayesMultiNom class.
 
 ~BayesMultiNom ()
 Destructor for BayesMultiNom class.
 
void train (const std::vector< std::vector< size_t >> &data, const std::vector< std::string > &labels)
 Train the classifier with a set of labeled data.
 
std::string predict (const std::vector< size_t > &new_data) const
 Predict the class of a new data point.
 
void display () const
 Display the learned probabilities.
 

Public Attributes

double alpha
 Additive smoothing parameter for the Multinomial distribution.
 
bool fit_prior
 Flag indicating whether to learn class prior probabilities during training.
 
std::unordered_map< std::string, double > class_probs
 Map storing the probabilities of each class.
 
std::unordered_map< std::string, std::vector< double > > feature_probs
 Map storing the probabilities of features for each class.
 
std::vector< double > class_log_prior
 Vector storing the logarithm of the class prior probabilities.
 

Detailed Description

Definition at line 245 of file bayes_clf.hpp.

Constructor & Destructor Documentation

◆ BayesMultiNom()

gpmp::ml::BayesMultiNom::BayesMultiNom ( double  alpha_param = 1.0,
bool  fit_prior_param = true,
const std::vector< double > &  class_prior = {} 
)

Constructor for BayesMultiNom class.

Parameters
alpha_param      Additive smoothing parameter
fit_prior_param  Whether to learn class prior probabilities during training
class_prior      Prior probabilities of the classes; the values are copied unchanged into class_log_prior

Definition at line 344 of file bayes_clf.cpp.

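gpmp::ml::BayesMultiNom::BayesMultiNom(double alpha_param,
                                       bool fit_prior_param,
                                       const std::vector<double> &class_prior)
    : alpha(alpha_param), fit_prior(fit_prior_param),
      class_log_prior(class_prior.begin(), class_prior.end()) {
}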
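
The constructor copies class_prior into class_log_prior element by element and applies no logarithm. A minimal sketch, assuming the caller starts from plain probabilities and wants log-probabilities stored; to_log_prior is a hypothetical helper and is not part of openGPMP:

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of openGPMP): convert class prior
// probabilities to natural-log priors, preserving order.
std::vector<double> to_log_prior(const std::vector<double> &prior) {
    std::vector<double> out;
    out.reserve(prior.size());
    for (double p : prior) {
        // std::log(0.0) yields -infinity, which rules the class out entirely
        out.push_back(std::log(p));
    }
    return out;
}
```

Under that assumption, the result of to_log_prior({0.5, 0.5}) could be passed as the third constructor argument.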

◆ ~BayesMultiNom()

gpmp::ml::BayesMultiNom::~BayesMultiNom ( )

Destructor for BayesMultiNom class.

Definition at line 351 of file bayes_clf.cpp.

gpmp::ml::BayesMultiNom::~BayesMultiNom() {
}

Member Function Documentation

◆ display()

void gpmp::ml::BayesMultiNom::display ( ) const

Display the learned probabilities.

Note
This method is for debugging purposes

Definition at line 441 of file bayes_clf.cpp.

void gpmp::ml::BayesMultiNom::display() const {
    std::cout << "Class Probabilities:\n";
    for (const auto &entry : class_probs) {
        std::cout << entry.first << ": " << entry.second << "\n";
    }

    std::cout << "\nFeature Probabilities:\n";
    for (const auto &class_entry : feature_probs) {
        std::cout << class_entry.first << ":\n";
        for (size_t j = 0; j < class_entry.second.size(); ++j) {
            std::cout << " Feature " << j << ": " << class_entry.second[j]
                      << "\n";
        }
    }

    std::cout << "\nClass Log Priors:\n";
    for (const auto &log_prior : class_log_prior) {
        std::cout << log_prior << "\n";
    }
}

◆ predict()

std::string gpmp::ml::BayesMultiNom::predict ( const std::vector< size_t > &  new_data) const

Predict the class of a new data point.

Parameters
new_data  A vector of size_t representing the features of the new data point
Returns
The predicted class label as a string

Definition at line 420 of file bayes_clf.cpp.

std::string
gpmp::ml::BayesMultiNom::predict(const std::vector<size_t> &new_data) const {
    double max_prob = -std::numeric_limits<double>::infinity();
    std::string predicted_class;

    for (const auto &entry : class_probs) {
        const std::string &label = entry.first;
        double probability = log(entry.second);

        for (size_t j = 0; j < new_data.size(); ++j) {
            probability += new_data[j] * log(feature_probs.at(label).at(j));
        }

        if (probability > max_prob) {
            max_prob = probability;
            predicted_class = label;
        }
    }

    return predicted_class;
}
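
The listing scores each class in log space: the log of the class probability plus, for every feature j, the feature count times the log of that feature's probability for the class. The same rule can be sketched as a free function. The name argmax_log_score and its parameters are hypothetical; only the scoring rule comes from the listing:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical free function (not part of openGPMP) using the scoring rule
// score(c) = log P(c) + sum_j x[j] * log P(feature j | c)
std::string argmax_log_score(
    const std::unordered_map<std::string, double> &class_probs,
    const std::unordered_map<std::string, std::vector<double>> &feature_probs,
    const std::vector<std::size_t> &x) {
    double best = -std::numeric_limits<double>::infinity();
    std::string best_label; // stays empty if there are no classes

    for (const auto &entry : class_probs) {
        // at() throws std::out_of_range for an unknown label or feature index
        const std::vector<double> &theta = feature_probs.at(entry.first);
        double score = std::log(entry.second);
        for (std::size_t j = 0; j < x.size(); ++j) {
            score += static_cast<double>(x[j]) * std::log(theta.at(j));
        }
        if (score > best) {
            best = score;
            best_label = entry.first;
        }
    }
    return best_label;
}
```

Summing logarithms rather than multiplying probabilities avoids floating-point underflow when feature counts are large.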

◆ train()

void gpmp::ml::BayesMultiNom::train ( const std::vector< std::vector< size_t >> &  data,
const std::vector< std::string > &  labels 
)

Train the classifier with a set of labeled data.

Parameters
data    A vector of vectors representing the training instances
labels  A vector of strings representing the corresponding class labels

Definition at line 354 of file bayes_clf.cpp.

void gpmp::ml::BayesMultiNom::train(
    const std::vector<std::vector<size_t>> &data,
    const std::vector<std::string> &labels) {
    size_t num_instances = data.size();
    size_t num_features = data[0].size();

    // count class occurrences
    for (const auto &label : labels) {
        class_probs[label] += 1.0;
    }

    // count feature occurrences for each class
    for (size_t i = 0; i < num_instances; ++i) {
        const std::string &label = labels[i];
        const std::vector<size_t> &features = data[i];

        class_probs[label] += 1.0;

        // Initialize feature_probs[label] if not present
        if (feature_probs.find(label) == feature_probs.end()) {
            feature_probs[label] = std::vector<double>(num_features, 0.0);
        }

        for (size_t j = 0; j < num_features; ++j) {
            feature_probs[label][j] += features[j];
        }
    }

    // calculate class probabilities and feature probabilities
    double smoothing_factor = alpha * num_features;
    for (const auto &entry : class_probs) {
        const std::string &label = entry.first;
        double class_count = entry.second;

        // calculate class probability
        class_probs[label] =
            (class_count + alpha) / (num_instances + smoothing_factor);

        // calculate feature probabilities
        for (size_t j = 0; j < feature_probs[label].size(); ++j) {
            feature_probs[label][j] = (feature_probs[label][j] + alpha) /
                                      (class_count + smoothing_factor);
        }
    }

    // calculate class log priors
    if (fit_prior) {
        double total = std::accumulate(
            class_probs.begin(),
            class_probs.end(),
            0.0,
            [](double sum, const auto &entry) { return sum + entry.second; });

        for (auto &entry : class_probs) {
            entry.second /= total;
        }

        std::transform(
            class_probs.begin(),
            class_probs.end(),
            class_log_prior.begin(),
            [total](const auto &entry) { return log(entry.second); });
    }
}

Referenced by main().

Member Data Documentation

◆ alpha

gpmp::ml::BayesMultiNom::alpha

Additive smoothing parameter for the Multinomial distribution.

Definition at line 251 of file bayes_clf.hpp.

◆ class_log_prior

gpmp::ml::BayesMultiNom::class_log_prior

Vector storing the logarithm of the class prior probabilities.

Used for faster computation during prediction

Definition at line 277 of file bayes_clf.hpp.
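
In the train() listing, std::transform writes the log priors through class_log_prior.begin(). The destination must therefore already hold at least class_probs.size() elements, and with the default empty class_prior it holds none. A minimal sketch of a growing destination uses std::back_inserter. The helper name log_priors is hypothetical. The sketch also assumes std::map in place of the class's std::unordered_map, so that the position of each log prior follows sorted label order:

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of openGPMP): log priors in sorted-label
// order. std::map is an assumption made here for a deterministic order.
std::vector<double>
log_priors(const std::map<std::string, double> &class_probs) {
    std::vector<double> out;
    out.reserve(class_probs.size());
    // back_inserter appends, so the destination may start empty
    std::transform(class_probs.begin(),
                   class_probs.end(),
                   std::back_inserter(out),
                   [](const auto &entry) { return std::log(entry.second); });
    return out;
}
```

Iteration order over std::unordered_map is unspecified, so a plain vector filled from it carries no reliable mapping from index back to class label.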

◆ class_probs

gpmp::ml::BayesMultiNom::class_probs

Map storing the probabilities of each class.

The key is the class label (string), and the value is the probability (double).

Definition at line 264 of file bayes_clf.hpp.
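
For orientation, the simplest way to fill such a map is with relative label frequencies. This sketch assumes no smoothing, whereas the train() listing applies additive smoothing with alpha. The function class_priors_from_labels is hypothetical and not part of openGPMP:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of openGPMP): unsmoothed class priors,
// computed as count(label) / number of labels.
std::unordered_map<std::string, double>
class_priors_from_labels(const std::vector<std::string> &labels) {
    std::unordered_map<std::string, double> priors;
    if (labels.empty()) {
        return priors; // avoid dividing by zero
    }
    for (const auto &label : labels) {
        priors[label] += 1.0; // operator[] value-initializes to 0.0
    }
    const double n = static_cast<double>(labels.size());
    for (auto &entry : priors) {
        entry.second /= n;
    }
    return priors;
}
```

For the labels a, a, a, b this gives 0.75 for a and 0.25 for b.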

◆ feature_probs

gpmp::ml::BayesMultiNom::feature_probs

Map storing the probabilities of features for each class.

The key is the class label (string), and the value is a vector of feature probabilities.

Definition at line 271 of file bayes_clf.hpp.

◆ fit_prior

gpmp::ml::BayesMultiNom::fit_prior

Flag indicating whether to learn class prior probabilities during training.

Definition at line 257 of file bayes_clf.hpp.


The documentation for this class was generated from the following files: