openGPMP
Open Source Mathematics Package
#include <linreg.hpp>
Public Member Functions
LinearRegression ()
Constructor for LinearRegression.
void | calculate_coeffecient ()
Calculates the coefficient/slope of the best fitting line.
void | calculate_constant ()
Calculates the constant term of the best fitting line.
int64_t | data_size ()
Gets the number of entries (xi, yi) in the data set.
long double | return_coeffecient ()
Gets the coefficient/slope of the best fitting line.
long double | return_constant ()
Gets the constant term of the best fitting line.
void | best_fit ()
Calculates and displays the best fitting line based on training data.
void | get_input (const std::vector< long double > &x_data, const std::vector< long double > &y_data)
Sets the input data for the LinearRegression class from two vectors.
void | get_input (const gpmp::core::DataTableStr &data, const std::vector< std::string > &columns)
Takes input data in the form of a DataTable and prepares it for regression analysis.
void | get_input (const char *file)
Takes input data from a file and prepares it for regression analysis.
void | split_data (double test_size, unsigned int seed, bool shuffle)
Splits the data into training and testing sets.
void | show_data ()
Displays the data set.
long double | predict (long double _x) const
Predicts a value based on the input.
long double | predict (long double _x, const std::vector< long double > &x_data)
Predicts a value based on the input.
long double | error_in (long double num)
Calculates the error (residual) for a given independent variable value.
long double | error_in (long double num, const std::vector< long double > &x_data, const std::vector< long double > &y_data)
Calculates the error (residual) for a given independent variable value using a dataset.
long double | error_square ()
Calculates the sum of squared errors for the entire dataset.
long double | mse (const std::vector< long double > &x_data, const std::vector< long double > &y_data) const
Calculates the Mean Squared Error (MSE) for a dataset.
long double | r_sqrd (const std::vector< long double > &x_data, const std::vector< long double > &y_data) const
Calculates the coefficient of determination (R-squared).
int64_t | num_rows (const char *input)
Calculates the number of rows in a file.
Public Attributes
std::vector< long double > | x
std::vector< long double > | y
long double | coeff
long double | constant
long double | sum_xy
long double | sum_x
long double | sum_y
long double | sum_x_square
long double | sum_y_square
std::vector< long double > | x_train
std::vector< long double > | y_train
std::vector< long double > | x_test
std::vector< long double > | y_test
Definition at line 53 of file linreg.hpp.
gpmp::ml::LinearRegression::LinearRegression ()
Constructor for LinearRegression.
Definition at line 54 of file linreg.cpp.
References coeff, constant, sum_x, sum_x_square, sum_xy, sum_y, and sum_y_square.
void gpmp::ml::LinearRegression::best_fit ()
Calculates and displays the best fitting line based on training data.
This function calculates the best fitting line from the training data and displays the result. If the training data is empty, it also handles the case in which the coefficient and constant have not yet been calculated.
Definition at line 110 of file linreg.cpp.
References _log_, INFO, gpmp::core::Logger::log(), N, and WARNING.
Referenced by main(), and test_train().
void gpmp::ml::LinearRegression::calculate_coeffecient ()
Calculates the coefficient/slope of the best fitting line.
This function calculates the coefficient of the linear regression model by analyzing the dataset.
Definition at line 66 of file linreg.cpp.
References N.
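For orientation, the slope of an ordinary least-squares line can be computed from the same kind of running sums this class exposes (sum_x, sum_y, sum_xy, sum_x_square). The sketch below illustrates that closed form only. `slope_from_data` is a hypothetical free function, not part of openGPMP, and the library's own implementation may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an openGPMP function): least-squares slope
// computed from running sums over the paired samples (x[i], y[i]).
long double slope_from_data(const std::vector<long double> &x,
                            const std::vector<long double> &y) {
    assert(x.size() == y.size() && x.size() >= 2);

    long double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_square = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
        sum_xy += x[i] * y[i];
        sum_x_square += x[i] * x[i];
    }

    const long double n = static_cast<long double>(x.size());
    const long double denom = n * sum_x_square - sum_x * sum_x;
    assert(denom != 0); // all x values identical: the slope is undefined

    // slope = (N * sum_xy - sum_x * sum_y) / (N * sum_x_square - sum_x^2)
    return (n * sum_xy - sum_x * sum_y) / denom;
}
```

For the points (1, 3), (2, 5), (3, 7), which lie exactly on y = 2x + 1, this returns a slope of 2.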
void gpmp::ml::LinearRegression::calculate_constant ()
Calculate the constant term of the best fitting line.
Definition at line 80 of file linreg.cpp.
References N.
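A least-squares line always passes through the point of means, so once a slope is known the constant term follows as (sum_y - slope * sum_x) / N. The sketch below shows that relationship. `intercept_from_data` and its `slope` parameter are hypothetical names chosen for illustration, and the formula openGPMP actually uses may be written differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an openGPMP function): constant term of the
// least-squares line, given a slope that was already computed.
long double intercept_from_data(const std::vector<long double> &x,
                                const std::vector<long double> &y,
                                long double slope) {
    assert(x.size() == y.size() && !x.empty());

    long double sum_x = 0, sum_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
    }

    const long double n = static_cast<long double>(x.size());
    // The fitted line passes through (mean_x, mean_y):
    // constant = mean_y - slope * mean_x
    return (sum_y - slope * sum_x) / n;
}
```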
int64_t gpmp::ml::LinearRegression::data_size ()
Get the number of entries (xi, yi) in the data set.
Definition at line 89 of file linreg.cpp.
long double gpmp::ml::LinearRegression::error_in (long double num)
Calculates the error (residual) for a given independent variable value.
This function computes the difference between the actual dependent variable value (y) and the predicted value based on the linear regression model for a specified independent variable (x).
num | The independent variable value for which the error is calculated.
Definition at line 373 of file linreg.cpp.
Referenced by main(), and test_train().
long double gpmp::ml::LinearRegression::error_in (long double num, const std::vector< long double > &x_data, const std::vector< long double > &y_data)
Calculates the error (residual) for a given independent variable value using a dataset.
This function computes the difference between the actual dependent variable value (y) and the predicted value based on the linear regression model for a specified independent variable (x).
num | The independent variable value for which the error is calculated.
x_data | The vector of independent variable values.
y_data | The vector of actual dependent variable values.
Definition at line 384 of file linreg.cpp.
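As a rough illustration of the residual described above, the sketch below looks up the observed y for a given x in a dataset and subtracts the value predicted by a line y = coeff * x + constant. Everything here is an assumption made for the example: `residual_at` is a hypothetical function, the exact-match lookup is one possible way to find the observation, and the reference does not state how the library behaves when the value is not present in the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an openGPMP function).
// Finds the first sample whose x equals `num` and writes
// (actual y - predicted y) to `out`. Returns false if `num`
// does not occur in x_data, leaving `out` untouched.
bool residual_at(long double num,
                 const std::vector<long double> &x_data,
                 const std::vector<long double> &y_data,
                 long double coeff,
                 long double constant,
                 long double &out) {
    assert(x_data.size() == y_data.size());

    for (std::size_t i = 0; i < x_data.size(); ++i) {
        if (x_data[i] == num) {
            const long double predicted = coeff * num + constant;
            out = y_data[i] - predicted;
            return true;
        }
    }
    return false;
}
```

Reporting "not found" through the return value is a design choice of this sketch; it keeps a missing observation distinguishable from a residual of exactly zero.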
long double gpmp::ml::LinearRegression::error_square ()
Calculates the sum of squared errors for the entire dataset.
This function computes the sum of squared differences between the actual dependent variable values (y) and the predicted values based on the linear regression model for all data points.
Definition at line 398 of file linreg.cpp.
void gpmp::ml::LinearRegression::get_input (const char *file)
Takes input data from a file and prepares it for regression analysis.
file | The name of the file containing the data.
Definition at line 253 of file linreg.cpp.
void gpmp::ml::LinearRegression::get_input (const gpmp::core::DataTableStr &data, const std::vector< std::string > &columns)
Takes input data in the form of a DataTable and prepares it for regression analysis.
data | The DataTable containing the data.
columns | The column names for the independent and dependent variables.
Definition at line 192 of file linreg.cpp.
References _log_, ERROR, and gpmp::core::Logger::log().
void gpmp::ml::LinearRegression::get_input (const std::vector< long double > &x_data, const std::vector< long double > &y_data)
Sets the input data for the LinearRegression class from two vectors.
This function accepts vectors of independent and dependent variable values and initializes the class variables.
x_data | The vector of independent variable values.
y_data | The vector of dependent variable values.
Definition at line 157 of file linreg.cpp.
References _log_, ERROR, and gpmp::core::Logger::log().
Referenced by main(), and test_train().
long double gpmp::ml::LinearRegression::mse (const std::vector< long double > &x_data, const std::vector< long double > &y_data) const
Calculates the Mean Squared Error (MSE) for a dataset.
The Mean Squared Error is the average of the squared differences between the actual dependent variable values and the values predicted by the linear regression model.
x_data | The vector of independent variable values.
y_data | The vector of actual dependent variable values.
Definition at line 412 of file linreg.cpp.
Referenced by test_train().
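The definition above can be sketched as follows, assuming predictions come from a line y = coeff * x + constant. `mean_squared_error` is a hypothetical free function that takes the line parameters explicitly, whereas the member function presumably reads them from the object. The same loop without the final division yields the sum of squared errors described under error_square().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an openGPMP function): mean of the squared
// residuals of the line y = coeff * x + constant over a dataset.
long double mean_squared_error(const std::vector<long double> &x_data,
                               const std::vector<long double> &y_data,
                               long double coeff,
                               long double constant) {
    assert(x_data.size() == y_data.size() && !x_data.empty());

    long double sum_squared = 0;
    for (std::size_t i = 0; i < x_data.size(); ++i) {
        const long double residual = y_data[i] - (coeff * x_data[i] + constant);
        sum_squared += residual * residual;
    }
    return sum_squared / static_cast<long double>(x_data.size());
}
```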
int64_t gpmp::ml::LinearRegression::num_rows (const char *input)
Calculate the number of rows in a file.
input | Path to the input file.
Definition at line 474 of file linreg.cpp.
long double gpmp::ml::LinearRegression::predict (long double _x) const
Predict a value based on the input.
_x | Input value.
Definition at line 356 of file linreg.cpp.
Referenced by main(), and test_train().
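Since the model is a line with a coefficient (slope) and a constant term, a prediction is presumably the value of that line at the input. The sketch below shows the idea with a hypothetical `FittedLine` struct that stands in for a fitted model; it is not the openGPMP class, and the member names are borrowed from the attributes listed on this page for readability only.

```cpp
// Hypothetical stand-in for a fitted model (not the openGPMP class).
struct FittedLine {
    long double coeff;    // slope of the line
    long double constant; // constant term (y-intercept)

    // Evaluate the line at x: y = coeff * x + constant.
    long double predict(long double x) const {
        return coeff * x + constant;
    }
};
```

With coeff = 2 and constant = 1, `predict(3)` evaluates to 7.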
long double gpmp::ml::LinearRegression::predict (long double _x, const std::vector< long double > &x_data)
Predict a value based on the input.
_x | Input value.
x_data | X value data.
Definition at line 362 of file linreg.cpp.
long double gpmp::ml::LinearRegression::r_sqrd (const std::vector< long double > &x_data, const std::vector< long double > &y_data) const
Calculate the coefficient of determination (R-squared).
The coefficient of determination, often referred to as R-squared, is a statistical measure that represents the proportion of the variance in the dependent variable (Y) that is predictable from the independent variable (X). It quantifies the goodness of fit of the linear regression model to the data.
This function calculates the R-squared value for a linear regression model using the provided dataset of independent variable values (x_data) and dependent variable values (y_data).
x_data | A vector of independent variable values.
y_data | A vector of corresponding dependent variable values.
Definition at line 438 of file linreg.cpp.
References _log_, ERROR, and gpmp::core::Logger::log().
Referenced by test_train().
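One common way to compute R-squared is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The sketch below uses that textbook definition; `r_squared` is a hypothetical free function, and the reference does not say which formulation openGPMP uses or how it reports invalid input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an openGPMP function): coefficient of
// determination of the line y = coeff * x + constant on a dataset.
long double r_squared(const std::vector<long double> &x_data,
                      const std::vector<long double> &y_data,
                      long double coeff,
                      long double constant) {
    assert(x_data.size() == y_data.size() && !x_data.empty());

    long double sum_y = 0;
    for (long double value : y_data) {
        sum_y += value;
    }
    const long double mean_y = sum_y / static_cast<long double>(y_data.size());

    long double ss_res = 0; // squared residuals against the fitted line
    long double ss_tot = 0; // squared deviations from the mean of y
    for (std::size_t i = 0; i < x_data.size(); ++i) {
        const long double residual = y_data[i] - (coeff * x_data[i] + constant);
        const long double deviation = y_data[i] - mean_y;
        ss_res += residual * residual;
        ss_tot += deviation * deviation;
    }
    assert(ss_tot != 0); // constant y: R-squared is undefined

    return 1 - ss_res / ss_tot;
}
```

A line that passes through every point gives exactly 1; larger residuals relative to the spread of y push the value down.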
long double gpmp::ml::LinearRegression::return_coeffecient ()
Get the coefficient/slope of the best fitting line.
Definition at line 94 of file linreg.cpp.
long double gpmp::ml::LinearRegression::return_constant ()
Get the constant term of the best fitting line.
Definition at line 102 of file linreg.cpp.
void gpmp::ml::LinearRegression::show_data ()
Display the data set.
void gpmp::ml::LinearRegression::split_data (double test_size, unsigned int seed, bool shuffle)
Splits the data into training and testing sets.
This function splits the dataset into training and testing sets based on the specified test size and random seed.
test_size | The proportion of data to be used for testing (between 0 and 1).
seed | The random seed for shuffling the data.
Definition at line 279 of file linreg.cpp.
References _log_, ERROR, and gpmp::core::Logger::log().
Referenced by test_train().
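A train/test split along these lines can be sketched as below. The details are assumptions made for the example, not documented behaviour: `Split` and `split_pairs` are hypothetical names, the `shuffle` flag is taken to mean "shuffle before splitting", the test set is taken from the end of the (possibly shuffled) order, and `std::mt19937` is simply one reasonable seeded generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical result type (not an openGPMP type).
struct Split {
    std::vector<long double> x_train;
    std::vector<long double> y_train;
    std::vector<long double> x_test;
    std::vector<long double> y_test;
};

// Hypothetical helper: split paired samples into training and test sets.
Split split_pairs(const std::vector<long double> &x,
                  const std::vector<long double> &y,
                  double test_size,
                  unsigned int seed,
                  bool shuffle) {
    assert(x.size() == y.size());
    assert(test_size > 0.0 && test_size < 1.0);

    // Shuffle indices rather than values so each (x, y) pair stays together.
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    if (shuffle) {
        std::mt19937 rng(seed);
        std::shuffle(order.begin(), order.end(), rng);
    }

    const std::size_t n_test =
        static_cast<std::size_t>(test_size * static_cast<double>(x.size()));
    const std::size_t n_train = x.size() - n_test;

    Split out;
    for (std::size_t k = 0; k < order.size(); ++k) {
        const std::size_t i = order[k];
        if (k < n_train) {
            out.x_train.push_back(x[i]);
            out.y_train.push_back(y[i]);
        } else {
            out.x_test.push_back(x[i]);
            out.y_test.push_back(y[i]);
        }
    }
    return out;
}
```

Using the same seed twice produces the same split on a given standard library implementation, which is what makes a seeded shuffle useful for repeatable experiments.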
long double gpmp::ml::LinearRegression::coeff
Stores the coefficient (slope) of the best fitting line.
Definition at line 60 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::constant
Stores the constant term of the best fitting line.
Definition at line 62 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::sum_x
Contains the sum of all x values.
Definition at line 66 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::sum_x_square
Contains the sum of the squares of all x values.
Definition at line 70 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::sum_xy
Contains the sum of the products of each x value and its corresponding y value.
Definition at line 64 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::sum_y
Contains the sum of all y values.
Definition at line 68 of file linreg.hpp.
Referenced by LinearRegression().
long double gpmp::ml::LinearRegression::sum_y_square
Contains the sum of the squares of all y values.
Definition at line 72 of file linreg.hpp.
Referenced by LinearRegression().
std::vector<long double> gpmp::ml::LinearRegression::x
Dynamic array that holds all x values.
Definition at line 56 of file linreg.hpp.
Referenced by test_train().
std::vector<long double> gpmp::ml::LinearRegression::x_test
Vector holding x testing data
Definition at line 78 of file linreg.hpp.
Referenced by test_train().
std::vector<long double> gpmp::ml::LinearRegression::x_train
Vector holding x training data
Definition at line 74 of file linreg.hpp.
std::vector<long double> gpmp::ml::LinearRegression::y
Dynamic array that holds all y values.
Definition at line 58 of file linreg.hpp.
Referenced by test_train().
std::vector<long double> gpmp::ml::LinearRegression::y_test
Vector holding y testing data
Definition at line 80 of file linreg.hpp.
Referenced by test_train().
std::vector<long double> gpmp::ml::LinearRegression::y_train
Vector holding y training data
Definition at line 76 of file linreg.hpp.