| Title: | Bioinformatics Modeling with Recursion and Autoencoder-Based Ensemble |
|---|---|
| Description: | Provides tools for bioinformatics modeling using recursive transformer-inspired architectures, autoencoders, random forests, XGBoost, and stacked ensemble models. Includes utilities for cross-validation, calibration, benchmarking, and threshold optimization in predictive modeling workflows. |
| Authors: | MD. Arshad [aut, cre] |
| Maintainer: | MD. Arshad <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-14 08:59:44 UTC |
| Source: | https://github.com/sulkysubject37/biomor |
Evaluates a trained caret model on test data, returning Accuracy, F1 score, and ROC-AUC. If only one class is present in the test set, ROC-AUC is returned as NA.
biomor_benchmark(model, test_data, outcome_col)

| Argument | Description |
|---|---|
| model | A trained caret model |
| test_data | Data frame containing predictors and outcome |
| outcome_col | Name of the outcome column |

A named list of metrics
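
The three metrics can be illustrated with a minimal base-R sketch. It is not the package's implementation: `benchmark_sketch`, its arguments, and the element names `Accuracy`, `F1`, and `ROC_AUC` are assumptions made for this example, and it takes labels and probabilities directly rather than a caret model. ROC-AUC is computed with the rank (Mann-Whitney) formula and is `NA` when only one class is present.

```r
# Hypothetical helper (not part of biomor): computes Accuracy, F1, and ROC-AUC
# from true labels, predicted labels, and positive-class probabilities.
benchmark_sketch <- function(y_true, y_pred, y_prob, positive = "Active") {
  is_pos   <- y_true == positive
  pred_pos <- y_pred == positive

  accuracy <- mean(pred_pos == is_pos)

  tp <- sum(pred_pos & is_pos)
  fp <- sum(pred_pos & !is_pos)
  fn <- sum(!pred_pos & is_pos)
  f1 <- if (2 * tp + fp + fn == 0) NA_real_ else 2 * tp / (2 * tp + fp + fn)

  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  auc <- if (n_pos == 0 || n_neg == 0) {
    NA_real_                      # AUC is undefined with a single class
  } else {
    r <- rank(y_prob)             # average ranks, so ties count as 1/2
    (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  }

  list(Accuracy = accuracy, F1 = f1, ROC_AUC = auc)
}

# Toy data, made up for illustration
y_true <- factor(c("Active", "Inactive", "Active", "Inactive"),
                 levels = c("Active", "Inactive"))
y_prob <- c(0.9, 0.2, 0.4, 0.6)
y_pred <- factor(ifelse(y_prob >= 0.5, "Active", "Inactive"),
                 levels = levels(y_true))

res <- benchmark_sketch(y_true, y_pred, y_prob)
# Accuracy = 0.5, F1 = 0.5, ROC_AUC = 0.75

# Single-class test set: ROC_AUC is NA
one_class <- benchmark_sketch(y_true[c(1, 3)], y_pred[c(1, 3)], y_prob[c(1, 3)])
```
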
Run full BioMoR pipeline
biomor_run_pipeline(data, feature_cols = NULL, epochs = 50)

| Argument | Description |
|---|---|
| data | Data frame with a Label column and descriptor columns |
| feature_cols | Optional set of feature columns |
| epochs | Number of autoencoder training epochs |

A list of trained models and benchmark reports
The Brier score is the mean squared error between predicted probabilities and the true binary outcome (0/1). Lower is better.
brier_score(y_true, y_prob, positive = "Active")

| Argument | Description |
|---|---|
| y_true | True factor labels. |
| y_prob | Predicted probabilities for the positive class. |
| positive | Name of the positive class (default "Active"). |

Numeric Brier score.
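
The definition above translates directly into a few lines of base R. The sketch below is an illustration of the formula rather than the package's source; `brier_sketch` is a hypothetical name, and the toy data are made up.

```r
# Hypothetical helper (not part of biomor): mean squared error between
# predicted probabilities and the 0/1 encoding of the true labels.
brier_sketch <- function(y_true, y_prob, positive = "Active") {
  y_bin <- as.numeric(y_true == positive)  # 1 for the positive class, else 0
  mean((y_prob - y_bin)^2)
}

y_true <- factor(c("Active", "Inactive", "Active", "Inactive"))
y_prob <- c(0.9, 0.1, 0.6, 0.4)

bs <- brier_sketch(y_true, y_prob)
# (0.01 + 0.01 + 0.16 + 0.16) / 4 = 0.085
```
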
Calibrate model probabilities
calibrate_model(model, test_data, method = "platt")

| Argument | Description |
|---|---|
| model | A caret or xgboost model |
| test_data | Test data frame |
| method | Calibration method: "platt" or "isotonic" |

Calibrated probabilities
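
To show what the two methods do, here is a minimal base-R sketch under stated assumptions. It is not how `calibrate_model` is implemented: `calibrate_sketch` is a hypothetical helper that works on labels and raw probabilities instead of a model object. Platt scaling is approximated by a plain logistic regression of the outcome on the raw probability (`stats::glm`), and isotonic calibration uses `stats::isoreg`.

```r
# Hypothetical helper (not part of biomor): maps raw probabilities to
# calibrated ones using either a logistic fit or isotonic regression.
calibrate_sketch <- function(y_true, y_prob,
                             method = c("platt", "isotonic"),
                             positive = "Active") {
  method <- match.arg(method)
  y_bin <- as.numeric(y_true == positive)

  if (method == "platt") {
    # Platt-style scaling: logistic regression of outcome on raw probability
    fit <- glm(y_bin ~ y_prob, family = binomial())
    unname(fitted(fit))
  } else {
    # Isotonic regression: monotone step function of the raw probability
    ord <- order(y_prob)
    fit <- isoreg(y_prob[ord], y_bin[ord])
    out <- numeric(length(y_prob))
    out[ord] <- fit$yf          # put fitted values back in the input order
    out
  }
}

# Simulated, deliberately miscalibrated scores (true P(Active) = raw^2)
set.seed(1)
n <- 200
raw <- runif(n)
y_bin <- rbinom(n, 1, raw^2)
y_true <- factor(ifelse(y_bin == 1, "Active", "Inactive"))

p_platt <- calibrate_sketch(y_true, raw, method = "platt")
p_iso   <- calibrate_sketch(y_true, raw, method = "isotonic")
```

In this sketch the calibrator is fitted and applied on the same data for brevity; in practice it is usually fitted on a held-out set so the calibrated probabilities are not optimistic.
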
Sweeps thresholds between 0 and 1 to find the one that maximizes F1.
compute_f1_threshold(y_true, y_prob, positive = "Active")

| Argument | Description |
|---|---|
| y_true | True factor labels. |
| y_prob | Predicted probabilities for the positive class. |
| positive | Name of the positive class (default "Active"). |

A list with elements:
Best probability cutoff.
Maximum F1 score achieved.
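
A threshold sweep of this kind can be sketched in base R as follows. This is an illustration, not the package's code: `f1_threshold_sketch`, the grid of thresholds, and the element names `threshold` and `best_f1` are assumptions chosen for the example.

```r
# Hypothetical helper (not part of biomor): evaluates F1 on a grid of
# cutoffs and returns the first cutoff that attains the maximum.
f1_threshold_sketch <- function(y_true, y_prob, positive = "Active",
                                thresholds = seq(0.01, 0.99, by = 0.01)) {
  is_pos <- y_true == positive

  f1 <- vapply(thresholds, function(t) {
    pred_pos <- y_prob >= t
    tp <- sum(pred_pos & is_pos)
    fp <- sum(pred_pos & !is_pos)
    fn <- sum(!pred_pos & is_pos)
    if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
  }, numeric(1))

  best <- which.max(f1)           # first maximum if several cutoffs tie
  list(threshold = thresholds[best], best_f1 = f1[best])
}

# Toy data, made up for illustration
y_true <- factor(c("Active", "Active", "Inactive",
                   "Active", "Inactive", "Inactive"))
y_prob <- c(0.95, 0.80, 0.70, 0.60, 0.30, 0.10)

best <- f1_threshold_sketch(y_true, y_prob)
# best$best_f1 is 6/7: any cutoff above 0.30 and up to 0.60 keeps all three
# actives and admits one inactive (tp = 3, fp = 1, fn = 0)
```
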
Creates a caret::trainControl object for cross-validation, configured for two-class problems, ROC-based performance, and optional sampling strategies such as SMOTE or ROSE.
get_cv_control(cv = 5, sampling = NULL)

| Argument | Description |
|---|---|
| cv | Number of folds (default 5). |
| sampling | Sampling method (e.g., "smote", "rose", or NULL). |

A caret::trainControl object.
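
A control object matching this description might be built as in the sketch below. The exact settings used by `get_cv_control` are not shown in this manual, so the choices here (k-fold `method = "cv"`, `classProbs = TRUE`, `caret::twoClassSummary`) are assumptions about what "two-class, ROC-based" implies; `cv_control_sketch` is a hypothetical name. The call is guarded so the snippet still runs when caret is not installed.

```r
# Hypothetical helper (not part of biomor): k-fold CV control for a
# two-class problem, scored with ROC via caret::twoClassSummary.
cv_control_sketch <- function(cv = 5, sampling = NULL) {
  caret::trainControl(
    method = "cv",
    number = cv,
    classProbs = TRUE,                        # needed for ROC-based summaries
    summaryFunction = caret::twoClassSummary,
    sampling = sampling                       # e.g. "smote", "rose", or NULL
  )
}

ctrl <- NULL
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- cv_control_sketch(cv = 5)
}
```

With a control like this, `metric = "ROC"` would typically be passed to `caret::train()` so that model selection uses the ROC summary.
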
Placeholder for extracting embeddings from a trained autoencoder.
get_embeddings(ae_obj, data, feature_cols = NULL)

| Argument | Description |
|---|---|
| ae_obj | Autoencoder object |
| data | Input data |
| feature_cols | Columns to use as features |

Matrix of embeddings (currently NULL since this is a stub)
Prepare dataset for modeling
prepare_model_data(df, outcome_col = "Label")

| Argument | Description |
|---|---|
| df | A data.frame |
| outcome_col | Name of the outcome column |

A processed data.frame with factor outcome
Placeholder for future autoencoder integration in BioMoR.
train_autoencoder(data, feature_cols = NULL, epochs = 10, batch_size = 32, lr = 0.001)

| Argument | Description |
|---|---|
| data | Input data (matrix or data frame) |
| feature_cols | Columns to use as features |
| epochs | Number of training epochs |
| batch_size | Mini-batch size |
| lr | Learning rate |

A placeholder list with class "autoencoder"
Train BioMoR Autoencoder
train_biomor(data, feature_cols, epochs = 100, batch_size = 50, lr = 0.001)

| Argument | Description |
|---|---|
| data | Data frame with numeric features and a Label column |
| feature_cols | Character vector of feature columns |
| epochs | Number of training epochs |
| batch_size | Batch size |
| lr | Learning rate |

list(model, dataset, embeddings)
Train a Random Forest model with caret
train_rf(df, outcome_col = "Label", ctrl)

| Argument | Description |
|---|---|
| df | A data.frame containing predictors and outcome |
| outcome_col | Name of the outcome column (binary factor) |
| ctrl | A caret::trainControl object |

A caret train object
Train an XGBoost model with caret
train_xgb_caret(df, outcome_col = "Label", ctrl)

| Argument | Description |
|---|---|
| df | A data.frame containing predictors and outcome |
| outcome_col | Name of the outcome column (binary factor) |
| ctrl | A caret::trainControl object |

A caret train object