The evaluate_emotions() function provides comprehensive
evaluation capabilities for discrete emotion classification tasks. This
vignette demonstrates how to use the function to assess model
performance using standard metrics and visualizations.
# Install transforEmotion if not already installed
# devtools::install_github("your-repo/transforEmotion")
library(transforEmotion)library(transforEmotion)
#> [1;m[4;m
#> transforEmotion (version 0.1.7)
#> [0m[0m
#> Important: If you are using RStudio, please make sure you have the latest version installed.
#> For help getting started, type browseVignettes("transforEmotion")
#>
#> For bugs and errors, submit an issue to <https://github.com/atomashevic/transforEmotion/issues>
#> Python dependencies are installed automatically via uv on first use. Optionally run setup_modules() to pre-warm the environment.
#> Data Privacy: All processing is done locally with the downloaded model, and your data is never sent to any remote server or third-party.
#>
#> Available vision models: Use list_vision_models() to see all models or register_vision_model() to add custom models.First, let’s create some sample evaluation data to demonstrate the function:
# Create synthetic evaluation data
set.seed(42)
n_samples <- 200
# Generate ground truth labels
emotions <- c("anger", "joy", "sadness", "fear", "surprise")
eval_data <- data.frame(
id = 1:n_samples,
truth = sample(emotions, n_samples, replace = TRUE,
prob = c(0.2, 0.3, 0.2, 0.15, 0.15)),
stringsAsFactors = FALSE
)
# Generate realistic predictions (correlated with truth but with some errors)
eval_data$pred <- eval_data$truth
# Introduce some classification errors
error_indices <- sample(1:n_samples, size = 0.25 * n_samples)
eval_data$pred[error_indices] <- sample(emotions, length(error_indices), replace = TRUE)
# Generate probability scores
for (emotion in emotions) {
# Higher probability for correct class, lower for others
eval_data[[paste0("prob_", emotion)]] <- ifelse(
eval_data$truth == emotion,
runif(n_samples, 0.6, 0.95), # Higher prob for correct class
runif(n_samples, 0.01, 0.4) # Lower prob for incorrect classes
)
}
# Normalize probabilities to sum to 1
prob_cols <- paste0("prob_", emotions)
prob_sums <- rowSums(eval_data[, prob_cols])
eval_data[, prob_cols] <- eval_data[, prob_cols] / prob_sums
# Display sample data
head(eval_data)
#> id truth pred prob_anger prob_joy prob_sadness prob_fear
#> 1 1 fear surprise 0.10212549 0.02829695 0.16916418 0.63349125
#> 2 2 fear surprise 0.22000913 0.08829725 0.04855244 0.44341087
#> 3 3 joy joy 0.05642642 0.40583491 0.17749856 0.18908123
#> 4 4 surprise surprise 0.19460955 0.09289653 0.03606787 0.06384095
#> 5 5 anger anger 0.47644873 0.21565003 0.08878616 0.17295498
#> 6 6 anger anger 0.50608394 0.09512487 0.17226538 0.21580694
#> prob_surprise
#> 1 0.06692213
#> 2 0.19973031
#> 3 0.17115887
#> 4 0.61258509
#> 5 0.04616011
#> 6 0.01071888Now let’s evaluate the model performance with basic metrics:
# Basic evaluation with default metrics
results <- evaluate_emotions(
data = eval_data,
truth_col = "truth",
pred_col = "pred"
)
# Print results
print(results)
#> Emotion Classification Evaluation Results
#> ========================================
#>
#> Summary:
#> Total instances: 200
#> Classes: 5 (anger, fear, joy, sadness, surprise)
#> Overall accuracy: 0.805
#> Macro F1: 0.799
#>
#> Metrics:
#> metric value
#> 1 accuracy 0.8050000
#> 2 precision_macro 0.7986522
#> 3 recall_macro 0.8004904
#> 4 f1_macro 0.7993882
#> 5 f1_micro 0.8050000
#> 6 krippendorff_alpha 0.7531802For more comprehensive evaluation including calibration metrics:
# Full evaluation with probability scores
results_full <- evaluate_emotions(
data = eval_data,
truth_col = "truth",
pred_col = "pred",
probs_cols = prob_cols,
classes = emotions,
return_plot = TRUE
)
# Display summary
summary(results_full)
#> Emotion Classification Evaluation Summary
#> =======================================
#>
#> Dataset Information:
#> - Total instances: 200
#> - Number of classes: 5
#> - Classes: anger, fear, joy, sadness, surprise
#>
#> Overall Performance:
#> - Accuracy: 0.805
#> - Macro F1: 0.799
#> - Micro F1: 0.805
#> - Macro AUROC: 1.000
#> - Expected Calibration Error: 0.313
#> - Krippendorff's alpha: 0.753
#>
#> Per-Class Metrics:
#> class precision recall f1
#> anger 0.8048780 0.8250000 0.8148148
#> joy 0.8518519 0.8214286 0.8363636
#> sadness 0.8125000 0.7878788 0.8000000
#> fear 0.7333333 0.7586207 0.7457627
#> surprise 0.7906977 0.8095238 0.8000000
#>
#> Confusion Matrix:
#> Actual
#> Predicted anger joy sadness fear surprise Sum
#> anger 33 2 0 2 4 41
#> joy 1 46 3 2 2 54
#> sadness 3 1 26 1 1 32
#> fear 2 3 2 22 1 30
#> surprise 1 4 2 2 34 43
#> Sum 40 56 33 29 42 200The function computes several standard classification metrics:
When probability scores are provided:
The function provides built-in plotting capabilities:
Here’s how to integrate evaluate_emotions() into a
complete emotion analysis workflow:
# Step 1: Get emotion predictions using transforEmotion
text_data <- c(
"I am so happy today!",
"This makes me really angry.",
"I feel very sad about this news."
)
# Get transformer-based predictions
predictions <- transformer_scores(
x = text_data,
classes = c("anger", "joy", "sadness"),
return_prob = TRUE
)
# Step 2: Prepare evaluation data (assuming you have ground truth)
ground_truth <- c("joy", "anger", "sadness") # Your ground truth labels
eval_df <- data.frame(
id = 1:length(text_data),
truth = ground_truth,
pred = predictions$predicted_class,
prob_anger = predictions$prob_anger,
prob_joy = predictions$prob_joy,
prob_sadness = predictions$prob_sadness,
stringsAsFactors = FALSE
)
# Step 3: Evaluate performance
evaluation <- evaluate_emotions(
data = eval_df,
probs_cols = c("prob_anger", "prob_joy", "prob_sadness")
)
print(evaluation)Select only specific metrics for faster computation:
The function automatically handles missing values:
# Create data with missing values
eval_data_missing <- eval_data
eval_data_missing$truth[1:5] <- NA
eval_data_missing$pred[6:10] <- NA
# Evaluate with automatic missing value removal
results_clean <- evaluate_emotions(
data = eval_data_missing,
na_rm = TRUE # Default behavior
)
#> Warning: Removed 10 rows with missing values
cat("Original samples:", nrow(eval_data_missing), "\n")
#> Original samples: 200
cat("Samples after cleaning:", results_clean$summary$n_instances, "\n")
#> Samples after cleaning: 190Use custom column names for your data:
# Rename columns in your data
custom_data <- eval_data
names(custom_data)[names(custom_data) == "truth"] <- "ground_truth"
names(custom_data)[names(custom_data) == "pred"] <- "model_prediction"
# Evaluate with custom column names
custom_results <- evaluate_emotions(
data = custom_data,
truth_col = "ground_truth",
pred_col = "model_prediction",
metrics = c("accuracy", "f1_macro")
)
print(custom_results)
#> Emotion Classification Evaluation Results
#> ========================================
#>
#> Summary:
#> Total instances: 200
#> Classes: 5 (anger, fear, joy, sadness, surprise)
#> Overall accuracy: 0.805
#> Macro F1: 0.799
#>
#> Metrics:
#> metric value
#> 1 accuracy 0.8050000
#> 2 f1_macro 0.7993882When possible, include probability scores for more comprehensive evaluation:
Choose metrics based on your use case:
Always check your evaluation data before analysis:
Don’t rely on a single metric - report comprehensive results:
# Get comprehensive evaluation
comprehensive_eval <- evaluate_emotions(
data = eval_data,
probs_cols = prob_cols,
metrics = c("accuracy", "precision", "recall", "f1_macro", "f1_micro",
"auroc", "ece", "krippendorff", "confusion_matrix")
)
# Report key metrics
key_metrics <- comprehensive_eval$metrics[
comprehensive_eval$metrics$metric %in% c("accuracy", "f1_macro", "f1_micro"),
]
print(key_metrics)
#> metric value
#> 1 accuracy 0.8050000
#> 4 f1_macro 0.7993882
#> 5 f1_micro 0.8050000The evaluate_emotions() function provides a
comprehensive toolkit for evaluating emotion classification models. It
integrates seamlessly with the transforEmotion package workflow and
follows best practices from the machine learning evaluation
literature.
Key features: - Standard classification metrics (accuracy, precision, recall, F1) - Probabilistic evaluation (AUROC, calibration) - Inter-rater reliability (Krippendorff’s α) - Built-in visualization capabilities - Flexible input handling and data validation
For more information, see the function documentation with
?evaluate_emotions.