Title: Sentiment Analysis for Text, Image and Video using Transformer Models
Description: Implements sentiment analysis using huggingface <https://huggingface.co> transformer zero-shot classification model pipelines for text and image data. The default text pipeline is Cross-Encoder's DistilRoBERTa <https://huggingface.co/cross-encoder/nli-distilroberta-base> and the default image/video pipeline is OpenAI's CLIP <https://huggingface.co/openai/clip-vit-base-patch32>. All other zero-shot classification model pipelines can be implemented using their model name from <https://huggingface.co/models?pipeline_tag=zero-shot-classification>.
Authors: Alexander Christensen [aut]
Maintainer: Aleksandar Tomašević <[email protected]>
License: GPL (>= 3.0)
Version: 0.1.6
Built: 2025-03-08 05:32:49 UTC
Source: https://github.com/atomashevic/transforemotion
Implements sentiment and emotion analysis using huggingface transformer zero-shot classification model pipelines on text and image data. The default text pipeline is Cross-Encoder's DistilRoBERTa and the default image/video pipeline is OpenAI's CLIP. All other zero-shot classification model pipelines can be implemented using their model name from https://huggingface.co/models?pipeline_tag=zero-shot-classification.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]> and Aleksandar Tomasevic <[email protected]>
Yin, W., Hay, J., & Roth, D. (2019). Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. arXiv preprint arXiv:1909.00161.
This function calculates the moving average for a time series.
calculate_moving_average(data, window_size)
data: Matrix or data frame. The time series data.
window_size: Numeric integer. The size of the moving average window.
Matrix or Data frame containing the moving average values.
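A minimal usage sketch based on the documented signature; the input data and window size are illustrative assumptions:

# Two simulated series over 100 time steps (illustrative data)
ts_data <- matrix(rnorm(200), ncol = 2)
# Smooth with a 5-step moving average window
smoothed <- calculate_moving_average(data = ts_data, window_size = 5)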
Installs required Python modules for the {transforEmotion} package, with automatic GPU detection and optional GPU-enabled module installation.
check_nvidia_gpu()
This function performs the following steps:
- Checks for NVIDIA GPU availability
- If a GPU is detected, prompts the user to choose between CPU or GPU installation
- Installs core modules including transformers, torch, tensorflow, and other dependencies
- For GPU installations, sets up additional GPU-specific modules via setup_gpu_modules()
The function automatically manages dependencies and versions, ensuring compatibility between CPU and GPU variants of packages like torch, tensorflow, and torchvision. It uses conda_install for package management in the 'transforEmotion' conda environment.
Ensure that miniconda is installed and properly configured before running this function. For GPU support, NVIDIA drivers must be properly installed on your system.
Alexander P. Christensen <[email protected]>
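A minimal sketch of the documented call, wrapped in interactive() because the function may prompt for an installation choice:

if (interactive()) {
  # May prompt to choose CPU or GPU installation if an NVIDIA GPU is found
  check_nvidia_gpu()
}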
This function checks if the "transforEmotion" conda environment exists by running the command "conda env list" and searching for the environment name in the output.
conda_check()
A logical value indicating whether the "transforEmotion" conda environment exists.
Large language models can be quite large and, when stored locally, can take up a lot of space on your computer. The direct paths to where the models live on your computer are not necessarily intuitive. This function quickly identifies the models on your computer and reports which ones can be deleted to free up storage space.
delete_transformer(model_name, delete = FALSE)
model_name: Character vector. If no model is provided, then a list of models that are locally stored on the computer is printed.
delete: Boolean (length = 1). Should the deletion confirmation question be skipped? Defaults to FALSE.
Returns a list of models or confirmation of deletion.
Alexander P. Christensen <[email protected]>
if (interactive()) {
  delete_transformer()
}
This function calculates the dynamics of a system using the DLO (Damped Linear Oscillator) model based on Equation 1 (Ollero et al., 2023). The DLO model is a second-order differential equation that describes the behavior of a damped harmonic oscillator. The function takes in the current state of the system, the derivative of the state, the damping coefficient, the time step, and the values of the eta and zeta parameters. It returns the updated derivative of the state.
dlo_dynamics(x, dxdt, q, dt, eta, zeta)
x: Numeric. The current state of the system (value of the latent score).
dxdt: Numeric. The derivative of the state (rate of change of the latent score).
q: Numeric. The damping coefficient.
dt: Numeric. The time step.
eta: Numeric. The eta parameter of the DLO model.
zeta: Numeric. The zeta parameter of the DLO model.
A numeric vector containing the updated derivative of the state.
Ollero, M. J. F., Estrada, E., Hunter, M. D., & Cancer, P. F. (2023). Characterizing affect dynamics with a damped linear oscillator model: Theoretical considerations and recommendations for individual-level applications. Psychological Methods. doi:10.1037/met0000615
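For intuition, a minimal Euler-integration sketch built on the documented signature; the parameter values, sign conventions, and loop structure are illustrative assumptions, not the package's internal simulation code:

# Integrate one latent score forward in time (illustrative values)
x <- 0.5       # current latent score
dxdt <- 0      # current rate of change
dt <- 0.01     # time step
for (step in 1:100) {
  q <- rnorm(1, sd = 0.1)  # per-step dynamic error (assumed)
  # dlo_dynamics() returns the updated derivative of the state
  dxdt <- dlo_dynamics(x = x, dxdt = dxdt, q = q, dt = dt, eta = -0.5, zeta = -0.1)
  x <- x + dxdt * dt       # advance the latent score
}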
A matrix containing words (n = 175,592) and the emotion category most frequently associated with each word. This dataset is a modified version of the 'DepecheMood++' lexicon developed by Araque, Gatti, Staiano, and Guerini (2018). For proper scoring, text should not be stemmed prior to using this lexicon. This version of the lexicon does not rely on part of speech tagging.
data(emotions)
A data frame with 175,592 rows and 9 columns.
An entry in the lexicon, in English
The emotional category. All emotion columns contain either a 0 or 1. If the category is most likely to be associated with the word, it receives a 1; otherwise, 0. Words are associated with only one category.
Araque, O., Gatti, L., Staiano, J., and Guerini, M. (2018). DepecheMood++: A bilingual emotion lexicon built through simple yet powerful techniques. ArXiv
data("emotions")
A bag-of-words approach for computing emotions in text data using the lexicon compiled by Araque, Gatti, Staiano, and Guerini (2018).
emoxicon_scores(text, lexicon, exclude)
text: Matrix or data frame. A data frame containing texts to be scored (one text per row).
lexicon: The lexicon used to score the words. The default is the emotions dataset included with this package.
exclude: A vector listing terms that should be excluded from the lexicon. Words specified in exclude are removed from the lexicon before scoring.
Tara Valladares <tls8vx at virginia.edu> and Hudson F. Golino <hfg9s at virginia.edu>
Araque, O., Gatti, L., Staiano, J., and Guerini, M. (2018). DepecheMood++: A bilingual emotion lexicon built through simple yet powerful techniques. ArXiv
See the emotions dataset, where we describe how we modified the original DepecheMood++ lexicon.
# Obtain "emotions" data
data("emotions")

# Obtain "tinytrolls" data
data("tinytrolls")

## Not run:
# Obtain emoxicon scores
emotions_tinytrolls <- emoxicon_scores(text = tinytrolls$content, lexicon = emotions)
## End(Not run)
This function emphasizes the effect of strong emotional expressions during periods where the derivative of the latent variable is high. The observable value of the strongest emotion from the positive or negative group will spike in the next k time steps. The probability of this happening is p at each time step in which the derivative of the latent variable is greater than 0.2. The jump is proportional to the derivative of the latent variable and the sum of the observable values of the other emotions.
emphasize(data, num_observables, num_steps, k = 10, p = 0.5)
data: Data frame. The data frame containing the latent and observable variables created by the simulate_video function.
num_observables: Numeric integer. The number of observable variables per latent factor.
num_steps: Numeric integer. The number of time steps used in the simulation.
k: Numeric integer. The number of time steps over which to emphasize the effect of strong emotions on future emotions (default is 10). Alternatively: the length of a strong emotional episode.
p: Numeric. The probability of the strongest emotion being emphasized in the next k time steps (default is 0.5).
A data frame containing the updated observable variables.
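A minimal sketch, assuming a data frame produced by simulate_video() with matching num_observables, and using nrow() for the step count; all parameter values are illustrative:

# Simulate a short series, then add emphasis spikes (illustrative values)
sim <- simulate_video(dt = 0.1, num_steps = 50, num_observables = 4,
                      eta_n = 0.5, zeta_n = 0.5, eta = 0.5, zeta = 0.5,
                      sigma_q = 0.1, sd_observable = 0.1,
                      loadings = 0.8, window_size = 10)
spiked <- emphasize(data = sim, num_observables = 4,
                    num_steps = nrow(sim), k = 10, p = 0.5)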
Function to generate observable data from 2 latent variables (negative and positive affect). The function takes in the latent variable scores, the number of time steps, the number of observable variables per latent factor, and the measurement error variance. It returns a matrix of observable data. The factor loadings are not the same for all observable variables; they have uniform random noise added to them (between -0.15 and 0.15). The loadings are scaled so that the sum of the loadings for each latent factor is 2, to introduce a ceiling effect and to differentiate the dynamics of specific emotions. This is further emphasized by adding small noise to the measurement error variance for each observed variable (between -0.01 and 0.01).
generate_observables(X, num_steps, num_obs, error, loadings = 0.8)
X: Matrix or data frame. The (num_steps x 2) matrix of latent variable scores.
num_steps: Numeric integer. Number of time steps.
num_obs: Numeric integer. The number of observable variables per latent factor.
error: Numeric. Measurement error variance.
loadings: Numeric (default = 0.8). The default initial loading of the latent variable on the observable variable.
A (num_steps X num_obs) Matrix or Data frame containing the observable variables.
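A minimal sketch with assumed latent scores; in the package workflow, X would come from the DLO simulation rather than rnorm():

# Two latent series (negative and positive affect), 100 steps (illustrative)
X <- cbind(rnorm(100), rnorm(100))
obs <- generate_observables(X = X, num_steps = 100, num_obs = 4,
                            error = 0.1, loadings = 0.8)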
This function generates a matrix of Dynamic Error values (q) for the DLO simulation.
generate_q(num_steps, sigma_q)
num_steps: Numeric integer. The number of time steps used in the simulation.
sigma_q: Numeric. Standard deviation of the Dynamic Error.
A (num_steps x 3) matrix of Dynamic Error values for the neutral, negative, and positive emotion latent scores.
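A minimal usage sketch with illustrative values:

# Dynamic error for 100 time steps with standard deviation 0.1
q_mat <- generate_q(num_steps = 100, sigma_q = 0.1)
dim(q_mat)  # 100 x 3: neutral, negative, positive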
This function takes an image file and a vector of classes as input and calculates the scores for each class using a specified Hugging Face CLIP model. The primary use of the function is to calculate FER scores: facial expression recognition of emotions based on the detected facial expression in images. If there is more than one face in the image, the function will return the scores of the face selected using the face_selection parameter. If there is no face in the image, the function will return NA for all classes. The function uses reticulate to call the Python functions in the image.py file. If you run this package/function for the first time, it will take some time for the package to set up a functioning Python virtual environment in the background. This includes installing Python libraries for facial recognition and emotion detection in text, images, and video. Please be patient.
image_scores(image, classes, face_selection = "largest", model = "oai-base")
image: The path to the image file or URL of the image.
classes: A character vector of classes to classify the image into.
face_selection: The method to select the face in the image. Can be "largest", "left", or "right". Default is "largest", which selects the largest face in the image. "left" and "right" select the face on the far left or the far right side of the image. The face_selection method is irrelevant if there is only one face in the image.
model: A string specifying the CLIP model to use. Defaults to "oai-base".
Data Privacy: All processing is done locally with the downloaded model, and your images are never sent to any remote server or third-party.
A data frame containing the scores for each class.
Aleksandar Tomasevic <[email protected]>
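A minimal sketch using the documented defaults; the image path and class labels are hypothetical placeholders:

## Not run:
# Hypothetical local image; the first run downloads the CLIP model
image_scores(
  image = "face.jpg",
  classes = c("happy", "sad", "angry", "neutral"),
  face_selection = "largest",
  model = "oai-base"
)
## End(Not run)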
This function generates a random sample from the multivariate normal distribution with mean mu and covariance matrix Sigma.
MASS_mvrnorm(n = 1, mu, Sigma, tol = 1e-06, empirical = FALSE, EISPACK = FALSE)
n: Numeric integer. The number of observations to generate.
mu: Numeric vector. The mean vector of the multivariate normal distribution.
Sigma: Numeric matrix. The covariance matrix of the multivariate normal distribution.
tol: Numeric. Tolerance for checking the positive definiteness of the covariance matrix.
empirical: Logical. Whether mu and Sigma specify the empirical rather than population mean and covariance matrix.
EISPACK: Logical. Whether to use the EISPACK routine instead of the LINPACK routine.
An (n x p) matrix of random observations from the multivariate normal distribution. Updated: 26.10.2023.
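A minimal usage sketch with illustrative values:

# 10 draws from a bivariate normal with correlation 0.5
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
draws <- MASS_mvrnorm(n = 10, mu = c(0, 0), Sigma = Sigma)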
A list (length = 6) of the NEO-PI-R IPIP item descriptions (https://ipip.ori.org/newNEOFacetsKey.htm). Each vector within the 6 list elements contains the item descriptions for the respective Extraversion facets – friendliness, gregariousness, assertiveness, activity_level, excitement_seeking, and cheerfulness
data(neo_ipip_extraversion)
A list (length = 6)
data("neo_ipip_extraversion")
Natural Language Processing using word embeddings to compute semantic similarities (cosine; see costring) of text and specified classes.
nlp_scores(
  text,
  classes,
  semantic_space = c("baroni", "cbow", "cbow_ukwac", "en100", "glove", "tasa"),
  preprocess = TRUE,
  remove_stop = TRUE,
  keep_in_env = TRUE,
  envir = 1
)
text: Character vector or list. Text in a vector or list data format.
classes: Character vector. Classes to score the text.
semantic_space: Character vector. The semantic space used to compute the distances between words (more than one allowed). Options are "baroni", "cbow", "cbow_ukwac", "en100", "glove", and "tasa".
preprocess: Boolean. Should basic preprocessing be applied? Includes making lowercase, keeping only alphanumeric characters, removing escape characters, removing repeated characters, and removing white space. Defaults to TRUE.
remove_stop: Boolean. Should stop_words be removed? Defaults to TRUE.
keep_in_env: Boolean. Whether the classifier should be kept in your global environment. Defaults to TRUE.
envir: Numeric. Environment for the classifier to be saved for repeated use. Defaults to the global environment.
Returns semantic distances for the text classes
Alexander P. Christensen <[email protected]>
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (pp. 238-247).
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (pp. 1532-1543).
# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run:
# GloVe
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful")
)

# Baroni
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  semantic_space = "baroni"
)

# CBOW
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  semantic_space = "cbow"
)

# CBOW + ukWaC
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  semantic_space = "cbow_ukwac"
)

# en100
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  semantic_space = "en100"
)

# tasa
nlp_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  semantic_space = "tasa"
)
## End(Not run)
Function to plot the latent or the observable emotion scores.
plot_sim_emotions(df, mode = "latent", title = " ")
df: Data frame. The data frame containing the latent and observable variables created by the simulate_video function.
mode: Character. The mode of the plot. Can be either 'latent', 'positive', or 'negative'.
title: Character. The title of the plot. Default is an empty title, ' '.
A plot of the latent or the observable emotion scores.
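A minimal sketch, assuming a data frame from simulate_video(); the simulation values and plot title are illustrative:

# Simulate a short series and plot the latent scores (illustrative values)
sim <- simulate_video(dt = 0.1, num_steps = 50, num_observables = 4,
                      eta_n = 0.5, zeta_n = 0.5, eta = 0.5, zeta = 0.5,
                      sigma_q = 0.1, sd_observable = 0.1,
                      loadings = 0.8, window_size = 10)
plot_sim_emotions(df = sim, mode = "latent", title = "Simulated latent scores")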
Keeps the punctuations you want and removes the punctuations you don't
punctuate(text, allowPunctuations = c("-", "?", "'", "\"", ";", ",", ".", "!"))
text: Character vector or list. Text in a vector or list data format.
allowPunctuations: Character vector. Punctuations that should be allowed in the text. Defaults to common punctuations in English text.
Coarsely removes punctuations from text. Keeps general punctuations that are used in most English language text. Apostrophes are much trickier. For example, not allowing "'" will remove apostrophes from contractions like "can't" becoming "cant"
Returns text with only the allowed punctuations
Alexander P. Christensen <[email protected]>
# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness

# Keep only periods
punctuate(text, allowPunctuations = c("."))
Performs retrieval-augmented generation using {llama-index}. Currently limited to the TinyLLAMA model.
rag(
  text = NULL,
  path = NULL,
  transformer = c("LLAMA-2", "Mistral-7B", "OpenChat-3.5", "Orca-2", "Phi-2", "TinyLLAMA"),
  prompt = "You are an expert at extracting themes across many texts",
  query,
  response_mode = c("accumulate", "compact", "no_text", "refine", "simple_summarize", "tree_summarize"),
  similarity_top_k = 5,
  device = c("auto", "cpu", "cuda"),
  keep_in_env = TRUE,
  envir = 1,
  progress = TRUE
)
text: Character vector or list. Text in a vector or list data format. Defaults to NULL.
path: Character. Path to .pdfs stored locally on your computer. Defaults to NULL.
transformer: Character. Large language model to use for RAG. Available models: "LLAMA-2", "Mistral-7B", "OpenChat-3.5", "Orca-2", "Phi-2", and "TinyLLAMA".
prompt: Character (length = 1). Prompt to feed into TinyLLAMA. Defaults to "You are an expert at extracting themes across many texts".
query: Character. The query you'd like to know from the documents.
response_mode: Character (length = 1). Different responses generated from the model; see the llama-index documentation. Options are "accumulate", "compact", "no_text", "refine", "simple_summarize", and "tree_summarize".
similarity_top_k: Numeric (length = 1). Retrieves the most representative texts given the query. Defaults to 5. Suggested values will vary based on the number and quality of texts; adjust as necessary.
device: Character. Whether to use CPU or GPU for inference. Defaults to "auto".
keep_in_env: Boolean (length = 1). Whether the classifier should be kept in your global environment. Defaults to TRUE.
envir: Numeric (length = 1). Environment for the classifier to be saved for repeated use. Defaults to the global environment.
progress: Boolean (length = 1). Whether progress should be displayed. Defaults to TRUE.
Returns response from TinyLLAMA
All processing is done locally with the downloaded model, and your text is never sent to any remote server or third-party.
Alexander P. Christensen <[email protected]>
# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run:
rag(
  text = text,
  query = "What themes are prevalent across the text?",
  response_mode = "tree_summarize",
  similarity_top_k = 5
)
## End(Not run)
Uses sentence similarity pipelines from huggingface to compute the similarity between text and comparison text.
sentence_similarity(
  text,
  comparison_text,
  transformer = c("all_minilm_l6"),
  device = c("auto", "cpu", "cuda"),
  preprocess = FALSE,
  keep_in_env = TRUE,
  envir = 1
)
text: Character vector or list. Text in a vector or list data format.
comparison_text: Character vector or list. Text in a vector or list data format.
transformer: Character. Specific sentence similarity transformer to be used. Defaults to "all_minilm_l6". Also allows any sentence similarity model with a pipeline from huggingface to be used by specifying its name (e.g., "sentence-transformers/all-mpnet-base-v2").
device: Character. Whether to use CPU or GPU for inference. Defaults to "auto".
preprocess: Boolean. Should basic preprocessing be applied? Includes making lowercase, keeping only alphanumeric characters, removing escape characters, removing repeated characters, and removing white space. Defaults to FALSE.
keep_in_env: Boolean. Whether the classifier should be kept in your global environment. Defaults to TRUE.
envir: Numeric. Environment for the classifier to be saved for repeated use. Defaults to the global environment.
Returns an n x m similarity matrix, where n is the length of text and m is the length of comparison_text.
Alexander P. Christensen <[email protected]>
# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run:
# Example with defaults
sentence_similarity(
  text = text,
  comparison_text = text
)

# Example with model from 'sentence-transformers'
sentence_similarity(
  text = text,
  comparison_text = text,
  transformer = "sentence-transformers/all-mpnet-base-v2"
)
## End(Not run)
Installs GPU-specific Python modules for the {transforEmotion} conda environment.
setup_gpu_modules()
This function installs additional GPU-specific modules including:
- AutoAWQ for weight quantization
- Auto-GPTQ for GPU quantization
- Optimum for transformer optimization
- llama-cpp-python (Linux only) for CPU/GPU inference
The function is typically called by setup_modules() when GPU installation is selected, but can also be run independently to update GPU-specific modules.
This function requires NVIDIA GPU and drivers to be properly installed.
Alexander P. Christensen <[email protected]>
Installs miniconda and activates the transforEmotion environment
setup_miniconda()
Installs miniconda using install_miniconda and activates the transforEmotion environment using use_condaenv. If the transforEmotion environment does not exist, it will be created using conda_create.
Alexander P. Christensen <[email protected]> Aleksandar Tomasevic <[email protected]>
This function simulates emotions in a video using the DLO model implemented as a continuous-time state space model. The function takes in several parameters, including the time step, number of steps, number of observables, and various model parameters. It returns a data frame containing the simulated emotions and their derivatives, as well as smoothed versions of the observables. The initial state of the video is always the same: the neutral score is 0.5 and both the positive and negative emotion scores are 0.25. To simulate more realistic time series, there is an option of including a sudden jump in the emotion scores. This is done by emphasizing the effect of the dominant emotion during the period where the derivative of the latent variable is high. The observable value of the strongest emotion from the positive or negative group will spike in the next k time steps (emph.dur). The probability of this happening is p at each time step in which the derivative of the latent variable is greater than 0.2. The jump is proportional to the derivative of the latent variable and the sum of the observable values of the other emotions.
simulate_video(
  dt,
  num_steps,
  num_observables,
  eta_n,
  zeta_n,
  eta,
  zeta,
  sigma_q,
  sd_observable,
  loadings,
  window_size,
  emph = FALSE,
  emph.dur = 10,
  emph.prob = 0.5
)
dt: Numeric real. The time step for the simulation (in minutes).
num_steps: Numeric real. Total length of the video (in minutes).
num_observables: Numeric integer. The number of observables to generate per factor. The total number of observables generated is 2 x num_observables.
eta_n: Numeric. The eta parameter for the neutral state.
zeta_n: Numeric. The zeta parameter for the neutral state.
eta: Numeric. The eta parameter for the positive and negative emotions.
zeta: Numeric. The zeta parameter for the positive and negative emotions.
sigma_q: Numeric. The standard deviation of the Dynamic Error of the q(t) function.
sd_observable: Numeric. The standard deviation of the measurement error.
loadings: Numeric (default = 0.8). The default initial loading of the latent variable on the observable variable.
window_size: Numeric integer. The window size for smoothing the observables.
emph: Logical. Whether to emphasize the effect of the dominant emotion (default is FALSE).
emph.dur: Numeric integer. The duration of the emphasis (default is 10).
emph.prob: Numeric. The probability of the dominant emotion being emphasized (default is 0.5).
A data frame (num_steps X (6 + num_observables)) containing the latent scores for neutral score, positive emotions, negative emotions and their derivatives, as well as smoothed versions of the observables.
simulate_video(
  dt = 0.01, num_steps = 50, num_observables = 4,
  eta_n = 0.5, zeta_n = 0.5, eta = 0.5, zeta = 0.5,
  sigma_q = 0.1, sd_observable = 0.1,
  loadings = 0.8, window_size = 10
)
174 English stop words in the tm package
data(stop_words)
A vector (length = 174)
data("stop_words")
A data frame containing a smaller subset of tweets from the trolls dataset, useful for test purposes. There are approximately 20,000 tweets from 50 authors. This dataset includes only tweets authored by each account; retweets, reposts, and repeated tweets have been removed. The original data was provided by FiveThirtyEight and Clemson University researchers Darren Linvill and Patrick Warren. For more information, visit https://github.com/fivethirtyeight/russian-troll-tweets
data(tinytrolls)
A data frame with 22,143 rows and 6 columns.
- A tweet.
- The name of the handle that authored the tweet.
- The date the tweet was published.
- How many followers the handle had at the time of posting.
- How many interactions (including likes, tweets, retweets) the post garnered.
- Left or Right.
data(tinytrolls)
Uses sentiment analysis pipelines from huggingface to compute probabilities that the text corresponds to the specified classes
transformer_scores(
  text,
  classes,
  multiple_classes = FALSE,
  transformer = c("cross-encoder-roberta", "cross-encoder-distilroberta", "facebook-bart"),
  device = c("auto", "cpu", "cuda"),
  preprocess = FALSE,
  keep_in_env = TRUE,
  envir = 1
)
text: Character vector or list. Text in a vector or list data format.
classes: Character vector. Classes to score the text.
multiple_classes: Boolean. Whether the text can belong to multiple true classes. Defaults to FALSE.
transformer: Character. Specific zero-shot sentiment analysis transformer to be used. Default options: "cross-encoder-roberta", "cross-encoder-distilroberta", and "facebook-bart". Defaults to "cross-encoder-distilroberta". Also allows any zero-shot classification model with a pipeline from huggingface to be used by specifying its name (e.g., "typeform/distilbert-base-uncased-mnli").
device: Character. Whether to use CPU or GPU for inference. Defaults to "auto".
preprocess: Boolean. Should basic preprocessing be applied? Includes making lowercase, keeping only alphanumeric characters, removing escape characters, removing repeated characters, and removing white space. Defaults to FALSE.
keep_in_env: Boolean. Whether the classifier should be kept in your global environment. Defaults to TRUE.
envir: Numeric. Environment for the classifier to be saved for repeated use. Defaults to the global environment.
Returns probabilities for the text classes
All processing is done locally with the downloaded model, and your text is never sent to any remote server or third-party.
Alexander P. Christensen <[email protected]>
# BART
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

# RoBERTa
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

# Zero-shot classification
Yin, W., Hay, J., & Roth, D. (2019). Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. arXiv preprint arXiv:1909.00161.

# MultiNLI dataset
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run:
# Cross-Encoder DistilRoBERTa
transformer_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful")
)

# Facebook BART Large
transformer_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  transformer = "facebook-bart"
)

# Directly from huggingface: typeform/distilbert-base-uncased-mnli
transformer_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive", "active", "excitement", "cheerful"),
  transformer = "typeform/distilbert-base-uncased-mnli"
)
## End(Not run)
This function retrieves facial expression recognition (FER) scores from a specific number of frames extracted from a YouTube video using a specified Hugging Face CLIP model. It utilizes Python libraries for facial recognition and emotion detection in text, images, and video.
video_scores(
  video,
  classes,
  nframes = 100,
  face_selection = "largest",
  start = 0,
  end = -1,
  uniform = FALSE,
  ffreq = 15,
  save_video = FALSE,
  save_frames = FALSE,
  save_dir = "temp/",
  video_name = "temp",
  model = "oai-base"
)
video: The URL of the YouTube video to analyze.
classes: A character vector specifying the classes to analyze.
nframes: The number of frames to analyze in the video. Default is 100.
face_selection: The method for selecting faces in the video. Options are "largest", "left", or "right". Default is "largest".
start: The start time of the video range to analyze. Default is 0.
end: The end time of the video range to analyze. Default is -1, which means the video won't be cut. If end is a positive number greater than start, the video will be cut from start to end.
uniform: Logical indicating whether to uniformly sample frames from the video. Default is FALSE.
ffreq: The frame frequency for sampling frames from the video. Default is 15.
save_video: Logical indicating whether to save the analyzed video. Default is FALSE.
save_frames: Logical indicating whether to save the analyzed frames. Default is FALSE.
save_dir: The directory to save the analyzed frames. Default is "temp/".
video_name: The name of the analyzed video. Default is "temp".
model: A string specifying the CLIP model to use. Defaults to "oai-base".
A result object containing the analyzed video scores.
All processing is done locally with the downloaded model, and your video frames are never sent to any remote server or third-party.
Aleksandar Tomasevic <[email protected]>
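A minimal sketch using the documented defaults; the YouTube URL and class labels are hypothetical placeholders:

## Not run:
# Hypothetical video URL; frames are processed locally after model download
video_scores(
  video = "https://www.youtube.com/watch?v=XXXXXXXXXXX",
  classes = c("happy", "sad", "angry", "neutral"),
  nframes = 100,
  face_selection = "largest",
  model = "oai-base"
)
## End(Not run)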