setup_miniconda()
transforEmotion uses the reticulate package to automatically install a standalone miniconda version on your computer and download the necessary Python modules to run the transformer models.
After installing the transforEmotion package, you need to run the setup_miniconda() function. This function installs a standalone miniconda distribution and sets up a dedicated virtual environment with the Python modules required to run the transformer models.
Because transforEmotion installs its own standalone miniconda, you should not encounter conflicts between miniconda and existing Python installations. It also ensures that the correct versions of the Python libraries are installed into the virtual environment, guaranteeing compatibility regardless of your operating system or whether you already have Python installed.
The miniconda installation takes a few minutes to complete. At certain points the installer may appear to be stuck, but please allow it a few minutes to finish. Remember, setup_miniconda() only needs to be run once after installing the package. Unless you run into issues with the Python environment or libraries later on, there is no need to run this function again on the same system.
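For reference, the entire one-time setup amounts to the following two calls (they also appear at the top of the full example below):
library(transforEmotion)
# Install standalone miniconda and the required Python modules (one-time step)
setup_miniconda()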
transformer_scores
You can use any number of text classification transformers. transforEmotion currently implements only zero-shot classification models; future updates to the package may add the ability to train and fine-tune these models. For now, several options work well for most classification tasks straight out of the box. You can view the different transformers that can be used in transforEmotion here.
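For example, a particular model can be requested when calling transformer_scores. This is only a sketch: it assumes that transformer_scores exposes a transformer argument for selecting the zero-shot model (check ?transformer_scores for the supported values), and the identifier used below is illustrative.
# Sketch: selecting a different zero-shot model (the transformer argument and its value are assumptions)
transformer_scores(
  text = "I make friends easily",
  classes = c("friendly", "cheerful"),
  transformer = "facebook-bart"
)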
As mentioned in section 1, before running the transformer_scores function, you need to execute setup_miniconda() to install the necessary modules. Running the transformer_scores function will also trigger the download of the Cross-Encoder's DistilRoBERTa transformer model. Therefore, the easiest way to get started is by using an example.
library(transforEmotion)
# Setup Python
setup_miniconda()
# Load data
data(neo_ipip_extraversion)
# Example text
text <- neo_ipip_extraversion$friendliness[1:5] # positively worded items only
# Run transformer function
transformer_scores(
  text = text,
  classes = c(
    "friendly", "gregarious", "assertive",
    "active", "excitement", "cheerful"
  )
)
The downloads will take some time when you run a specific model for the first time. Assuming everything goes well with the code above, you should see output that looks like this:
$`make friends easily`
friendly gregarious assertive active excitement cheerful
0.579 0.075 0.070 0.071 0.050 0.155
$`warm up quickly to others`
friendly gregarious assertive active excitement cheerful
0.151 0.063 0.232 0.242 0.152 0.160
$`feel comfortable around people`
friendly gregarious assertive active excitement cheerful
0.726 0.044 0.053 0.042 0.020 0.115
$`act comfortably around people`
friendly gregarious assertive active excitement cheerful
0.524 0.062 0.109 0.183 0.019 0.103
$`cheer people up`
friendly gregarious assertive active excitement cheerful
0.071 0.131 0.156 0.190 0.362 0.089
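The output is a named list with one score vector per text item. If a single table is more convenient, the list can be combined into a matrix with base R, for example:
# Store the result and combine the per-item score vectors into one matrix
scores <- transformer_scores(
  text = text,
  classes = c("friendly", "gregarious", "assertive",
              "active", "excitement", "cheerful")
)
do.call(rbind, scores) # rows = text items, columns = emotion classes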
If you want to run transformer_scores on additional text, simply enter that text into the text argument of the function. The transformer models that you've used during your R session will remain in R's environment until you exit R or remove them from your environment.
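For instance, the sentences below are made up purely for illustration:
# Score two new (invented) sentences with the same classes
new_text <- c("I love hosting parties for my friends",
              "I would rather spend the evening alone")
transformer_scores(
  text = new_text,
  classes = c("friendly", "gregarious", "assertive",
              "active", "excitement", "cheerful")
)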
That’s it! You’ve successfully obtained sentiment analysis scores from the Cross-Encoder’s DistilRoBERTa transformer model. Now, go forth and quantify the qualitative!
image_scores
transforEmotion also provides a Facial Expression Recognition (FER) function that uses the CLIP model to obtain sentiment analysis scores from images. CLIP is a transformer model trained on a dataset of 400 million image-text pairs. Its input is an image together with a set of text labels, and the inference output is a score for each label. When the labels are emotions, CLIP returns FER scores.
The image_scores function uses the CLIP model from Hugging Face. In a similar way to the transformer_scores function, the image_scores function also downloads the CLIP model the first time you run it, performs the inference, and returns the output as an R dataframe.
The main input of this function is the path to an image, which can be either a local filepath or a URL pointing to an image. The second input is an array of emotion labels.
image <- ""https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg/402px-Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg"
emotions <- c("excitement", "happiness", "pride", "anger", "fear", "sadness", "neutral")
image_scores(image, emotions)
This code runs FER on the Mona Lisa painting and returns the following output, a dataframe with 7 columns and 1 row.
excitement happiness pride anger fear sadness neutral
1 0.02142187 0.02024468 0.0604699 0.04037686 0.03273294 0.1061871 0.7185667
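Because the result is an ordinary dataframe, it is easy to post-process. For example, a small sketch that picks the label with the highest score:
res <- image_scores(image, emotions)
# Label with the highest score for this image
names(which.max(unlist(res)))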
If no face is recognized in the image, the function returns an empty dataframe, preceded by a warning message. We can test this with an image of three penguins from the South Shetland Islands.
image <- "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/Adelie_penguins_in_the_South_Shetland_Islands.jpg/640px-Adelie_penguins_in_the_South_Shetland_Islands.jpg"
image_scores(image, emotions)
In this case, the output is an empty dataframe, preceded by the warning described above.
video_scores
We can use the workflow described in section 3 to run FER on videos. The video_scores function takes a video as input and returns a dataframe with FER scores for each frame of the video. The video can be a local filepath or a URL pointing to a YouTube video. In the case of YouTube, the function will download the video and save it in the temporary directory. The function also takes an array of emotion labels as input. After downloading the video, the function extracts frames from it and then runs the same workflow as for images.
Here’s an example of running FER on a YouTube video of Boris Johnson.
video_url <- "https://www.youtube.com/watch?v=hdYNcv-chgY&ab_channel=Conservatives"
emotions <- c("excitement", "happiness", "pride", "anger", "fear", "sadness", "neutral")
result <- video_scores(video_url, classes = emotions,
                       nframes = 10, save_video = TRUE,
                       save_frames = TRUE, video_name = 'boris-johnson',
                       start = 10, end = 120)
head(result)
Working with videos is more computationally demanding. This example extracts only 10 frames from the video, and it shouldn't take longer than a few minutes on an average laptop without a GPU (depending on the internet connection needed to download the video and the CLIP model). In research applications, we will usually extract 100-300 frames from a video. This can take much longer, so patience is advised while waiting for the results.
Because working with videos is more complex, this function takes several additional arguments. The nframes argument specifies the number of frames to extract from the video. The save_video argument specifies whether to save the video in the temporary directory, and the save_frames argument specifies whether to save the extracted frames there as well. The video_name argument specifies the name of the video. Saving the video and frames is useful for quality control and potential evaluation of the results by human coders.
The start and end arguments, given in seconds, point to the timestamps of the video where the analysis should start and end. If they are not provided, the function will try to process the entire video.
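The same function also works on local files. A minimal sketch, assuming a hypothetical video file my_interview.mp4 in the working directory:
# Analyse the first 30 seconds of a local video using 20 frames
# (uses the `emotions` vector defined above)
local_result <- video_scores("my_interview.mp4", classes = emotions,
                             nframes = 20, start = 0, end = 30)
head(local_result)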
For the Boris Johnson example, the result is a dataframe with 7 columns and 10 rows, one row for each frame, containing the FER scores for each frame of the video. The output of head(result) looks like this:
excitement happiness pride anger fear sadness neutral
1 0.08960483 0.006041054 0.05632496 0.2259102 0.2781007 0.1757137 0.1683045
2 0.11524552 0.011083936 0.08131301 0.1672127 0.3321840 0.1652457 0.1277151
3 0.09541881 0.007240616 0.05629114 0.1665660 0.3410282 0.1952039 0.1382514
4 0.09860725 0.011296707 0.07909032 0.1693194 0.3010349 0.1759851 0.1646665
5 0.08856109 0.007197607 0.07237346 0.2261922 0.3237688 0.1515539 0.1303529
6 0.10022306 0.011431777 0.09256416 0.1467394 0.3202718 0.1574203 0.1713494
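Because each row corresponds to a frame, the frame-level scores can be aggregated however suits your analysis; a simple sketch is to average them over the whole clip:
# Average FER scores across all extracted frames
colMeans(result)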
These three functions are the core of the transforEmotion package. They allow you to run sentiment analysis on text, images, and videos. If you run into any issues, please report them on the GitHub issues page. If you have any suggestions for improvements, please let us know. We are always looking for ways to improve the package and make it more useful for researchers.