# Creating ParticipantGroup objects

library(synr)

## Introduction

To start using synr, you must first convert your data into a ParticipantGroup object. This tutorial explains how to convert raw consistency test data into this object. For more information on synr and ParticipantGroup itself, please see the main tutorial.

synr offers separate methods for converting ‘long format’ and ‘wide format’ raw data into ParticipantGroup objects. Brief explanations of the data formats are included at the beginning of each section.

Note that any missing data, e.g. missing response color hex codes for some participants, must be coded in the data frame as R NA values. If you have used other values to represent missingness, you must first replace those values with NA, e.g. by using the naniar package.
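As a minimal sketch of that recoding step, the base R snippet below replaces placeholder codes with NA. The data frame and the placeholder codes ("" and "-99") are made up for illustration; substitute whatever codes your own data use.

```r
# Hypothetical raw data in which missing responses were coded as "" or "-99"
raw_df <- data.frame(
  participant_id = c(1, 1, 2, 2),
  trial_symbol = c("A", "7", "A", "7"),
  response_color = c("23F0BE", "", "-99", "0454A5"),
  stringsAsFactors = FALSE
)

# Placeholder codes (assumed for illustration) that stand for 'missing'
missing_codes <- c("", "-99")

# Replace every placeholder with a proper NA value
raw_df$response_color[raw_df$response_color %in% missing_codes] <- NA

# Count the NA values now present in the response color column
n_missing <- sum(is.na(raw_df$response_color))
```

This uses only base R, so it works without extra packages; naniar offers more convenient helpers if you need to recode many columns at once.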

## ‘Long format’ data

‘Long format’ data adhere to the rule that there should only be one column for each variable/type of data. For consistency tests, this means that each trial should have one row in the data frame and there should be only one column for participant color responses and one for trial graphemes/symbols. You might also be familiar with the ‘long format’ from working with tidy data.

### Example data

Here’s an example of long formatted consistency test data.

synr_exampledf_long_small
#>    participant_id trial_symbol response_color response_time
#> 1               1            A         23F0BE           1.2
#> 2               1            7         99EECC           3.7
#> 3               1            D         001100           2.5
#> 4               1            D         9788DD           1.7
#> 5               1            A         1348CA           0.9
#> 6               1            7         173EF3           2.0
#> 7               2            7         AF7BE3           2.2
#> 8               2            D         FA3388           0.3
#> 9               2            A         5587FF          32.0
#> 10              2            A         0DABC5           8.0
#> 11              2            7         0454A5           6.6
#> 12              2            D         3FD1F8           0.1
#> 13              3            D         03EF88           2.5
#> 14              3            A         78AB33           9.9
#> 15              3            7         F03200           3.9
#> 16              3            7         000000           1.7
#> 17              3            D         FFFFFF           9.3
#> 18              3            A         4811ED           8.1

There were three participants. Each participant did six trials. Each trial is represented by a row, which holds the participant’s ID, the grapheme/symbol used, the participant’s response color (as an RGB hex code), and the time it took for the participant to respond after grapheme presentation. Note that response time data are optional: you can still use synr if you don’t have those.

### Convert data into a ParticipantGroup object

pg <- create_participantgroup(
  raw_df = synr_exampledf_long_small,
  n_trials_per_grapheme = 2,
  id_col_name = "participant_id",
  symbol_col_name = "trial_symbol",
  color_col_name = "response_color",
  time_col_name = "response_time",
  color_space_spec = "Luv"
)

You need to pass in:

• A long-formatted data frame.
• How many trials were run for each grapheme/symbol.
• The names of the participant ID, grapheme/symbol, and response color columns.
• A string that specifies which color space you want to use (“XYZ”, “sRGB”, “Apple RGB”, “Lab”, or “Luv”) later, when doing calculations with synr.

If you have response times, you can also specify the name of the column that holds them.
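If you are unsure how many trials were run for each grapheme/symbol, you can count them before converting the data. The sketch below uses base R only and a small made-up data frame (long_df is a hypothetical name) with the same column names as the example above; it is a quick sanity check, not part of synr.

```r
# Hypothetical long-format data: 2 participants, 2 symbols, 2 trials per symbol
long_df <- data.frame(
  participant_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  trial_symbol = c("A", "7", "7", "A", "7", "A", "A", "7"),
  response_color = c("23F0BE", "99EECC", "173EF3", "1348CA",
                     "AF7BE3", "5587FF", "0DABC5", "0454A5"),
  stringsAsFactors = FALSE
)

# Cross-tabulate participants against symbols: each cell is a trial count
trial_counts <- table(long_df$participant_id, long_df$trial_symbol)

# TRUE only if every participant responded to every symbol exactly twice
all_have_two_trials <- all(trial_counts == 2)
```

Printing trial_counts shows one row per participant and one column per symbol, which makes it easy to spot participants with more or fewer trials than expected.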

## ‘Wide format’ data

‘Wide format’ data roughly adhere to the rule that there should only be one row for each subject/‘object of interest’. For consistency tests, this means that each participant has a single row in the data frame, and multiple columns for each trial.

### Example data

Here’s an example of wide formatted consistency test data.

synr_exampledf_wide_small
#>   participant_id symbol_1 response_color_1 response_time_1 symbol_2
#> 1              1        A           23F0BE             1.2        D
#> 2              2        7           99EECC             3.7        A
#> 3              3        D           001100             2.5        7
#>   response_color_2 response_time_2 symbol_3 response_color_3 response_time_3
#> 1           9788DD             1.7        7           AF7BE3             2.2
#> 2           1348CA             0.9        D           FA3388             0.3
#> 3           173EF3             2.0        A           5587FF            32.0
#>   symbol_4 response_color_4 response_time_4 symbol_5 response_color_5
#> 1        A           0DABC5             8.0        D           03EF88
#> 2        7           0454A5             6.6        A           78AB33
#> 3        D           3FD1F8             0.1        7           F03200
#>   response_time_5 symbol_6 response_color_6 response_time_6
#> 1             2.5        7           000000             1.7
#> 2             9.9        D           FFFFFF             9.3
#> 3             3.9        A           4811ED             8.1

There were three participants. Each participant did six trials. Each participant is represented by a row. Each trial is represented by three columns, e.g. symbol_1, response_color_1 and response_time_1 for the first trial. Note that response time data are optional: you can still use synr if you don’t have those.

### Convert data into a ParticipantGroup object

pg <- create_participantgroup_widedata(
  raw_df = synr_exampledf_wide_small,
  n_trials_per_grapheme = 2,
  participant_col_name = "participant_id",
  symbol_col_regex = "symbol",
  color_col_regex = "colou*r",
  time_col_regex = "response_time",
  color_space_spec = "Luv"
)

You need to pass in:

• A wide-formatted data frame.
• The participant column’s name.
• The number of trials used per grapheme (defaults to 3).
• Regular expression patterns that are unique to the names of:
    • trial columns that hold symbols/graphemes displayed to the participants.
    • trial columns that hold participants’ response color RGB hex codes.
• A string that specifies which color space you want to use (“XYZ”, “sRGB”, “Apple RGB”, “Lab”, or “Luv”) later, when doing calculations with synr.

If you have response times, you can also specify a regular expression for the response time columns.

#### Details about regular expression patterns in example

Each regular expression pattern, like ‘symb’ or ‘col’, must match only the names of the corresponding columns. In the example data frame, only the names of trial columns with color data contain ‘col’, so color_col_regex = 'col' would also work. If, for instance, the participant ID column had been called ‘p_id_column’, that column name would also match the ‘col’ pattern, and hence color_col_regex = 'col' wouldn’t work.

Regular expressions can be as long or as short as you need. In this example, we could also have used color_col_regex = 'response_color_'.
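One way to try out a pattern before passing it to synr is to test it against your column names with base R's grep(). The sketch below assumes that synr matches patterns against column names in the same way grep() does; the column names are taken from the first two trials of the example, and ‘p_id_column’ is the hypothetical ID column name discussed above.

```r
# Column names as in the wide-format example (first two trials only)
col_names <- c("participant_id", "symbol_1", "response_color_1",
               "response_time_1", "symbol_2", "response_color_2",
               "response_time_2")

# 'colou*r' matches only the response color columns
color_matches <- grep("colou*r", col_names, value = TRUE)

# Rename the ID column to the hypothetical 'p_id_column'
alt_names <- sub("participant_id", "p_id_column", col_names)

# The short pattern 'col' now also matches the ID column, so it is ambiguous
col_matches <- grep("col", alt_names, value = TRUE)
```

If the result contains any column that does not hold the intended kind of data, lengthen the pattern until only the intended columns remain.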