Multivariate Sex Differences in Personality with Regularized Regression in multid -package

Ville-Juhani Ilmarinen


This vignette details the procedures by which multivariate sex differences in personality traits can be estimated by using regularized logistic regression and gender diagnosticity approach with multid package.

This approach can be used for regularized measurement of multivariate differences between any two groups, but in this example, differences between females and males in “answers to the Big Five Personality Test, constructed with items from the International Personality Item Pool” (dataset that can be obtained from are examined.

The predictive approach as implemented with regularized methods allows for examination of group-membership probabilities and their distributions across individuals. In the context of statistical predictions of sex, these distributions are an updated variant to gender-typicality distributions used in gender diagnosticity methodology (Lippa & Connelly, 1990).

For more details and empirical examples of the integration of multivariate sex difference estimation and gender diagnosticity methodology, see Ilmarinen, Vainikainen, and Lönnqvist (2022) and Lönnqvist and Ilmarinen (2021).



You can install the released version of multid from CRAN with:


You can install the development version from GitHub with:



In this vignette, openly available data “Answers to the Big Five Personality Test, constructed with items from the International Personality Item Pool” from will be used. The dataset is directly downloaded from the page (by this approach and therefore each step presented in the following script is fully reproducible starting from data download.

# create a temporary directory
td <- tempdir()

# create a temporary file
tf <- tempfile(tmpdir=td, fileext=".zip")

# download file from internet into temporary location
download.file("", tf)

# list zip archive
file_names <- unzip(tf, list=TRUE)

# extract files from zip file
unzip(tf, exdir=td, overwrite=TRUE)

# use when zip file has only one file
dat <- import(file.path(td, "BIG5/data.csv"))

# delete the files and directories

Variable transformations

# save names of personality items to a vector

# code item response 0 (zero) to not available (NA) on all these items


# check that the numerical range makes sense

#> [1] 1 5
# calculate sum scores for Big Five (reverse coded items subtracted from 6)
# see codebook from for item content and keys






For the purpose of this example, only US responses will be analyzed for sex differences. Furthermore, only participants who reported being male (gender=1) or female (gender=2), and who did not have any missing responses are included.

  na.omit(dat[dat$country=="US" & (dat$gender==1 | dat$gender==2),

# code categorical gender variables

Sex differences

In this section, several different ways for estimating sex differences in personality are demonstrated. We start with typical univariate differences in the broad Big Five domains, followed by multivariate differences in both domains and using personality items as a multivariate set. Special attention is given to regularized methods because sex differences and the distributions underlying these mean differences can be both examined from this approach.

Univariate differences in trait domain sum scores using the d_pooled_sd -function

d_pooled_sd -function is used to calculate Cohen’s d with pooled standard deviation across the male and female groups for each Big Five trait separately. The arguments include the analyzed Big Five trait (var), sex as the binary grouping variable (group.var), the values that represents these groups (group.values), and logical for whether statistical inferences is wanted (infer). All arguments except statistical inference are also included in the multivariate -function D_regularized presented further below.

d.N<-d_pooled_sd(data = US.dat,var = "N",
            group.var = "",group.values = c("Male","Female"),

d.E<-d_pooled_sd(data = US.dat,var = "E",
            group.var = "",group.values = c("Male","Female"),

d.O<-d_pooled_sd(data = US.dat,var = "O",
            group.var = "",group.values = c("Male","Female"),

d.A<-d_pooled_sd(data = US.dat,var = "A",
            group.var = "",group.values = c("Male","Female"),

d.C<-d_pooled_sd(data = US.dat,var = "C",
            group.var = "",group.values = c("Male","Female"),

# compile to a table


#>   n.Male n.Female m.Male m.Female sd.Male sd.Female  diff     D
#> N   2945     5739   2.79     3.16    0.88      0.86      0.87 -0.37 -0.42
#> E   2945     5739   3.00     3.06    0.96      0.96      0.96 -0.07 -0.07
#> O   2945     5739   4.07     3.88    0.58      0.63      0.62  0.18  0.30
#> A   2945     5739   3.68     4.03    0.75      0.68      0.71 -0.35 -0.50
#> C   2945     5739   3.38     3.44    0.72      0.75      0.74 -0.07 -0.09
#>   t_stat      df p
#> N -18.45 5845.38 0
#> E  -3.09 5919.79 0
#> O  13.44 6458.37 0
#> A -21.33 5455.81 0
#> C  -3.95 6119.69 0

Column D indicates Cohen’s d. Females score higher in Neuroticism (N; d = -0.42) and Agreeableness (A; d = -0.50), while Males score higher in Openness (O; d = 0.30). Differences in Extraversion (E) and Conscientiousness (C), albeit statistically significant (p < .001), seem negligible in magnitude (d < |.10|) .

The average absolute difference across the Big Five traits is therefore: d = 0.27. Importantly, this average absolute difference is not informative about the overall distance between males and females. One should use multivariate methods for examining questions regarding the overall distance in certain domain, like personality.

Other columns in the output are:

Multivariate difference using trait sum scores as variables

Although the univariate differences presented above estimate sex difference in each broad Big Five trait separately, these single Cohen’s d estimates do not provide a clear picture about the overall distance between males and females. Multivariate methods, using multiple variables and differences in these variables at the same, while accounting for their covariability (intercorrelations), should be used for understanding overall distances. These D values can be interpreted in the same way as Cohen’s d, i.e., standardized distances between males and females.

Mahalanobis’ D

Mahalanobis’ D is the standardized distance between the Male and Female group centroids in multivariate space. For calculation of Mahalanobis’ D, correlation matrix of the multivariate set and vectors of standardized mean differences are needed. The ordering of the variables should be same in these inputs.

# let's use the previously obtained univariate d-values
# calculate intercorrelations
# test that the ordering of these variables matches
# print the correlation matrix
#>       N     E     O     A     C
#> N  1.00 -0.25 -0.10 -0.10 -0.26
#> E -0.25  1.00  0.15  0.31  0.10
#> O -0.10  0.15  1.00  0.10  0.05
#> A -0.10  0.31  0.10  1.00  0.17
#> C -0.26  0.10  0.05  0.17  1.00
# Calculate D
D_maha<-sqrt(t(d_vector) %*% solve(cor_mat) %*% d_vector)
#>           [,1]
#> [1,] 0.7681502

Mahalanobis’ D is, of course, larger than any of the univariate d’s for single variables, here the standardized distance between the group centroids of males and females in the multivariate space is D = 0.77

Regularized D with D_regularized -function

Regularized variant of D is calculated using a predictive approach whereby the binary sex-variable is predicted by the multivariate variable set with regularized logistic regression. The predicted values for each individual are indicative of logits of probabilities of being a male or female.

These femininity-masculinity scores (FM) can be used in similar ways to any other variable to calculate standardized distances between mean FM of Males and mean FM of females with pooled standard deviations. Beyond estimation of mean differences, there are many other interesting features that can be informative of the multivariate/FM distributions.

In the D_regularized -function, separate partitions of the original data are used for regulation (i.e., training) and prediction (i.e., testing). Regularization means that the beta-coefficients for variables in the multivariate set are penalized through a specified function (here, a default elastic net penalty with 10-fold cross-validation is used). After this, the obtained coefficients weights are used in the prediction part of the data as measures of FM for each individual. Argument out (logical, meaning are predictions made “out-of-bag”) and size (size of the training partition of the data, equal for males and females) are used for informing the function that a partition is wanted. The multivariate set is supplied with mv.vars -argument.

By requesting, it is possible to calculate pcc (probability of correctly classified), auc (area under the receiver operating characteristics), and also frequencies indicating the proportions of the sample that fall within certain cutoffs of predicted probabilities from 0 to 1 for each group (pred.prob).

# set seed number for reproducibility

# decide the size of the regularization (training data fold)
# here, half of the smaller of the male and female groups is used

# run D_regularized
                group.values = c("Male","Female"),
                out = T,
                pcc = T,
                auc = T,
                pred.prob = T)

# print the output
#>      n.Male n.Female m.Male m.Female sd.Male sd.Female diff    D
#> [1,]   1473     4267   0.34     -0.3    0.86      0.79      0.81 0.64 0.79
#>      pcc.Male pcc.Female  auc
#> [1,]     0.63       0.67      0.66 0.71

From the output, we see that the sample sizes are smaller as compared to univariate estimates, because the sample sizes are only indicative of the samples in prediction/testing dataset.

Standardized multivariate difference between males and females was estimated at D = 0.79. Very similar sex difference estimate to Mahalanobis’ D.

There are additional features in this output that might be of interest. Similarly to univariate d estimates, we get descriptive statistics for both groups on FM-score metric, as well as an estimate of pooled estimate of standard deviation that is also used for calculation of D. pcc indicates probability of correctly classified individuals, and auc is the area under the receiver operating characteristics. These are both commonly used indices of predictive performance. In this exemplary case, the performance of broad Big Five domains in predicting sex. Detailing predicted probabilities of being a male across certain cutoffs (defaults were used in this example) for both groups can be obtained from P.table part of the output:

#>          [0,0.2) [0.2,0.4) [0.4,0.6) [0.6,0.8) [0.8,1]
#>   Female     7.2      39.0      37.0      14.1     2.7
#>   Male       1.2      18.1      37.1      32.6    11.0

From this table we see that majority of males are nevertheless not very clearly classified as males based on the Big Five personality traits (11% in .80-1 range), and same is true for classifying females as females (7% in 0-0.20 range).


The same pattern can also be observed from visually inspecting the distributions of the FM-scores, and assessing the overlap between the distributions of FM-scores of males and females.

# Obtain the D-estimate

# Overlap relative to a single distribution (OVL)

# Proportion of overlap relative to the joint distribution

# non-parametric overlap with overlapping package
#> Loading required package: ggplot2
#> Loading required package: testthat
  overlap(x = list(D_out$pred.dat[

# this corresponds to Proportion of overlap relative to the joint distribution (OVL2)

# from which Proportion of overlap relative to a single distribution (OVL) is approximated at

# comparison of different overlap-estimates
#>       OVL np.OVL OVL2 np.OVL2
#> [1,] 0.69    0.7 0.53    0.54

In this example, the non-parametric overlap (np) estimates to these logistic distributions are almost identical to the parametric overlap estimates.

Bootstrap confidence intervals for Regularized D

To obtain confidence intervals for D-estimates in the regularized techniques, a bootstrap approach can be used whereby D is estimated with multiple redrawn bootstrap samples from the original sample and using the distribution of these D values for constructing a percentile confidence interval around the estimate. In this example, we take only 10 redraws with replacement, but in reality one should take at least 1000 or 5000.

# number of redraws

# placeholder for D-estimates

# repeat the procedure to obtain a distribution


for (i in 1:reps){
  # draw replaced samples with the same size as the original data
  # run D_regularized for each sample
              group.values = c("Male","Female"),
              out = T,
#> Time difference of 25.29923 secs
# print 95% CI percentile interval

#>  2.5% 97.5% 
#>  0.78  0.83

Thus, it can be reported that the multivariate sex difference D = 0.79, 95% CI [0.78, 0.83].

Multivariate difference using item scores as variables

Recently there have been multiple studies in the field of personality psychology highlighting the importance of narrow personality traits, also in terms of their ability to predict criteria over and above the more commonly used broad trait domains, such as Big Five (e.g., Seeboth & Mõttus, 2018). These narrow personality traits are usually either facets or nuances that are commonly modeled as being part of a certain Big Five trait, although their unique predictive utility suggests that they have trait-like properties (stability, cross-informant agreement etc.) that go beyond the broad Big Five domains (e.g., Mõttus et al., 2019). That is, the narrower traits can be considered unique traits as well. It is therefore interesting to see whether the item scores as the variable set in multivariate estimation would also better predictions of sex, and different estimates of sex differences (see Ilmarinen et al., 2022).

Because the number of the variables is increased (from five to fifty in this dataset), the use of regularization methods and different data partitions for training and testing are strongly recommended to avoid overfitting the data (for illustrative purposes, Mahalanobis’ D is also be calculated below).

Mahalanobis’ D

# place-holder vector for univariate Cohen d's for items

# loop through the items and obtain d for each item

for (i in 1:length(per.items)){


# Calculate D
  sqrt(t(d_vector_items) %*% solve(cor_mat_items) %*% d_vector_items)
#>           [,1]
#> [1,] 0.9212846

Item-based Mahalanobis’ D is somewhat larger that any of those based on Big Five domains. The standardized distance between the group centroids of males and females in a space comprised of personality items is D = 0.92

Regularized D with D_regularized -function

It is straightforward to use D_regularized -function even with large variable sets (especially when the variable names are save in a vector first). Using separate fold for regularization and prediction for items. Domain-based results are printed for comparison.

              group.values = c("Male","Female"),
              out = T,
              size=size,pcc = T,auc = T,pred.prob = T)

# Item-based D
#>      n.Male n.Female m.Male m.Female sd.Male sd.Female diff    D
#> [1,]   1473     4267   0.46    -0.46       1      0.95      0.96 0.92 0.96
#>      pcc.Male pcc.Female  auc
#> [1,]     0.66       0.71      0.69 0.75
# Big Five domain-based D
#>      n.Male n.Female m.Male m.Female sd.Male sd.Female diff    D
#> [1,]   1473     4267   0.34     -0.3    0.86      0.79      0.81 0.64 0.79
#>      pcc.Male pcc.Female  auc
#> [1,]     0.63       0.67      0.66 0.71

With items as the multivariate set, D = 0.96 is slightly larger than with Big Five trait domains, D = 0.79.

Differences in predictive performance can also be illustrated from comparing the predicted probabilities.

#>          [0,0.2) [0.2,0.4) [0.4,0.6) [0.6,0.8) [0.8,1]
#>   Female    15.9      37.8      29.4      13.7     3.3
#>   Male       1.9      17.9      31.4      31.1    17.7
#>          [0,0.2) [0.2,0.4) [0.4,0.6) [0.6,0.8) [0.8,1]
#>   Female     7.2      39.0      37.0      14.1     2.7
#>   Male       1.2      18.1      37.1      32.6    11.0

We see that there are more people in the extremes, but based on the pcc and auc -values printed above, the overall predictive performance is not drastically improved. A large proportion (more than 30%) of the data is still classified in the 40%-60% range, meaning their personality profiles are rather indifferent in regards to sex.


Overlaps give similar information but in another format:

# Obtain the D-estimate

# Overlap relative to a single distribution (OVL)

# Proportion of overlap relative to the joint distribution

# non-parametric overlap

  overlap(x = list(D_items_out$pred.dat[

# this corresponds to Proportion of overlap relative to the joint distribution (OVL2)

# from which Proportion of overlap relative to a single distribution (OVL) is approximated at

# compare the different overlap-estimates

#>      OVL_items np.OVL_items OVL2_items np.OVL2_items
#> [1,]      0.63         0.65       0.46          0.48
# print Big Five domain based overlaps for comparison

#>       OVL np.OVL OVL2 np.OVL2
#> [1,] 0.69    0.7 0.53    0.54

There overlap between FM-distributions of males and females is somewhat smaller when individual items are used as the multivariate set rather than broad Big Five domains.


Ilmarinen, V. J., Vainikainen, M. P., & Lönnqvist, J. E. (2022). Is there a g-factor of genderedness? Using a continuous measure of genderedness to assess sex differences in personality, values, cognitive ability, school grades, and educational track. European Journal of Personality.

Lippa, R. A., & Connelly, S. (1990). Gender diagnosticity: A new Bayesian approach to gender-related individual differences. Journal of Personality and Social Psychology, 59(5), 1051–1065.

Lönnqvist, J. E., & Ilmarinen, V. J. (2021). Using a continuous measure of genderedness to assess sex differences in the attitudes of the political elite. Political Behavior, 43, 1779–1800.

Mõttus, R., Sinick, J., Terracciano, A., Hřebíčková, M., Kandler, C., Ando, J., Mortensen, E. L., Colodro-Conde, L., & Jang, K. L. (2019). Personality characteristics below facets: A replication and meta-analysis of cross-rater agreement, rank-order stability, heritability, and utility of personality nuances. Journal of Personality and Social Psychology, 117(4), 35–50.

Seeboth, A., & Mõttus, R. (2018). Successful explanations start with accurate descriptions: Questionnaire items as personality markers for more accurate predictions. European Journal of Personality, 32(3), 186—201.

Session Information

#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#> Matrix products: default
#> locale:
#> [1] LC_COLLATE=C                     LC_CTYPE=Finnish_Finland.utf8   
#> [3] LC_MONETARY=Finnish_Finland.utf8 LC_NUMERIC=C                    
#> [5] LC_TIME=Finnish_Finland.utf8    
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#> [1] overlapping_1.7 testthat_3.1.4  ggplot2_3.3.5   multid_0.6.0   
#> [5] rio_0.5.29     
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3      lattice_0.20-45   glmnet_4.1-4      digest_0.6.29    
#>  [5] foreach_1.5.2     utf8_1.2.2        R6_2.5.1          cellranger_1.1.0 
#>  [9] plyr_1.8.7        evaluate_0.15     highr_0.9         pillar_1.7.0     
#> [13] rlang_1.0.2       curl_4.3.2        readxl_1.4.0      rstudioapi_0.13  
#> [17] data.table_1.14.2 jquerylib_0.1.4   Matrix_1.4-1      rmarkdown_2.14   
#> [21] labeling_0.4.2    splines_4.2.0     stringr_1.4.0     foreign_0.8-82   
#> [25] munsell_0.5.0     compiler_4.2.0    xfun_0.30         pkgconfig_2.0.3  
#> [29] shape_1.4.6       htmltools_0.5.2   tidyselect_1.1.2  tibble_3.1.7     
#> [33] codetools_0.2-18  fansi_1.0.3       crayon_1.5.1      dplyr_1.0.9      
#> [37] withr_2.5.0       brio_1.1.3        grid_4.2.0        jsonlite_1.8.0   
#> [41] gtable_0.3.0      lifecycle_1.0.1   magrittr_2.0.3    pROC_1.18.0      
#> [45] scales_1.2.0      zip_2.2.0         cli_3.3.0         stringi_1.7.6    
#> [49] farver_2.1.0      bslib_0.3.1       ellipsis_0.3.2    generics_0.1.2   
#> [53] vctrs_0.4.1       openxlsx_4.2.5    iterators_1.0.14  tools_4.2.0      
#> [57] forcats_0.5.1     glue_1.6.2        purrr_0.3.4       hms_1.1.1        
#> [61] fastmap_1.1.0     survival_3.3-1    yaml_2.3.5        colorspace_2.0-3 
#> [65] knitr_1.39        haven_2.5.0       sass_0.4.1