library(rethnicity)
#> ══ WARNING! ════════════════════════════════════════════════════════════════════
#> The method provided by this package has its limitations and anyone must use
#> them cautiously and responsibly. You should also agree that you only intend to
#> use the method for academic research purpose and not for commercial use. You
#> would also agree NOT to discriminate anyone based on race or color or any
#> characteristic, with the information provided by this package. Please refer to
#> the documentation for details:
#> https://fangzhou-xie.github.io/rethnicity/index.html
#> ════════════════════════════════════════════════════════════════════════════════
I built this package to help applied researchers study ethnic equality and inequality. More specifically, it provides a method for predicting race from names. I designed the package so that the method is powered by deep learning models but does not require installing deep learning libraries, which is usually a daunting task. As a result, the methods provided in this package are not designed to be updated, fine-tuned, or trained on custom datasets. This is the trade-off one has to accept for ease of use.
That said, from version 0.2.0 onwards, I provide two additional lower-level functions, predict_fullname and predict_lastname, which allow users to provide their own customized models. (Before that version there was only one function, predict_ethnicity. This function is still the RECOMMENDED one for most people.)
Since the package disables training by design, you need to train your own model in Keras and then convert the trained model to the .json format used by the frugally-deep project.
If you are reading this vignette, you most likely know what you are doing and have heard of Keras. Otherwise, you will have to stick to the default method.
Before training the model, you need to process your dataset: encode the outcome variable as integers, then use keras.utils.to_categorical() to turn those integers into one-hot vectors. You need to know the mapping between the integers and the categories. For example, 0, 1, 2, 3 refer to asian, black, hispanic, white respectively. You will need this mapping later, and we will call it labels = c("asian", "black", "hispanic", "white").
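The label mapping described above can be sketched in plain Python. This is only an illustration: the names LABELS, encode and one_hot are hypothetical helpers, not part of rethnicity or Keras. In a real training script, keras.utils.to_categorical(codes, num_classes=4) would produce the same one-hot rows as a NumPy array.

```python
# Class order: position i here is the integer code i used during training.
# The same order must later be passed as `labels` to the R functions.
LABELS = ["asian", "black", "hispanic", "white"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode(outcomes):
    """Map class names to integer codes, e.g. "asian" -> 0, "white" -> 3."""
    return [LABEL_TO_INDEX[name] for name in outcomes]


def one_hot(codes, num_classes):
    """One-hot encode integer codes (hypothetical stand-in for to_categorical)."""
    return [[1.0 if j == code else 0.0 for j in range(num_classes)]
            for code in codes]


codes = encode(["white", "asian", "hispanic"])
targets = one_hot(codes, len(LABELS))
```

Keeping the list in one place makes it easy to copy the exact same order into the labels vector on the R side.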
Just remember to save the model without the optimizers (more on the
Then, use the convert_model.py script to convert your model into .json format. This is what I did as well. If you include the optimizers in the saved model, you will encounter an error in the conversion process.
python convert_model.py keras_model.h5 keras_model.json
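Saving without the optimizer, as advised above, might look like the following sketch. It assumes you already have a trained Keras model object; save_for_conversion is a hypothetical helper name, while include_optimizer is the argument of Keras' Model.save() that leaves the optimizer state out of the file.

```python
def save_for_conversion(model, path="keras_model.h5"):
    """Save a trained Keras model without its optimizer state.

    `model` is assumed to be a trained Keras model; the default file name
    matches the one used in the conversion command above.
    """
    # include_optimizer=False drops the optimizer state from the saved file,
    # which is what the conversion step expects.
    model.save(path, include_optimizer=False)
    return path
```

The returned path is the first argument to pass to convert_model.py.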
Now that the model is trained and converted, you need the file path of the model file. Here I load one of the default models shipped with the package instead of training a new one.
# remember the list of labels we mentioned?
labels <- c("asian", "black", "hispanic", "white")
# change to your own model file path
model_path <- system.file("models", "fullname_aligned_distill.json", package = "rethnicity", mustWork = TRUE)
# run the prediction
predict_fullname(firstnames = "Alan", lastnames = "Turing", labels = labels, model_path = model_path)
#>   firstname lastname prob_asian prob_black prob_hispanic prob_white  race
#> 1      Alan   Turing 0.02842531  0.2051059    0.02074103  0.7457278 white
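To see why the order of labels matters, here is a small Python sketch of how a row of probabilities maps back to a class name. It assumes, based on the output above, that the race column is simply the label with the largest probability; decode is a hypothetical helper and not the package's actual code.

```python
LABELS = ["asian", "black", "hispanic", "white"]


def decode(probabilities, labels=LABELS):
    """Return the label at the position of the largest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


# Probabilities copied from the "Alan Turing" row in the output above.
alan_turing = [0.02842531, 0.2051059, 0.02074103, 0.7457278]
predicted = decode(alan_turing)

# The same numbers with the labels in a different order give a different name.
mislabeled = decode(alan_turing, ["white", "hispanic", "black", "asian"])
```

If the labels vector does not follow the integer coding used during training, the probabilities stay the same but are reported under the wrong names.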
In fact, the same approach also works if you tweak the code to predict gender from names: train a model on gender labels and pass the matching labels.