uchardet


R bindings for the uchardet library, the encoding detector library from Mozilla. It takes a sequence of bytes in an unknown character encoding, with no additional information, attempts to determine the encoding of the text, and returns the encoding name in an iconv-compatible format.

Key features:

Installation

To install the package from CRAN, run the following command:

install.packages("uchardet", repos = "https://cloud.r-project.org/")

You can also install the development version with the install_gitlab() function from the remotes package:

remotes::install_gitlab("artemklevtsov/uchardet@devel")

This package contains compiled code, so you need Rtools to install it from source on Windows.

Installation from source requires the uchardet library and headers. On Linux or macOS, the configure script tries to find them with pkg-config or in the system include/library paths. You can set the include and library paths with the UCHARDET_INCLUDES and UCHARDET_LIBS configure variables.

If the uchardet system library is not found, the built-in copy is compiled from source instead. You can force compilation of the built-in library with the --with-builtin-uchardet configure argument.
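The configure variables and the configure argument described above can be passed through install.packages(). This is a minimal sketch, not the package's documented procedure: the /opt/uchardet paths are hypothetical placeholders, and it assumes that UCHARDET_INCLUDES and UCHARDET_LIBS accept plain directory paths, so check the configure script if the build fails.

```r
# Option 1: point the configure script at a custom uchardet installation.
# The /opt/uchardet paths are placeholders; replace them with your own.
install.packages(
  "uchardet",
  type = "source",
  configure.vars = paste(
    "UCHARDET_INCLUDES=/opt/uchardet/include",
    "UCHARDET_LIBS=/opt/uchardet/lib"
  )
)

# Option 2: ignore any system library and compile the built-in copy.
install.packages(
  "uchardet",
  type = "source",
  configure.args = "--with-builtin-uchardet"
)
```

Both configure.vars and configure.args apply only to source installs, so type = "source" is set explicitly.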

Example

# load packages
library(uchardet)

# detect string encoding
ascii <- "Hello, useR!"
print(ascii)
#> [1] "Hello, useR!"
detect_str_enc(ascii)
#> [1] "ASCII"
utf8 <- "\u4e0b\u5348\u597d"
print(utf8)
#> [1] "下午好"
detect_str_enc(utf8)
#> [1] "UTF-8"

# detect raw vector encoding
detect_raw_enc(charToRaw(ascii))
#> [1] "ASCII"
detect_raw_enc(charToRaw(utf8))
#> [1] "UTF-8"

# detect file encoding
ascii_file <- tempfile()
writeLines(ascii, ascii_file)
detect_file_enc(ascii_file)
#> /tmp/Rtmpft7RBl/file625e2e4efaf3 
#>                          "ASCII"
utf8_file <- tempfile()
writeLines(utf8, utf8_file)
detect_file_enc(utf8_file)
#> /tmp/Rtmpft7RBl/file625e5aae6475 
#>                          "UTF-8"
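Because the detected names are iconv-compatible, they can be passed directly to iconv() to convert text to UTF-8. The sketch below is an illustration under stated assumptions: raw_to_utf8() is a hypothetical helper, not part of uchardet, and the detection step runs only when the package is installed. Very short inputs may be detected unreliably, so the result is checked before use.

```r
# Hypothetical helper (not part of uchardet): decode raw bytes to UTF-8
# given an iconv-compatible encoding name.
raw_to_utf8 <- function(bytes, encoding) {
  # iconv() accepts a list of raw vectors and returns a character vector
  iconv(list(bytes), from = encoding, to = "UTF-8")
}

# "café" encoded as ISO-8859-1; the last byte (0xe9) is not valid UTF-8
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))

# With a known encoding name the conversion is direct
known <- raw_to_utf8(latin1_bytes, "ISO-8859-1")

# With uchardet installed, the detected name can be used instead
detected <- NA_character_
if (requireNamespace("uchardet", quietly = TRUE)) {
  enc <- uchardet::detect_raw_enc(latin1_bytes)
  if (is.character(enc) && length(enc) == 1L && !is.na(enc)) {
    # Fall back to NA if iconv() does not support the detected name
    detected <- tryCatch(
      raw_to_utf8(latin1_bytes, enc),
      error = function(e) NA_character_
    )
  }
}
```

Taking the encoding name as an argument keeps the helper independent of how the name was obtained, so it works equally well with a detected name or one you already know.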

Bug reports

Run the following command to open the bug report submission page:

bug.report(package = "uchardet")

Before reporting a bug or submitting an issue, please do the following:

Please attach the traceback() and sessionInfo() output to the bug report. It may save a lot of time.
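One way to collect this information is sketched below. It assumes an interactive session in which the failing call has just been run; the variable name info is only illustrative.

```r
# Call traceback() right after the error occurs to print the call stack
# of the most recent uncaught error.
traceback()

# Capture the session details as plain text that can be pasted into a report
info <- capture.output(sessionInfo())
writeLines(info)
```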

License

The uchardet package is distributed under the GPLv2 license.