Word document generation

library(officer)
# Package `magrittr` makes officer usage easier.
library(magrittr)

Quick start

  1. Start with read_docx

Use the function read_docx to create an R object representing a Word document.

The initial Word file can be specified with the argument path. If none is provided, an empty document located in the package directory is used. Formats and styles are defined in the initial file.

The resulting object contains not only the paragraph styles, character styles and table styles of the initial document but also its content.

my_doc <- read_docx() 
styles_info(my_doc)
## # A tibble: 21 x 5
##    style_type       style_id             style_name is_custom is_default
##         <chr>          <chr>                  <chr>     <lgl>      <lgl>
##  1  paragraph         Normal                 Normal     FALSE       TRUE
##  2  paragraph         Titre1              heading 1     FALSE      FALSE
##  3  paragraph         Titre2              heading 2     FALSE      FALSE
##  4  paragraph         Titre3              heading 3     FALSE      FALSE
##  5  character Policepardfaut Default Paragraph Font     FALSE       TRUE
##  6      table  TableauNormal           Normal Table     FALSE       TRUE
##  7  numbering    Aucuneliste                No List     FALSE       TRUE
##  8  character         strong                 strong      TRUE      FALSE
##  9  paragraph       centered               centered      TRUE      FALSE
## 10      table  tabletemplate         table_template      TRUE      FALSE
## # ... with 11 more rows
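
The path argument can be illustrated with a minimal sketch. The template file here is hypothetical: it is simply the default template written to a temporary location, standing in for a real .docx template, and the variable names are illustrative.

```r
library(officer)

# Hypothetical template: the default document written to a temporary file,
# standing in for a real .docx template with its own styles.
template_path <- tempfile(fileext = ".docx")
print(read_docx(), target = template_path)

# Use that file as the initial document.
doc_from_template <- read_docx(path = template_path)

# Styles are read from the initial file.
template_styles <- styles_info(doc_from_template)
head(template_styles[, c("style_type", "style_id", "style_name")])
```

With a real template, the style names listed here are the ones that can be passed to the style arguments used in the rest of this document.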

  2. Add elements to document

By default, new content is added at the end of the document. To add content at another location in the document, see the section Cursor manipulation.

Let’s create an image from a plot…

src <- tempfile(fileext = ".png")
png(filename = src, width = 5, height = 6, units = 'in', res = 300)
barplot(1:10, col = 1:10)
dev.off()

… and add that image to the document, followed by some paragraphs of text and a table.

my_doc <- my_doc %>% 
  body_add_img(src = src, width = 5, height = 6, style = "centered") %>% 
  body_add_par("Hello world!", style = "Normal") %>% 
  body_add_par("", style = "Normal") %>% # blank paragraph
  body_add_table(iris, style = "table_template")
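
To check what has been added without opening Word, the content can be listed with docx_summary, an officer function not otherwise used in this document. The following is a small sketch on a reduced example (no image, three rows of iris), not a required step.

```r
library(officer)

doc <- read_docx()
doc <- body_add_par(doc, "Hello world!", style = "Normal")
doc <- body_add_table(doc, head(iris, 3), style = "table_template")

# One row per paragraph or table cell, in document order.
content <- docx_summary(doc)

# Count the elements by type.
table(content$content_type)
```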

  3. Write the Word file

The file can be generated using the function print and its argument target:

print(my_doc, target = "assets/docx/first_example.docx")
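
The target above is a relative path. The sketch below assumes the output directory may not exist yet and creates it first; the location under the session's temporary directory is a hypothetical choice made only to keep the example self-contained.

```r
library(officer)

# Hypothetical output location under the session's temporary directory.
out_dir <- file.path(tempdir(), "assets", "docx")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "first_example.docx")

doc <- read_docx()
doc <- body_add_par(doc, "Hello world!", style = "Normal")

# Write the document to disk.
print(doc, target = out_file)

# Confirm that the file was written.
file.exists(out_file)
```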


Adding elements

There are two types of functions for adding elements.

body_add_* functions

The paragraph is the main top-level container for content within a Word document. Tables are also top-level containers; they sit at the same level as paragraphs. body_add_* functions are designed to add content as a top-level container: text as an entire paragraph, a table, an image, a page break…

A title is a paragraph. To add a title, use body_add_par() with the style argument pointing to the corresponding title style.

Use function styles_info() to see available styles:

library(dplyr)
read_docx() %>% styles_info() %>% 
  dplyr::filter( style_type %in% "paragraph" )
## # A tibble: 10 x 5
##    style_type      style_id    style_name is_custom is_default
##         <chr>         <chr>         <chr>     <lgl>      <lgl>
##  1  paragraph        Normal        Normal     FALSE       TRUE
##  2  paragraph        Titre1     heading 1     FALSE      FALSE
##  3  paragraph        Titre2     heading 2     FALSE      FALSE
##  4  paragraph        Titre3     heading 3     FALSE      FALSE
##  5  paragraph      centered      centered      TRUE      FALSE
##  6  paragraph  graphictitle graphic title      TRUE      FALSE
##  7  paragraph    tabletitle   table title      TRUE      FALSE
##  8  paragraph           TM1         toc 1     FALSE      FALSE
##  9  paragraph           TM2         toc 2     FALSE      FALSE
## 10  paragraph Textedebulles  Balloon Text     FALSE      FALSE

It is important to understand that these style names are read from the initial file provided to read_docx.
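
Because style names depend on the initial file, it can help to check that a style exists before using it. The following is a minimal sketch; add_par_checked is a hypothetical helper written for this illustration, not a function from officer.

```r
library(officer)

doc <- read_docx()
styles <- styles_info(doc)

# Paragraph style names available in the initial (default) file.
paragraph_styles <- styles$style_name[styles$style_type == "paragraph"]

# Hypothetical helper: fail early with a readable message
# when the requested paragraph style is not in the initial file.
add_par_checked <- function(x, text, style) {
  if (!style %in% paragraph_styles) {
    stop("Unknown paragraph style: ", style)
  }
  body_add_par(x, value = text, style = style)
}

doc <- add_par_checked(doc, "A first-level title", "heading 1")
```

The check is done against style_name, which is the value expected by the style arguments in the examples of this document.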

library(ggplot2)
gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point()

read_docx() %>% 
  body_add_par(value = "Table of content", style = "heading 1") %>% 
  body_add_toc(level = 2) %>% 
  body_add_break() %>% 

  body_add_par(value = "dataset iris", style = "heading 2") %>% 
  body_add_table(value = head(iris), style = "table_template" ) %>% 
  
  body_add_par(value = "plot examples", style = "heading 1") %>% 
  body_add_gg(value = gg, style = "centered" ) %>% 

  print(target = "assets/docx/body_add_demo.docx") %>% 
  invisible()


slip_in_* functions

slip_in_* functions are designed to add content inside an existing paragraph: text, an image or a seq field. The element is inserted at the beginning or at the end of the paragraph (pos = "before" or pos = "after"). Available functions are slip_in_text, slip_in_img and slip_in_seqfield, as illustrated below:

img.file <- file.path( Sys.getenv("R_HOME"), "doc", "html", "logo.jpg" )
read_docx() %>%
  body_add_par("R logo: ", style = "Normal") %>%
  slip_in_img(src = img.file, style = "strong", 
              width = .3, height = .3, pos = "after") %>% 
  slip_in_text(" - This is ", style = "strong", pos = "before") %>% 
  slip_in_seqfield(str = "SEQ Figure \u005C* ARABIC",
    style = 'strong', pos = "before") %>% 
  print(target = "assets/docx/slip_in_demo.docx") %>% 
  invisible()


These functions were implemented mostly to add Word numbering seq fields at the beginning of paragraphs used as reference entries (e.g. a table caption or a plot caption). See the section Table and image captions.

Cursor manipulation

A cursor is available and can be manipulated so that content can be added relative to its position with body_add_* functions.

Cursor functions include cursor_begin, cursor_end, cursor_reach, cursor_backward and cursor_forward.

To illustrate cursor functions, a document made of several paragraphs will be used (let’s use officer to create it).

read_docx() %>%
  body_add_par("paragraph 1", style = "Normal") %>%
  body_add_par("paragraph 2", style = "Normal") %>%
  body_add_par("paragraph 3", style = "Normal") %>%
  body_add_par("paragraph 4", style = "Normal") %>%
  body_add_par("paragraph 5", style = "Normal") %>%
  body_add_par("paragraph 6", style = "Normal") %>%
  body_add_par("paragraph 7", style = "Normal") %>%
  print(target = "assets/docx/init_doc.docx" ) %>% 
  invisible()


Now, let’s use init_doc.docx with read_docx and manipulate its content with cursor functions.

doc <- read_docx(path = "assets/docx/init_doc.docx") %>%

  # default template contains only an empty paragraph
  # Using cursor_begin and body_remove, we can delete it
  cursor_begin() %>% body_remove() %>%

  # Let's add text at the beginning of the
  # paragraph containing text "paragraph 4"
  cursor_reach(keyword = "paragraph 4") %>%
  slip_in_text("This is ", pos = "before", style = "Default Paragraph Font") %>%

  # move the cursor forward and end a section
  cursor_forward() %>%
  body_add_par("The section stops here", style = "Normal") %>%
  body_end_section(landscape = TRUE, continuous = FALSE) %>%

  # move the cursor at the end of the document
  cursor_end() %>%
  body_add_par("The document ends now", style = "Normal")

print(doc, target = "assets/docx/cursor.docx") %>% 
  invisible()
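
The effect of the cursor on where content lands can be checked in memory with a smaller sketch. It uses docx_summary (an officer function not otherwise used in this document) and plain assignments instead of the pipe; the paragraph texts are illustrative.

```r
library(officer)

doc <- read_docx()
doc <- body_add_par(doc, "paragraph 1", style = "Normal")
doc <- body_add_par(doc, "paragraph 2", style = "Normal")
doc <- body_add_par(doc, "paragraph 3", style = "Normal")

# Move the cursor to "paragraph 2" and insert a new paragraph before it.
doc <- cursor_reach(doc, keyword = "paragraph 2")
doc <- body_add_par(doc, "inserted before paragraph 2",
                    style = "Normal", pos = "before")

# List paragraph texts in document order, dropping empty paragraphs.
content <- docx_summary(doc)
texts <- content$text[content$content_type == "paragraph"]
texts <- texts[!is.na(texts) & nzchar(texts)]
texts
```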


Remove content

The function body_remove lets you remove content from a Word document. Used with cursor_* functions, it is a convenient tool to update an existing document.

For illustration, we will first generate a document that will later serve as the initial document when showing how to use body_remove.

library(officer)
library(magrittr)

str1 <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " %>% 
  rep(20) %>% paste(collapse = "")
str2 <- "Drop that text" 
str3 <- "Aenean venenatis varius elit et fermentum vivamus vehicula. " %>% 
  rep(20) %>% paste(collapse = "")

my_doc <- read_docx()  %>% 
  body_add_par(value = str1, style = "Normal") %>% 
  body_add_par(value = str2, style = "centered") %>% 
  body_add_par(value = str3, style = "Normal") 

print(my_doc, target = "assets/docx/ipsum_doc.docx") %>% invisible()

The file ipsum_doc.docx now exists and contains a paragraph with the text "that text". In the following example, we will position the cursor on that paragraph and then delete it:

my_doc <- read_docx(path = "assets/docx/ipsum_doc.docx")  %>% 
  cursor_reach(keyword = "that text") %>% 
  body_remove()

print(my_doc, target = "assets/docx/ipsum_doc.docx") %>% invisible()

The text search is made via XPath 1.0; regular expressions are not supported.
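
The removal can be verified without writing a file. The sketch below works on a short in-memory document with illustrative texts and compares the paragraph texts reported by docx_summary (an officer function not otherwise used in this document) before and after body_remove.

```r
library(officer)

doc <- read_docx()
doc <- body_add_par(doc, "Keep this paragraph", style = "Normal")
doc <- body_add_par(doc, "Drop that text", style = "centered")
doc <- body_add_par(doc, "Keep this one too", style = "Normal")

texts_before <- docx_summary(doc)$text

# Position the cursor on the paragraph containing the keyword, then delete it.
doc <- cursor_reach(doc, keyword = "that text")
doc <- body_remove(doc)

texts_after <- docx_summary(doc)$text

# Is the paragraph present before and after the removal?
"Drop that text" %in% texts_before
"Drop that text" %in% texts_after
```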


Replace content

The body_add_* functions let you replace content in a Word document when called with pos = "on".

For illustration, we will first generate a document that will later be used as the initial document.

my_doc <- read_docx()  %>% 
  body_add_par(value = str1, style = "Normal") %>% 
  body_add_par(value = str2, style = "centered") %>% 
  body_add_par(value = str3, style = "Normal") 

print(my_doc, target = "assets/docx/replace_template.docx")

The file replace_template.docx now exists and contains a paragraph with the text "that text". In the following example, we will position the cursor on that paragraph and then replace it. Using pos = "on" replaces the content at the cursor position with the new content.

my_doc <- read_docx(path = "assets/docx/replace_template.docx")  %>% 
  cursor_reach(keyword = "that text") %>% 
  body_add_par(value = "This is a new paragraph.", style = "centered", pos = "on")

print(my_doc, target = "assets/docx/replace_doc.docx")
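
As a quick check of pos = "on", the following sketch replaces a paragraph in a short in-memory document and lists the resulting texts with docx_summary (an officer function not otherwise used in this document). The paragraph texts are illustrative.

```r
library(officer)

doc <- read_docx()
doc <- body_add_par(doc, "First paragraph", style = "Normal")
doc <- body_add_par(doc, "Drop that text", style = "centered")
doc <- body_add_par(doc, "Last paragraph", style = "Normal")

# Put the cursor on the paragraph to replace, then write over it.
doc <- cursor_reach(doc, keyword = "that text")
doc <- body_add_par(doc, "This is a new paragraph.",
                    style = "centered", pos = "on")

texts <- docx_summary(doc)$text
texts
```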


Sections

Sections can be added to a document with the function body_end_section. The default section can be modified with the function body_default_section.

A section starts at the end of the previous section (or at the beginning of the document if no preceding section exists) and stops where the section is declared.

str1 <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " %>% 
  rep(30) %>% paste(collapse = "")
str2 <- "Aenean venenatis varius elit et fermentum vivamus vehicula. " %>% 
  rep(30) %>% paste(collapse = "")

my_doc <- read_docx()  %>% 
  slip_in_text(str = str1, style = "strong") %>% 
  body_add_par(value = str2, style = "centered") %>% 
  break_column_before() %>% 
  body_end_section(continuous = TRUE, 
                   colwidths = c(.6, .4), space = .05, sep = FALSE) %>%
  # str3 was defined earlier, in the section Remove content
  body_add_par(value = str3, style = "Normal") 
print(my_doc, target = "assets/docx/section.docx") %>% invisible()


In the previous example, the first two paragraphs are in a two-column section and the third is in the default section.

Table and image captions

One can combine slip_in_seqfield and slip_in_text to prefix a paragraph with references (e.g. chapter number and graphic index in the document). However, producing a plot or a table together with its caption can be verbose.

Shortcut functions are implemented in the object shortcuts (they will at least give you a template of code to modify if they do not fit your needs exactly). slip_in_tableref, slip_in_plotref and body_add_gg can make life easier.

Below is an illustration:

library(magrittr)
library(officer)
library(ggplot2)

gg1 <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) + geom_point()
gg2 <- ggplot(data = iris, aes(Sepal.Length, Petal.Length, color = Species)) + geom_point()


doc <- read_docx() %>% 
  body_add_par(value = "Table of content", style = "heading 1") %>% 
  body_add_toc(level = 2) %>% 
  
  body_add_par(value = "Tables", style = "heading 1") %>% 
  body_add_par(value = "dataset mtcars", style = "heading 2") %>% 
  body_add_table(value = head(mtcars)[, 1:4], style = "table_template" ) %>% 
  body_add_par(value = "data mtcars", style = "table title") %>% 
  shortcuts$slip_in_tableref(depth = 2) %>%
  
  body_add_par(value = "dataset iris", style = "heading 2") %>% 
  body_add_table(value = head(iris), style = "table_template" ) %>% 
  body_add_par(value = "data iris", style = "table title") %>% 
  shortcuts$slip_in_tableref(depth = 2) %>%
  
  body_end_section(continuous = FALSE, landscape = FALSE ) %>% 
  
  body_add_par(value = "plot examples", style = "heading 1") %>% 
  body_add_gg(value = gg1, style = "centered" ) %>% 
  body_add_par(value = "graph example 1", style = "graphic title") %>% 
  shortcuts$slip_in_plotref(depth = 1) %>%
  
  body_add_par(value = "plot 2", style = "heading 2") %>% 
  body_add_gg(value = gg2, style = "centered" ) %>% 
  body_add_par(value = "graph example 2", style = "graphic title") %>% 
  shortcuts$slip_in_plotref(depth = 2) %>%
  
  body_end_section(continuous = FALSE, landscape = TRUE) %>% 
  
  body_add_par(value = "Table of tables", style = "heading 2") %>% 
  body_add_toc(style = "table title") %>% 
  body_add_par(value = "Table of graphics", style = "heading 2") %>% 
  body_add_toc(style = "graphic title")

print(doc, target = "assets/docx/toc_and_captions.docx") %>% invisible()
