How to create Word docs from R or Python with Quarto

There are several ways to create a Word document from programming languages, including R Markdown and the officer package with R and the python-docx library in Python. But one of the newest and most intriguing is Quarto, a free, open source technical publishing system from RStudio (now Posit) that’s native to Python and Julia as well as R.

One of the big advantages of Quarto is that, unlike a Word-specific package, the same Quarto file with minor tweaks can be used to generate dozens of output formats in addition to Word, including PowerPoint, HTML, and PDF. (Find out more: “What is Quarto? RStudio rolls out next-generation R Markdown”.) In addition, you can automate the creation of Word reports and include results of your analysis and visualization code.

Here’s how to use Quarto to create Word documents.

Step 1: Install Quarto

Because Quarto isn’t a language-specific library, you install it like any other stand-alone software. You can find binary downloads for Windows, macOS, and Linux on Quarto’s “Get Started” page.

If you’re an R user and you have an up-to-date version of RStudio, Quarto should be included by default. You don’t need to install Quarto separately.

If you want to use Quarto in Visual Studio Code, install the Quarto extension in addition to the Quarto application software. To render Quarto documents that include Python code, my system also instructed me to install Jupyter Notebook by running python3 -m pip install jupyter.

You can create and render Quarto files with any plain text editor and your terminal, just as you can with R or Python scripts, since they are plain text and not binary files. However, you’d miss out on all of the built-in tools of an IDE, such as code completion suggestions and a render button.
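If you plan to drive Quarto from Python scripts, one quick way to confirm that the command-line tool is installed and on your PATH is to ask it for its version. Here’s a minimal sketch. quarto_version() is a hypothetical helper name, and the sketch assumes the executable is named quarto.

```python
import shutil
import subprocess
from typing import Optional


def quarto_version() -> Optional[str]:
    """Return the installed Quarto CLI version, or None if quarto isn't on PATH."""
    quarto = shutil.which("quarto")  # full path to the executable, or None
    if quarto is None:
        return None
    result = subprocess.run(
        [quarto, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = quarto_version()
    print(version or "Quarto not found; install it from the Get Started page")
```

For a more thorough report, including whether Quarto can find Python and Jupyter, run quarto check in your terminal.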

Step 2: Create a Quarto document file

Once you’ve got Quarto installed, you can create a new Quarto file in your IDE the usual way, either File > New File > Quarto Document (not Quarto Presentation) in RStudio, or File > New File in VS Code and choose “Quarto” as the language.

In RStudio, you’ll have a choice of a few Quarto document output formats. Select Word, and you can then auto-generate either a Word sample document or a blank doc. Until you’re familiar with Quarto syntax, it can be helpful to see what the sample looks like.

Sample Quarto document generated by RStudio when selecting Word output, with the YAML header title: 'My Quarto Word Doc Sample' and format: docx. (Screen shot by Sharon Machlis)

The default YAML header in RStudio includes a title, output format (in this case docx for Word), and editor (visual WYSIWYG or source).

If you’re starting with a blank document in VS Code, you can add the basic YAML header at the top:

---
title: "Your document title"
format: docx
---

As far as I know, there is no WYSIWYG Quarto editor in VS Code, so there is no reason to specify an editor.

Then start creating your content.
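Because a .qmd file is plain text, you can also generate one from code. This Python sketch writes the basic Word header shown above followed by some body text. write_qmd() is a hypothetical helper, and it doesn’t escape quote characters inside the title.

```python
from pathlib import Path


def write_qmd(path: str, title: str, body: str) -> Path:
    """Write a minimal Quarto file whose YAML header targets Word (docx)."""
    # The --- lines delimit the YAML header; the body starts after a blank line.
    header = f'---\ntitle: "{title}"\nformat: docx\n---\n\n'
    out = Path(path)
    out.write_text(header + body, encoding="utf-8")
    return out
```

For example, write_qmd("my_quarto_document.qmd", "Your document title", "Hello, Word!\n") creates a file you can open in your IDE and render.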

Step 3: Add text with Markdown syntax

Quarto uses Pandoc’s version of Markdown syntax for writing text. That includes single underscores around text you want in italics, double asterisks around text you want in bold, blank lines between paragraphs, two or more spaces at the end of a line to create a line break, and hash symbols at the start of a line to mark a heading. A single hash creates a top-level heading (h1), two hashes a second-level heading (h2), and so on. In Word output, these headings use Word’s Heading 1, Heading 2, etc. styles.
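To see those rules side by side, here’s a small Python sketch that assembles a Markdown body using each of them. The heading() and sample_body() helpers are hypothetical names for illustration; the 1-to-6 limit reflects the six heading levels Markdown supports.

```python
def heading(text: str, level: int = 1) -> str:
    """Return a Markdown heading: one hash for h1, two for h2, and so on."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown heading levels run from 1 to 6")
    return "#" * level + " " + text


def sample_body() -> str:
    """Build a short Markdown body that uses the syntax described above."""
    lines = [
        heading("Results", 1),
        "",  # blank line separates blocks
        "This is _italic_ and this is **bold**.",
        "",
        heading("Details", 2),
        "",
        "First line ends with two spaces  ",  # two trailing spaces -> line break
        "so this continues on a new line.",
    ]
    return "\n".join(lines) + "\n"
```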

Step 4 (optional): Style your document from a reference .docx

CSS styling designed for Quarto’s HTML output formats doesn’t carry over when exporting to Word. Instead, you can create a separate reference Word document that defines font styles, sizes, and other formatting for your document.

Run the code below in your terminal (not the R or Python console) to create a default Word styling document, in this example called my-doc-style.docx (you can call it anything):

quarto pandoc -o my-doc-style.docx \
   --print-default-data-file reference.docx

This creates a regular Word .docx file, not a Microsoft Word .dotx template. You can open your reference .docx and customize its styles as with any Word document by opening the Styles panel from the Word ribbon.
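If you’d rather run that step from a Python script than type it into a terminal, a minimal sketch can wrap the same command with subprocess. The helper names are hypothetical, and the sketch assumes quarto is on your PATH.

```python
import shutil
import subprocess


def reference_doc_command(output: str = "my-doc-style.docx") -> list[str]:
    """Build the command that writes Pandoc's default reference .docx."""
    return [
        "quarto", "pandoc",
        "-o", output,
        "--print-default-data-file", "reference.docx",
    ]


def create_reference_doc(output: str = "my-doc-style.docx") -> None:
    """Run the command; raises if quarto is missing or the command fails."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto not found on PATH")
    subprocess.run(reference_doc_command(output), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if your file name contains spaces. Calling create_reference_doc() should write my-doc-style.docx to the current working directory.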

To use the reference document in a Quarto doc, add it to the document’s YAML header under the docx format, with syntax like this:

format:
  docx:
    reference-doc: my-doc-style.docx

There are other customizations available for Quarto Word documents, such as adding a table of contents or section numbering, which you can see in the Quarto documentation for Word.
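Word-specific options sit under format: docx in the YAML header. As a sketch, this Python helper builds such a header from three of them: reference-doc, toc (table of contents), and number-sections. The function is hypothetical; check the Quarto documentation for Word for the full list of options.

```python
from typing import Optional


def docx_front_matter(
    title: str,
    reference_doc: Optional[str] = None,
    toc: bool = False,
    number_sections: bool = False,
) -> str:
    """Build a Quarto YAML header whose Word options sit under format: docx."""
    options = []
    if reference_doc:
        options.append(f"reference-doc: {reference_doc}")
    if toc:
        options.append("toc: true")
    if number_sections:
        options.append("number-sections: true")

    lines = ["---", f'title: "{title}"']
    if options:
        # Nest each option two levels deep: format -> docx -> option
        lines += ["format:", "  docx:"] + ["    " + opt for opt in options]
    else:
        lines.append("format: docx")  # no extra options needed
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For example, docx_front_matter("Quarterly report", "my-doc-style.docx", toc=True) returns a header with reference-doc and toc: true nested under docx:.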

Step 5: Add results of R or Python code to your Word doc

One of the best things about generating a Word doc from R or Python is the ability to run code and add results to your document—including graphics.

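You do this by adding code chunks to your Quarto file. Each chunk opens with a line of three backticks followed by the language name in curly braces ({r} for R) and closes with a line of three backticks. Your code goes in between, like this for R: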

# R code here

or this for Python, where the opening line uses {python}:

# Python code here

You can set options for a code chunk, such as whether to display the code (echo), run the code (eval), show warning messages (warning), and so on. Chunk options start with #| (often referred to as a “hash pipe”) in R, Python, and Julia chunks.

The chunk options below would show results of R code in a chunk but not display the code in the Word doc:

#| echo: false
#| eval: true
# R code here

Other options include #| fig-cap: "My caption" to add a figure caption, #| warning: false to hide any warning messages when the code runs, and #| cache: true to cache the results of a compute-intensive chunk whose data won’t change.

You can execute code within the figure caption option by using !expr, with syntax such as:

#| fig-cap: !expr paste("Data pulled on", Sys.Date())
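Since chunk options are just #| lines at the top of a chunk, they’re easy to generate programmatically. This hypothetical code_chunk() helper, sketched in Python, turns keyword arguments into hash-pipe lines (fig_cap becomes fig-cap, and Python True/False become YAML true/false) and wraps everything in backtick fences.

```python
FENCE = "`" * 3  # three backticks, built up so no literal fence line appears here


def code_chunk(lang: str, code: str, **options: object) -> str:
    """Return a Quarto code chunk with one '#| key: value' line per option."""
    opt_lines = []
    for name, value in options.items():
        key = name.replace("_", "-")  # fig_cap -> fig-cap
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        opt_lines.append(f"#| {key}: {value}")
    body = "\n".join(opt_lines + [code.rstrip("\n")])
    return f"{FENCE}{{{lang}}}\n{body}\n{FENCE}\n"
```

For example, code_chunk("r", "plot(cars)", echo=False, warning=False) produces an R chunk that shows the plot but hides the code and any warnings.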

Step 6: Render the document

You can render a Quarto document in RStudio or VS Code with the Render button or the keyboard shortcut Ctrl/Cmd + Shift + K, or from the terminal with the command

quarto render my_quarto_document.qmd --to docx

for a document named my_quarto_document.qmd.

R users can also use the quarto R package’s quarto_render() function:

quarto::quarto_render("my_quarto_document.qmd", output_format = "docx")

Note: In early versions, the initial Word document preview that popped up from RStudio occasionally didn’t display my graph. That seems to be fixed. However, if it happens to you, try duplicating the initial .docx file as a new, editable Word document, since that fixed the issue for me.
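The terminal render command is also easy to call from Python, for instance to render every .qmd file in a folder to Word in one go. This is a sketch that assumes quarto is on your PATH; render_command() and render_folder() are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path


def render_command(qmd: Path, to: str = "docx") -> list[str]:
    """Equivalent of the terminal command: quarto render <file>.qmd --to docx"""
    return ["quarto", "render", str(qmd), "--to", to]


def render_folder(folder: str = ".", to: str = "docx") -> list[Path]:
    """Render each .qmd file in a folder, stopping at the first failure."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto not found on PATH")
    rendered = []
    for qmd in sorted(Path(folder).glob("*.qmd")):
        subprocess.run(render_command(qmd, to), check=True)
        rendered.append(qmd)
    return rendered
```

Because check=True raises on a non-zero exit code, a document that fails to render stops the loop instead of being silently skipped.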

Step 7 (optional): Automate multiple versions with parameters

Being able to create Word files with results of your code is useful not only for single-time documents. It also lets you streamline regular data reporting and updates with code that pulls new data from an external source, runs new calculations, and generates up-to-date graphs with a single render call.

Quarto can also add parameters to a report, which are like variables defined externally during rendering. That lets you use a report as a template and create the same report for different parameter values, like cities or regions. For example, if you needed to run one report for each of 10 cities, city could be defined as a parameter in your document’s YAML header:

---
title: "My Quarto Document"
params:
  city: New York
---

That sets a parameter named city with a default value of New York. You can then access the value of the city parameter in your R code with params$city, such as 

#| echo: false
cat("This report is about", params$city)

To create multiple reports in R using the same Quarto document but different values for the parameter, I typically create a function to render my document and then use the purrr package’s walk() function to run my function on a list of items. For example, if my parameterized Quarto document is named params_test.qmd with one parameter named city, this could be my render function in R:

render_my_doc <- function(the_city = "New York", the_doc = "params_test.qmd") {
  quarto::quarto_render(input = the_doc, execute_params = list("city" = the_city),
    output_file = paste0("test_", the_city, ".docx"), output_format = "docx")
}

And this is how I’d use my function to generate three separate documents for New York, Chicago, and Los Angeles:

purrr::walk(list("New York", "Chicago", "Los Angeles"), render_my_doc)
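The same loop can run without R by calling the Quarto command line from Python. Here’s a sketch that assumes quarto render’s -P (execution parameter, given as key:value) and --output options, plus the params_test.qmd file with a city parameter described above; param_render_command() and render_all() are hypothetical names.

```python
import subprocess


def param_render_command(city: str, doc: str = "params_test.qmd") -> list[str]:
    """Build a quarto render call that sets the city parameter for one report."""
    return [
        "quarto", "render", doc,
        "--to", "docx",
        "-P", f"city:{city}",             # overrides the default in the YAML header
        "--output", f"test_{city}.docx",  # one output file per city
    ]


def render_all(cities: list[str], doc: str = "params_test.qmd") -> None:
    """Render one Word report per city, assuming quarto is on PATH."""
    for city in cities:
        # An argument list (no shell) keeps "New York" together as one value.
        subprocess.run(param_render_command(city, doc), check=True)
```

Calling render_all(["New York", "Chicago", "Los Angeles"]) should then produce test_New York.docx, test_Chicago.docx, and test_Los Angeles.docx, mirroring the R version.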

Python syntax is a bit different and is based on the papermill library. For example, defining a parameter is done in a Python code chunk that would look like

#| tags: [parameters]
city = 'Boston'

You can read more about parameterizing Python documents in the Quarto Parameters documentation.

If you’re interested in R and more tips about R, head to the Do More With R page!

Copyright © 2022 IDG Communications, Inc.
