6 useful R functions you might not know

ColorBrewer palettes
Credit: Screenshot of the tmaptools R package's palette_explorer function

Almost every R user knows about popular packages like dplyr and ggplot2. But with 10,000+ packages on CRAN and yet more on GitHub, it's not always easy to unearth libraries with great R functions. One of the best ways to find cool, new-to-you R code is to see what other useRs have discovered. So, I'm sharing a few of my discoveries -- and hope you'll share some of yours in return (contact info below).

Choose a ColorBrewer palette from an interactive app. Need a color scheme for a map or app? ColorBrewer is well known as a source for pre-configured palettes, and the RColorBrewer package imports those into R. But it's not always easy to remember what's available. The tmaptools package's palette_explorer creates an interactive application that shows you the possibilities.

First, install tmaptools with install.packages("tmaptools"), then load it with library("tmaptools") and run palette_explorer() (or skip loading it and run tmaptools::palette_explorer()). You'll see all available palettes as in the image above, as well as sliders to adjust options like the number of colors. Below each group of palettes, there's also info about the basic syntax for using a color scheme.

palette_explorer also needs shiny and shinyjs packages installed in order to generate the interactive app.
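Once a palette name catches your eye in the explorer, pulling its colors into a script might look like the sketch below. It assumes the RColorBrewer package is installed; the "Blues" palette and the choice of five colors are purely illustrative.

```r
# A minimal sketch, assuming RColorBrewer is installed.
# "Blues" and n = 5 are illustrative choices, not recommendations from the app.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  blues <- RColorBrewer::brewer.pal(n = 5, name = "Blues")
  print(blues)  # a character vector of five hex color codes
}
```

The resulting vector can be passed anywhere R accepts a vector of colors, such as the col argument of a base plot.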

Create character vectors without quotation marks. It can be a bit annoying to manually turn Firefox, Chrome, Edge, Safari, Internet Explorer, Opera into the c("Firefox", "Chrome", "Edge", "Safari", "InternetExplorer", "Opera") format R needs to use such text as a vector of character strings.

That's what the Hmisc package's Cs function was designed to do. After loading the Hmisc package,

Cs(Firefox, Chrome, Edge, Safari, InternetExplorer, Opera)

will evaluate the same as

c("Firefox", "Chrome", "Edge", "Safari", "InternetExplorer", "Opera")

If you've ever manually added quotation marks to a lengthy string of words, you'll appreciate the elegance. Note the lack of a space in Internet Explorer -- spaces will trip up the Cs function.
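If you're curious how a function can accept unquoted words at all, here is a rough base R sketch of the general idea. The helper name bare_words is hypothetical, and this is not the Hmisc source code, just one way to get similar behavior.

```r
# Hypothetical helper, NOT the Hmisc implementation: capture the
# unevaluated arguments and convert each bare name to a character string.
bare_words <- function(...) {
  # substitute() returns the call list(Firefox, Chrome, ...) without
  # evaluating it; as.character() turns each element into text.
  as.character(substitute(list(...)))[-1]  # drop the leading "list"
}

browsers <- bare_words(Firefox, Chrome, Edge, Safari, InternetExplorer, Opera)
identical(
  browsers,
  c("Firefox", "Chrome", "Edge", "Safari", "InternetExplorer", "Opera")
)
```

Because the arguments still have to parse as R code, an entry such as Internet Explorer (with a space) is a syntax error before the function ever runs, which fits the limitation noted above.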

RStudio bonus: If you use RStudio, there's another option for sleek vector-string creation. Security pro Bob Rudis created an RStudio add-in that takes selected comma-separated text and adds the necessary quotes and c(). And it can handle spaces. Install it with devtools::install_github("hrbrmstr/hrbraddins") (which means you need the devtools package as well), and you'll see Bare Combine as an option in the RStudio Tools > Addins menu.

You can run it from that Addins menu, but selecting text and then leaving your coding window to go to the Tools > Addins menu to select Bare Combine doesn't necessarily feel less cumbersome than typing a few quotation marks. Much better to create a custom keyboard shortcut for the addin.

You can do that by going to Tools > Modify Keyboard Shortcuts. Scroll down until you see Bare Combine in the Addins section -- or search for Bare Combine in the filter box. Double click in the shortcut area and type the keystroke(s) you want to assign to the addin (I used alt-shift-').

Customizing keyboard shortcuts in RStudio
Credit: Screenshot of RStudio software

Now, any time you want to turn comma-separated plain text into an R vector of character strings, you can highlight the text and use your keyboard shortcut.

By the way, RStudio add-ins are mostly just plain R. If you'd like to have keyboard shortcuts for R tasks like this, it might be worth learning the syntax.
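To give a feel for how little R is involved, here is a sketch of the text-transforming core such an add-in might use. The function name to_c_call is hypothetical and this is not the hrbraddins code; a real add-in would also need to read the selected text from the editor and write the result back (the rstudioapi package is the usual route), which is left out here.

```r
# Hypothetical helper: turn comma-separated text into the text of a c() call.
to_c_call <- function(txt) {
  parts <- trimws(strsplit(txt, ",", fixed = TRUE)[[1]])  # split, trim spaces
  quoted <- paste0('"', parts, '"', collapse = ", ")      # quote each item
  paste0("c(", quoted, ")")
}

cat(to_c_call("Firefox, Chrome, Internet Explorer"), "\n")
```

Since the input is treated as a plain string rather than as R code, items that contain spaces are handled without trouble.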

Produce an interactive table with one line of code. Regardless of how much you like and use the command line, sometimes it's still nice to look at a spreadsheet-like table of data to scan, sort and filter. RStudio provides a basic view like this; but for large data sets, I like RStudio's DT package, a wrapper for the DataTables JavaScript library. DT::datatable(mydf) creates an interactive HTML table; DT::datatable(mydf, filter = "top") adds a filter box above each column.

Example of an HTML table created with the R DT package, an interface to the DataTables JavaScript library.
Credit: Screenshot of an HTML table created with the R DT package
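A quick way to try this without your own data is sketched below. It assumes the DT package is installed and uses the built-in mtcars data set in place of mydf.

```r
# A minimal sketch, assuming the DT package is installed.
# mtcars is a built-in data set standing in for mydf.
if (requireNamespace("DT", quietly = TRUE)) {
  widget <- DT::datatable(mtcars, filter = "top")
  # Printing `widget` at the console in RStudio, or in an R Markdown
  # chunk, renders the interactive table.
}
```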

Easy file conversions. rio is one of my favorite R packages. Instead of remembering which functions to use for importing what types of files (read.csv? read.table? read_excel?), rio vastly simplifies the process with one import function for a couple of dozen file formats. As long as the file extension is a format that rio recognizes, it will appropriately import from files such as .csv, .json, .xlsx and .html (tables). Same for rio's export command if you'd like to save to a particular file format. But rio has a third major function: convert, which will import and export in a single step. Have a million-row Excel file you need to save as a CSV? An HTML table you'd like to save as JSON? Use a syntax like convert("myfile.xlsx", "myfile.csv"), where the first argument is your existing file and the second is your desired file with the desired extension, and your file will be created.
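Here is a small round trip you could run to see convert in action. It assumes rio is installed; the temporary file paths and the CSV-to-TSV pairing are only illustrative.

```r
# A minimal sketch, assuming the rio package is installed.
# The file names are temporary, illustrative paths.
if (requireNamespace("rio", quietly = TRUE)) {
  csv_path <- tempfile(fileext = ".csv")
  tsv_path <- tempfile(fileext = ".tsv")

  write.csv(head(mtcars), csv_path, row.names = FALSE)  # create a sample file
  rio::convert(csv_path, tsv_path)    # import the CSV, export it as TSV
  converted <- rio::import(tsv_path)  # read the result back to check it
  print(dim(converted))
}
```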

Copy and paste from R to your clipboard. rio bonus: You can copy between your clipboard and R with rio. Send some data from a small R variable to your clipboard with export(myRobject, "clipboard"). Importing from the clipboard should work as well, although I've had mixed success with that.

Import large files quickly - and perhaps save space. I'm working on a project this week that involves a spreadsheet with more than 600,000 rows and 40 columns. Reading it into R took around 25 to 30 seconds -- doable once, but annoying when I had to do it multiple times. The feather binary file format is not only readable by both R and Python, but is considerably faster to read and write. rio handles feather files, or you can use read_feather from the feather package.

For saving space as well as speed, the fst package looks to be an excellent choice because it offers compression. In my testing, write.fst(mydf, "myfile.fst", 100) -- maximum compression -- was just as speedy as no compression, and it took about one-third the space of the original spreadsheet. feather, meanwhile, took up almost double the spreadsheet disk space.
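If you want to run a similar comparison on your own machine, something like the following sketch may help. It assumes the fst package is installed and uses write_fst() and read_fst(), which to my understanding are the current names for write.fst() and read.fst(). The data frame is made up, so the sizes and timings it prints will not match the figures above.

```r
# A minimal sketch, assuming the fst package is installed.
# big_df is synthetic data; results will vary by machine and data set.
if (requireNamespace("fst", quietly = TRUE)) {
  big_df <- data.frame(id = seq_len(1e5), value = rnorm(1e5))
  fst_path <- tempfile(fileext = ".fst")
  csv_path <- tempfile(fileext = ".csv")

  fst::write_fst(big_df, fst_path, compress = 100)  # maximum compression
  write.csv(big_df, csv_path, row.names = FALSE)    # plain-text baseline

  # Compare bytes on disk, then read times
  print(c(fst = file.size(fst_path), csv = file.size(csv_path)))
  print(system.time(fst::read_fst(fst_path)))
  print(system.time(read.csv(csv_path)))
}
```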

A few additional favorites from readers and social media:

More with quotes. In response to the Cs() function that adds quotes, Kwan Lowe touted the usefulness of noquote(), which strips quotes -- useful for importing certain types of data into R. noquote() is a base R function, aimed at making it easier to wrangle variables.
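A quick look at what noquote() does in practice, using a made-up vector: as far as I can tell, it changes how character strings are printed rather than altering the underlying values.

```r
browsers <- c("Firefox", "Chrome", "Edge")
print(browsers)                # printed with quotation marks

unquoted <- noquote(browsers)
print(unquoted)                # same values, printed without quotation marks
class(unquoted)                # the object now carries the "noquote" class
```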

Un-factoring factors. Another useful function: unfactor() in the varhandle package, which aims to detect the "real" class of an R data frame column of factors and then turn it into either numeric or character variables.
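To see why a helper like this is handy, consider the base R sketch below. It does not use varhandle; it shows the common pitfall with a made-up factor and the usual manual workaround.

```r
f <- factor(c("10", "20", "10"))

codes <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))  # convert labels to text, then numbers

print(codes)
print(values)
```

Here codes holds 1, 2, 1 while values holds 10, 20, 10, which is the result most people actually want.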

table() alternative. Need to calculate frequencies of variables in a data frame? "I'm a huge fan of xtabs()," Timothy Teravainen posted at Google+ in response to this blog. "It's in base R, but I sadly went years without knowing about it."

The format is xtabs(~df$col1 + df$col2), which will return a frequency table with col1 as the rows and col2 as the columns.
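Here is a small example with a made-up data frame. It uses the data argument of xtabs(), an equivalent way to write the formula that avoids repeating the data frame name.

```r
# Illustrative data: five visits, each with a browser and an operating system
visits <- data.frame(
  browser = c("Firefox", "Chrome", "Chrome", "Firefox", "Chrome"),
  os      = c("Mac", "Mac", "Windows", "Windows", "Windows")
)

tab <- xtabs(~ browser + os, data = visits)
print(tab)  # browsers as rows, operating systems as columns
```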

Text searching. If you've been using regular expressions to search for text that starts or ends with a certain character string, there's an easier way. "startsWith() and endsWith() -- did I really not know these?" tweeted data scientist Jonathan Carroll. "That's it, I'm sitting down and reading through dox for every #rstats function."
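A short comparison with the regular-expression approach, using made-up file names:

```r
files <- c("report_2017.csv", "report_2016.xlsx", "notes.csv")

is_report <- startsWith(files, "report")  # TRUE TRUE FALSE
is_csv <- endsWith(files, ".csv")         # TRUE FALSE TRUE

# The regular-expression equivalents, for comparison
identical(is_report, grepl("^report", files))
identical(is_csv, grepl("\\.csv$", files))

files[is_report & is_csv]  # only "report_2017.csv" matches both
```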

Loading packages -- and auto-installing if they're not present. For reproducible research, an R script can't simply load external packages -- it's got to check whether those packages are installed on the user's machine and install them if they're not. There are several ways to do this in base R, such as using require() to check whether various packages load and then installing any that don't. The pacman package simplifies this immensely. To load packages and install them from CRAN if not available, the syntax is: p_load("package1", "package2", "package3"). There's also a p_load_gh() version for packages on GitHub. Thanks to Twitter user @Himmie_He for the tip.
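For comparison, one base R version of the check-then-install pattern might look like this sketch. The helper name load_or_install is hypothetical, and the function is a simplified illustration, not what pacman does internally.

```r
# Hypothetical helper: load each package, installing it first if needed.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # needs a CRAN mirror and network access
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# "stats" and "utils" ship with R, so nothing is installed here
load_or_install(c("stats", "utils"))
```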

Want to share your own favorites? Tell me via Twitter @sharon000 or email at smachlis@computerworld.com.

For more on useful R functions, see Great R packages for data import, wrangling and visualization.
