View the structure of deeply nested objects or complex function calls with lobstr and listviewer

Published July 10, 2023

When working with deeply nested structures (either objects or function calls), sometimes it’s helpful to visualize what you’re working with. The lobstr package provides functions for viewing these structures in an easy-to-read way—I find it more intuitive to parse than str().

If you’re using RStudio you can, of course, use the viewer with View(), but that works best for purely rectangular data with no hierarchical structure.
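
As a quick illustration of the comparison with str(), here is a minimal sketch that prints the same object both ways. The config list is a made-up example, not data used elsewhere in this post.

```r
library(lobstr)

# A small hypothetical nested list (illustration only)
config <- list(
  name = "example",
  dims = c(2L, 3L),
  options = list(
    verbose = TRUE,
    paths = list(input = "in.csv", output = "out.csv")
  )
)

# Base R's compact structure display
str(config)

# lobstr's tree display of the same object
tree(config)
```

Both calls show the same information; which one is easier to read is largely a matter of taste and of how deep the nesting goes.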

Tree structure for (nested) dataframes

lobstr::tree()

lobstr::tree() prints a data structure as a tree, which is particularly useful for nested structures.

We’ll start with a subset of the mpg data provided in ggplot2. We nest by cyl and then call tree(), which gives a handy representation of the structure (scroll to see the full output).

library(tidyverse)
library(lobstr)

mpg_subset <- mpg |>
  select(class, cyl, manufacturer, hwy, cty)

mpg_subset |>
  nest(.by = cyl) |> 
  lobstr::tree()
S3<tbl_df/tbl/data.frame>
├─cyl<int [4]>: 4, 6, 8, 5
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [81]>: "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compact", "midsize", "midsize", ...
  │ ├─manufacturer<chr [81]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "chevrolet", "chevrolet", ...
  │ ├─hwy<int [81]>: 29, 29, 31, 30, 26, 25, 28, 27, 27, 30, ...
  │ └─cty<int [81]>: 18, 21, 20, 21, 18, 16, 20, 19, 19, 22, ...
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [79]>: "compact", "compact", "compact", "compact", "compact", "compact", "compact", "midsize", "midsize", "midsize", ...
  │ ├─manufacturer<chr [79]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "chevrolet", ...
  │ ├─hwy<int [79]>: 26, 26, 27, 25, 25, 25, 25, 24, 25, 26, ...
  │ └─cty<int [79]>: 16, 18, 18, 15, 17, 17, 15, 15, 17, 18, ...
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [70]>: "midsize", "suv", "suv", "suv", "suv", "suv", "2seater", "2seater", "2seater", "2seater", ...
  │ ├─manufacturer<chr [70]>: "audi", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", ...
  │ ├─hwy<int [70]>: 23, 20, 15, 20, 17, 17, 26, 23, 26, 25, ...
  │ └─cty<int [70]>: 16, 14, 11, 14, 13, 12, 16, 15, 16, 15, ...
  └─S3<tbl_df/tbl/data.frame>
    ├─class<chr [4]>: "compact", "compact", "subcompact", "subcompact"
    ├─manufacturer<chr [4]>: "volkswagen", "volkswagen", "volkswagen", "volkswagen"
    ├─hwy<int [4]>: 29, 29, 28, 29
    └─cty<int [4]>: 21, 21, 20, 20

This works for multiply nested objects as well. Here we nest mpg_subset by cyl and class, nest the result again by class, and keep the first two classes so the output stays manageable:

mpg_subset |> 
  nest(.by = c(cyl, class)) |> 
  nest(.by = class) |> 
  head(2) |> 
  tree()
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>: "compact", "midsize"
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─cyl<int [3]>: 4, 6, 5
  │ └─data: <list>
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [32]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "nissan", "nissan", ...
  │   │ ├─hwy<int [32]>: 29, 29, 31, 30, 26, 25, 28, 27, 29, 27, ...
  │   │ └─cty<int [32]>: 18, 21, 20, 21, 18, 16, 20, 19, 21, 19, ...
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [13]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "toyota", "toyota", "toyota", ...
  │   │ ├─hwy<int [13]>: 26, 26, 27, 25, 25, 25, 25, 26, 26, 27, ...
  │   │ └─cty<int [13]>: 16, 18, 18, 15, 17, 17, 15, 18, 18, 18, ...
  │   └─S3<tbl_df/tbl/data.frame>
  │     ├─manufacturer<chr [2]>: "volkswagen", "volkswagen"
  │     ├─hwy<int [2]>: 29, 29
  │     └─cty<int [2]>: 21, 21
  └─S3<tbl_df/tbl/data.frame>
    ├─cyl<int [3]>: 6, 8, 4
    └─data: <list>
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [23]>: "audi", "audi", "chevrolet", "chevrolet", "chevrolet", "hyundai", "hyundai", "hyundai", "nissan", "nissan", ...
      │ ├─hwy<int [23]>: 24, 25, 26, 29, 26, 26, 26, 28, 27, 26, ...
      │ └─cty<int [23]>: 15, 17, 18, 18, 17, 18, 18, 19, 19, 19, ...
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [2]>: "audi", "pontiac"
      │ ├─hwy<int [2]>: 23, 25
      │ └─cty<int [2]>: 16, 16
      └─S3<tbl_df/tbl/data.frame>
        ├─manufacturer<chr [16]>: "chevrolet", "chevrolet", "hyundai", "hyundai", "hyundai", "hyundai", "nissan", "nissan", "toyota", "toyota", ...
        ├─hwy<int [16]>: 27, 30, 26, 27, 30, 31, 31, 32, 29, 27, ...
        └─cty<int [16]>: 19, 22, 18, 18, 21, 21, 23, 23, 21, 21, ...

If printing all of the values makes the output overwhelming, you can override the val_printer argument. Here we pass an anonymous function that returns an empty string, so no values are printed and the structure is easier to see:

mpg_subset |> 
  nest(.by = c(cyl, class)) |> 
  nest(.by = class) |> 
  head(2) |> 
  tree(val_printer = \(x) "")
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>: 
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─cyl<int [3]>: 
  │ └─data: <list>
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [32]>: 
  │   │ ├─hwy<int [32]>: 
  │   │ └─cty<int [32]>: 
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [13]>: 
  │   │ ├─hwy<int [13]>: 
  │   │ └─cty<int [13]>: 
  │   └─S3<tbl_df/tbl/data.frame>
  │     ├─manufacturer<chr [2]>: 
  │     ├─hwy<int [2]>: 
  │     └─cty<int [2]>: 
  └─S3<tbl_df/tbl/data.frame>
    ├─cyl<int [3]>: 
    └─data: <list>
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [23]>: 
      │ ├─hwy<int [23]>: 
      │ └─cty<int [23]>: 
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [2]>: 
      │ ├─hwy<int [2]>: 
      │ └─cty<int [2]>: 
      └─S3<tbl_df/tbl/data.frame>
        ├─manufacturer<chr [16]>: 
        ├─hwy<int [16]>: 
        └─cty<int [16]>: 
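
Another way to shorten the output, sketched here on the assumption that your installed lobstr version exposes the max_depth argument of tree(), is to limit how far down the tree descends. The nested list below is a hypothetical stand-in for a deeply nested object.

```r
library(lobstr)

# A hypothetical list nested four levels deep (illustration only)
nested <- list(
  level1 = list(
    level2 = list(
      level3 = list(value = 1:5)
    )
  )
)

# Full tree: descends all the way to `value`
tree(nested)

# Stop descending after two levels, so `value` is never reached
tree(nested, max_depth = 2)
```

The same argument should work on the nested mpg_subset data frames above when only the outer layers are of interest.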

tree() is also handy for list-columns that hold complex objects such as models:

mpg_subset |> 
  nest(.by = class) |> 
  head(2) |> # Get a subset of the data so output is manageable
  mutate(data = map(data, \(x) lm(cty ~ manufacturer, data = x))) |> 
  tree()
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>: "compact", "midsize"
└─data: <list>
  ├─S3<lm>
  │ ├─coefficients<dbl [5]>: 17.9333333333333, 2.06666666666666, 1.81666666666667, 4.31666666666667, 2.85238095238095
  │ ├─residuals<dbl [47]>: 0.066666666666665, 3.06666666666667, 2.06666666666666, 3.06666666666667, -1.93333333333334, 0.0666666666666665, 0.0666666666666665, 0.0666666666666665, -1.93333333333333, 2.06666666666667, ...
  │ ├─effects<dbl [47]>: -137.988281957008, -0.184506241605778, -0.803194840135016, 8.56434361157096, 7.67570324864425, -0.0407824196725389, -0.0407824196725389, -0.0407824196725389, -2.04078241967254, 1.95921758032746, ...
  │ ├─rank: 5
  │ ├─fitted.values<dbl [47]>: 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, ...
  │ ├─assign<int [5]>: 0, 1, 1, 1, 1
  │ ├─qr: S3<qr>
  │ │ ├─qr<dbl [235]>: -6.85565460040104, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, ...
  │ │ ├─qraux<dbl [5]>: 1.14586499149789, 1.02683653004132, 1.03957282449068, 1.08213154946517, 1.13553443533542
  │ │ ├─pivot<int [5]>: 1, 2, 3, 4, 5
  │ │ ├─tol: 1e-07
  │ │ └─rank: 5
  │ ├─df.residual: 42
  │ ├─contrasts: <list>
  │ │ └─manufacturer: "contr.treatment"
  │ ├─xlevels: <list>
  │ │ └─manufacturer<chr [5]>: "audi", "nissan", "subaru", "toyota", "volkswagen"
  │ ├─call: <language> lm(formula = cty ~ manufacturer, data = x)
  │ ├─terms: S3<terms/formula> cty ~ manufacturer
  │ └─model: S3<data.frame>
  │   ├─cty<int [47]>: 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, ...
  │   └─manufacturer<chr [47]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", ...
  └─S3<lm>
    ├─coefficients<dbl [7]>: 16, 2.80000000000001, 3.00000000000001, 4.00000000000001, 1.00000000000001, 3.85714285714287, 2.57142857142858
    ├─residuals<dbl [41]>: -1.00000000000003, 1.00000000000001, 1.05208052500074e-14, 0.200000000000008, 3.2, -0.800000000000004, -0.8, -1.8, -1, -1, ...
    ├─effects<dbl [41]>: -120.097622892338, 0.10476454436544, 0.736955526660781, 3.98035930389303, -3.23748103044015, 4.17435145669764, 3.72635402044873, -1.27026155514514, -0.923635942390468, -0.923635942390468, ...
    ├─rank: 7
    ├─fitted.values<dbl [41]>: 16, 16, 16, 18.8, 18.8, 18.8, 18.8, 18.8, 19, 19, ...
    ├─assign<int [7]>: 0, 1, 1, 1, 1, 1, 1
    ├─qr: S3<qr>
    │ ├─qr<dbl [287]>: -6.40312423743285, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, ...
    │ ├─qraux<dbl [7]>: 1.15617376188861, 1.0503406378574, 1.06742880959931, 1.02124277862812, 1.02296110820256, 1.03939209519442, 1.09021806425183
    │ ├─pivot<int [7]>: 1, 2, 3, 4, 5, 6, 7
    │ ├─tol: 1e-07
    │ └─rank: 7
    ├─df.residual: 34
    ├─contrasts: <list>
    │ └─manufacturer: "contr.treatment"
    ├─xlevels: <list>
    │ └─manufacturer<chr [7]>: "audi", "chevrolet", "hyundai", "nissan", "pontiac", "toyota", "volkswagen"
    ├─call: <language> lm(formula = cty ~ manufacturer, data = x)
    ├─terms: S3<terms/formula> cty ~ manufacturer
    └─model: S3<data.frame>
      ├─cty<int [41]>: 15, 17, 16, 19, 22, 18, 18, 17, 18, 18, ...
      └─manufacturer<chr [41]>: "audi", "audi", "audi", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "hyundai", "hyundai", ...

listviewer

If you’re looking for a more GUI-type visualization of nested data frames, consider listviewer::reactjson():

mpg_subset |> 
  nest(.by = c(cyl, class)) |> 
  nest(.by = class, .key = "class_data") |> 
  listviewer::reactjson(collapsed = 4) # collapse after 4 levels deep


reactjson() gives essentially the same view as tree(): each column of the data frame is its own node. You can get a more row-wise view, which I sometimes find more intuitive, by first converting to JSON. Here each level of class is its own node:

mpg_subset |> 
  nest(.by = c(cyl, class)) |> 
  nest(.by = class, .key = "class_data") |> 
  jsonlite::toJSON() |> 
  listviewer::reactjson(collapsed = 4) # collapse after 4 levels deep
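
The row-wise view comes from how jsonlite::toJSON() serializes data frames by default: one JSON object per row. The minimal sketch below shows this on a tiny made-up data frame (not the post's data), alongside the column-wise alternative selected with dataframe = "columns".

```r
library(jsonlite)

# A tiny hypothetical data frame (illustration only)
df <- data.frame(class = c("compact", "midsize"), n = c(47L, 41L))

# Default: one JSON object per row
toJSON(df)
# [{"class":"compact","n":47},{"class":"midsize","n":41}]

# Column-wise alternative: one JSON array per column
toJSON(df, dataframe = "columns")
# {"class":["compact","midsize"],"n":[47,41]}
```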


The listviewer package also provides an alternative JSON viewer, jsonedit():

mpg_subset |> 
  nest(.by = c(cyl, class)) |> 
  nest(.by = class, .key = "class_data") |> 
  jsonlite::toJSON() |> # convert df to JSON
  listviewer::jsonedit()


(A quick note: toJSON() only converts a limited selection of objects; it can’t, for example, convert an lm model, so the list-column of models we passed to tree() above can’t be visualized this way.)

Abstract syntax trees for complex function calls

lobstr also provides the function ast() which gives a visual representation of complex function calls, letting you see the order in which the functions are being evaluated. Here we see that our unnamed function calls + on x and y.

ast(function(x = 1, y = 2) { x + y } )
█─`function` 
├─█─x = 1 
│ └─y = 2 
├─█─`{` 
│ └─█─`+` 
│   ├─x 
│   └─y 
└─<inline srcref> 
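
A few smaller calls can make the tree layout easier to read before moving on to pipes. This is only a sketch with made-up expressions: ast() captures its argument without evaluating it, so nothing here is actually computed.

```r
library(lobstr)

# Operator precedence: `*` binds tighter than `+`, so it sits deeper in the tree
ast(1 + 2 * 3)

# Parentheses are themselves a call to `(` and show up as their own node
ast((1 + 2) * 3)

# Nested function calls: the innermost call is the deepest node
ast(round(mean(c(1, 2, 3)), 1))
```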

This is particularly useful when using pipes, where the linear order of the code is the reverse of the nesting it would have without piping. Here, for example, we can see that mutate is applied to a grouped data frame, which was itself filtered first.

mpg |> 
  filter(manufacturer == "honda") |> 
  group_by(cyl) |> 
  mutate(mean_hwy = mean(hwy)) |> 
  ast()
█─mutate 
├─█─group_by 
│ ├─█─filter 
│ │ ├─mpg 
│ │ └─█─`==` 
│ │   ├─manufacturer 
│ │   └─"honda" 
│ └─cyl 
└─mean_hwy = █─mean 
             └─hwy
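
Piping into ast() works because the native pipe (R 4.1 or later) is rewritten by the parser into an ordinary nested call before anything is evaluated. The sketch below checks that equivalence with quote(), which does not evaluate its argument, so dplyr does not need to be loaded. The shortened pipeline is just for illustration.

```r
library(lobstr)

# The native pipe is rewritten by the parser, so both forms are the same call
piped  <- quote(mpg |> filter(manufacturer == "honda") |> group_by(cyl))
nested <- quote(group_by(filter(mpg, manufacturer == "honda"), cyl))
identical(piped, nested)

# Passing the nested call to ast() directly draws the same kind of tree
ast(group_by(filter(mpg, manufacturer == "honda"), cyl))
```

I would expect this not to carry over to the magrittr pipe (%>%), which is an ordinary function rather than a parser-level rewrite, though that is not something tested in this post.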