View the structure of deeply nested objects or complex function calls with lobstr and listviewer
When working with deeply nested structures (either objects or function calls), it’s sometimes helpful to visualize what you’re working with. The lobstr package provides functions for viewing these structures in an easy-to-read way; I find its output more intuitive to parse than that of str().
If you’re using RStudio you can, of course, use the viewer with View(), but that works best for purely rectangular data with no hierarchical structure.
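As a quick illustration of the difference, here is a minimal sketch that prints the same object with str() and with lobstr::tree(). The list config and its fields are made up for this demonstration and are not part of the examples that follow.

```r
library(lobstr)

# A small, hypothetical nested list (not from the examples in this post)
config <- list(
  name = "example",
  options = list(
    verbose = TRUE,
    thresholds = c(0.1, 0.5, 0.9)
  )
)

str(config)   # base R's compact structure display
tree(config)  # lobstr's tree display of the same object
```

Both calls describe the same object; which one reads more easily is largely a matter of taste.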
Tree structure for (nested) dataframes
lobstr::tree()
lobstr::tree() prints a tree representation of a data structure, and is particularly useful when the structure is nested.
We’ll start with a subset of the mpg data provided in ggplot2. We nest by cyl, and then call tree(), resulting in a handy representation of the structure.
library(tidyverse)
library(lobstr)
mpg_subset <- mpg |>
  select(class, cyl, manufacturer, hwy, cty)
mpg_subset |>
  nest(.by = cyl) |>
  lobstr::tree()
S3<tbl_df/tbl/data.frame>
├─cyl<int [4]>: 4, 6, 8, 5
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [81]>: "compact", "compact", "compact", "compact", "compact", "compact", "compact", "compact", "midsize", "midsize", ...
  │ ├─manufacturer<chr [81]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "chevrolet", "chevrolet", ...
  │ ├─hwy<int [81]>: 29, 29, 31, 30, 26, 25, 28, 27, 27, 30, ...
  │ └─cty<int [81]>: 18, 21, 20, 21, 18, 16, 20, 19, 19, 22, ...
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [79]>: "compact", "compact", "compact", "compact", "compact", "compact", "compact", "midsize", "midsize", "midsize", ...
  │ ├─manufacturer<chr [79]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "chevrolet", ...
  │ ├─hwy<int [79]>: 26, 26, 27, 25, 25, 25, 25, 24, 25, 26, ...
  │ └─cty<int [79]>: 16, 18, 18, 15, 17, 17, 15, 15, 17, 18, ...
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─class<chr [70]>: "midsize", "suv", "suv", "suv", "suv", "suv", "2seater", "2seater", "2seater", "2seater", ...
  │ ├─manufacturer<chr [70]>: "audi", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", ...
  │ ├─hwy<int [70]>: 23, 20, 15, 20, 17, 17, 26, 23, 26, 25, ...
  │ └─cty<int [70]>: 16, 14, 11, 14, 13, 12, 16, 15, 16, 15, ...
  └─S3<tbl_df/tbl/data.frame>
    ├─class<chr [4]>: "compact", "compact", "subcompact", "subcompact"
    ├─manufacturer<chr [4]>: "volkswagen", "volkswagen", "volkswagen", "volkswagen"
    ├─hwy<int [4]>: 29, 29, 28, 29
    └─cty<int [4]>: 21, 21, 20, 20
This works for multiply-nested objects as well. Here we nest the mpg dataset by cyl and by class:
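The code behind this output is not shown. Judging by the pipeline used in the val_printer example further down, it was presumably along these lines (this is an assumption, and the name nested_twice is introduced here only for illustration):

```r
library(tidyverse)
library(lobstr)

mpg_subset <- mpg |>
  select(class, cyl, manufacturer, hwy, cty)

# Assumed reconstruction: nest twice, then keep two classes for brevity
nested_twice <- mpg_subset |>
  nest(.by = c(cyl, class)) |>  # inner level: one tibble per cyl/class pair
  nest(.by = class) |>          # outer level: one row per class
  head(2)                       # keep the first two classes only

tree(nested_twice)
```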
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>: "compact", "midsize"
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─cyl<int [3]>: 4, 6, 5
  │ └─data: <list>
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [32]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "nissan", "nissan", ...
  │   │ ├─hwy<int [32]>: 29, 29, 31, 30, 26, 25, 28, 27, 29, 27, ...
  │   │ └─cty<int [32]>: 18, 21, 20, 21, 18, 16, 20, 19, 21, 19, ...
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [13]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "toyota", "toyota", "toyota", ...
  │   │ ├─hwy<int [13]>: 26, 26, 27, 25, 25, 25, 25, 26, 26, 27, ...
  │   │ └─cty<int [13]>: 16, 18, 18, 15, 17, 17, 15, 18, 18, 18, ...
  │   └─S3<tbl_df/tbl/data.frame>
  │     ├─manufacturer<chr [2]>: "volkswagen", "volkswagen"
  │     ├─hwy<int [2]>: 29, 29
  │     └─cty<int [2]>: 21, 21
  └─S3<tbl_df/tbl/data.frame>
    ├─cyl<int [3]>: 6, 8, 4
    └─data: <list>
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [23]>: "audi", "audi", "chevrolet", "chevrolet", "chevrolet", "hyundai", "hyundai", "hyundai", "nissan", "nissan", ...
      │ ├─hwy<int [23]>: 24, 25, 26, 29, 26, 26, 26, 28, 27, 26, ...
      │ └─cty<int [23]>: 15, 17, 18, 18, 17, 18, 18, 19, 19, 19, ...
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [2]>: "audi", "pontiac"
      │ ├─hwy<int [2]>: 23, 25
      │ └─cty<int [2]>: 16, 16
      └─S3<tbl_df/tbl/data.frame>
        ├─manufacturer<chr [16]>: "chevrolet", "chevrolet", "hyundai", "hyundai", "hyundai", "hyundai", "nissan", "nissan", "toyota", "toyota", ...
        ├─hwy<int [16]>: 27, 30, 26, 27, 30, 31, 31, 32, 29, 27, ...
        └─cty<int [16]>: 19, 22, 18, 18, 21, 21, 23, 23, 21, 21, ...
If printing all of the values makes the output a bit overwhelming, you can specify the val_printer argument. Here we give it an anonymous function that returns an empty string, so that no values are printed. This makes the structure a little easier to see:
mpg_subset |>
  nest(.by = c(cyl, class)) |>
  nest(.by = class) |>
  head(2) |>
  tree(val_printer = \(x) "")
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>:
└─data: <list>
  ├─S3<tbl_df/tbl/data.frame>
  │ ├─cyl<int [3]>:
  │ └─data: <list>
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [32]>:
  │   │ ├─hwy<int [32]>:
  │   │ └─cty<int [32]>:
  │   ├─S3<tbl_df/tbl/data.frame>
  │   │ ├─manufacturer<chr [13]>:
  │   │ ├─hwy<int [13]>:
  │   │ └─cty<int [13]>:
  │   └─S3<tbl_df/tbl/data.frame>
  │     ├─manufacturer<chr [2]>:
  │     ├─hwy<int [2]>:
  │     └─cty<int [2]>:
  └─S3<tbl_df/tbl/data.frame>
    ├─cyl<int [3]>:
    └─data: <list>
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [23]>:
      │ ├─hwy<int [23]>:
      │ └─cty<int [23]>:
      ├─S3<tbl_df/tbl/data.frame>
      │ ├─manufacturer<chr [2]>:
      │ ├─hwy<int [2]>:
      │ └─cty<int [2]>:
      └─S3<tbl_df/tbl/data.frame>
        ├─manufacturer<chr [16]>:
        ├─hwy<int [16]>:
        └─cty<int [16]>:
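Another way to cut down the output is to limit how deep tree() descends. The sketch below assumes the max_depth argument of lobstr::tree(), which none of the examples here use; the cut-off of 2 is an arbitrary choice and nested_df is a name introduced for illustration.

```r
library(tidyverse)
library(lobstr)

mpg_subset <- mpg |>
  select(class, cyl, manufacturer, hwy, cty)

nested_df <- mpg_subset |>
  nest(.by = c(cyl, class)) |>
  nest(.by = class) |>
  head(2)

# Stop descending after two levels instead of printing the full structure
tree(nested_df, max_depth = 2)
```

This can be combined with val_printer if you want both a shallower tree and no printed values.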
tree() is also handy for list-columns that hold complex objects like models:
mpg_subset |>
  nest(.by = class) |>
  head(2) |> # Get a subset of the data so output is manageable
  mutate(data = map(data, \(x) lm(cty ~ manufacturer, data = x))) |>
  tree()
S3<tbl_df/tbl/data.frame>
├─class<chr [2]>: "compact", "midsize"
└─data: <list>
  ├─S3<lm>
  │ ├─coefficients<dbl [5]>: 17.9333333333333, 2.06666666666666, 1.81666666666667, 4.31666666666667, 2.85238095238095
  │ ├─residuals<dbl [47]>: 0.066666666666665, 3.06666666666667, 2.06666666666666, 3.06666666666667, -1.93333333333334, 0.0666666666666665, 0.0666666666666665, 0.0666666666666665, -1.93333333333333, 2.06666666666667, ...
  │ ├─effects<dbl [47]>: -137.988281957008, -0.184506241605778, -0.803194840135016, 8.56434361157096, 7.67570324864425, -0.0407824196725389, -0.0407824196725389, -0.0407824196725389, -2.04078241967254, 1.95921758032746, ...
  │ ├─rank: 5
  │ ├─fitted.values<dbl [47]>: 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, 17.9333333333333, ...
  │ ├─assign<int [5]>: 0, 1, 1, 1, 1
  │ ├─qr: S3<qr>
  │ │ ├─qr<dbl [235]>: -6.85565460040104, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, 0.145864991497895, ...
  │ │ ├─qraux<dbl [5]>: 1.14586499149789, 1.02683653004132, 1.03957282449068, 1.08213154946517, 1.13553443533542
  │ │ ├─pivot<int [5]>: 1, 2, 3, 4, 5
  │ │ ├─tol: 1e-07
  │ │ └─rank: 5
  │ ├─df.residual: 42
  │ ├─contrasts: <list>
  │ │ └─manufacturer: "contr.treatment"
  │ ├─xlevels: <list>
  │ │ └─manufacturer<chr [5]>: "audi", "nissan", "subaru", "toyota", "volkswagen"
  │ ├─call: <language> lm(formula = cty ~ manufacturer, data = x)
  │ ├─terms: S3<terms/formula> cty ~ manufacturer
  │ └─model: S3<data.frame>
  │   ├─cty<int [47]>: 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, ...
  │   └─manufacturer<chr [47]>: "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", "audi", ...
  └─S3<lm>
    ├─coefficients<dbl [7]>: 16, 2.80000000000001, 3.00000000000001, 4.00000000000001, 1.00000000000001, 3.85714285714287, 2.57142857142858
    ├─residuals<dbl [41]>: -1.00000000000003, 1.00000000000001, 1.05208052500074e-14, 0.200000000000008, 3.2, -0.800000000000004, -0.8, -1.8, -1, -1, ...
    ├─effects<dbl [41]>: -120.097622892338, 0.10476454436544, 0.736955526660781, 3.98035930389303, -3.23748103044015, 4.17435145669764, 3.72635402044873, -1.27026155514514, -0.923635942390468, -0.923635942390468, ...
    ├─rank: 7
    ├─fitted.values<dbl [41]>: 16, 16, 16, 18.8, 18.8, 18.8, 18.8, 18.8, 19, 19, ...
    ├─assign<int [7]>: 0, 1, 1, 1, 1, 1, 1
    ├─qr: S3<qr>
    │ ├─qr<dbl [287]>: -6.40312423743285, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, 0.156173761888606, ...
    │ ├─qraux<dbl [7]>: 1.15617376188861, 1.0503406378574, 1.06742880959931, 1.02124277862812, 1.02296110820256, 1.03939209519442, 1.09021806425183
    │ ├─pivot<int [7]>: 1, 2, 3, 4, 5, 6, 7
    │ ├─tol: 1e-07
    │ └─rank: 7
    ├─df.residual: 34
    ├─contrasts: <list>
    │ └─manufacturer: "contr.treatment"
    ├─xlevels: <list>
    │ └─manufacturer<chr [7]>: "audi", "chevrolet", "hyundai", "nissan", "pontiac", "toyota", "volkswagen"
    ├─call: <language> lm(formula = cty ~ manufacturer, data = x)
    ├─terms: S3<terms/formula> cty ~ manufacturer
    └─model: S3<data.frame>
      ├─cty<int [41]>: 15, 17, 16, 19, 22, 18, 18, 17, 18, 18, ...
      └─manufacturer<chr [41]>: "audi", "audi", "audi", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "chevrolet", "hyundai", "hyundai", ...
listviewer
If you’re looking for a more GUI-type visualization of nested dfs, consider listviewer::reactjson():
mpg_subset |>
  nest(.by = c(cyl, class)) |>
  nest(.by = class, .key = "class_data") |>
  listviewer::reactjson(collapsed = 4) # collapse after 4 levels deep
reactjson() gives essentially the same view as tree(): each column of the df is its own node. You can, though, get a more row-wise view, which I sometimes find more intuitive, by first converting to JSON. Here each level of class is its own node:
mpg_subset |>
  nest(.by = c(cyl, class)) |>
  nest(.by = class, .key = "class_data") |>
  jsonlite::toJSON() |>
  listviewer::reactjson(collapsed = 4) # collapse after 4 levels deep
The listviewer package also provides an alternative JSON viewer, jsonedit():
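The call that produced this viewer is not shown in the text. A minimal sketch, assuming the same nested data as in the reactjson() examples and jsonedit()'s default settings (the name json_widget is introduced here for illustration), might look like this:

```r
library(tidyverse)

mpg_subset <- mpg |>
  select(class, cyl, manufacturer, hwy, cty)

# Assumed usage: convert the nested data to JSON, then open it in jsonedit()
json_widget <- mpg_subset |>
  nest(.by = c(cyl, class)) |>
  nest(.by = class, .key = "class_data") |>
  jsonlite::toJSON() |>
  listviewer::jsonedit()

json_widget  # printing the htmlwidget displays it in the viewer pane or browser
```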
(A quick note: toJSON() only converts a limited selection of objects; it can’t, for example, convert an lm model, so models can't be visualized the way we visualized the nested mpg_subset data above.)
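To check this limitation yourself, here is a small sketch that wraps the conversion in tryCatch() so the script keeps running either way. The model formula and the names fit and result are made up for illustration, and I have not reproduced the exact message here; what you see may depend on your jsonlite version.

```r
library(ggplot2)  # only for the mpg data

# A hypothetical model, fitted just to have an lm object to convert
fit <- lm(cty ~ hwy, data = mpg)

# Try the conversion; on failure, keep the error message instead of stopping
result <- tryCatch(
  jsonlite::toJSON(fit),
  error = function(e) conditionMessage(e)
)

print(result)
```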
Abstract syntax trees for complex function calls
lobstr also provides the function ast(), which gives a visual representation of complex function calls, letting you see the order in which the functions are being evaluated. Here we see that our unnamed function calls + on x and y.
ast(function(x = 1, y = 2) { x + y } )
█─`function`
├─█─x = 1
│ └─y = 2
├─█─`{`
│ └─█─`+`
│   ├─x
│   └─y
└─<inline srcref>
This is particularly useful when using pipes, where the linear order of the code is actually the reverse of what it would be without piping. Here, for example, we can see that mutate() is applied to a grouped df, which itself was filtered.
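The piped call itself is not shown, so here is a sketch of what such an example could look like. The specific filter condition, grouping variable, and new column are assumptions chosen for illustration. Note that ast() only captures the expression and does not run it, and that the nesting shows up this way with the native pipe |>, which the parser rewrites into nested calls.

```r
library(lobstr)

# Hypothetical pipeline: the exact verbs' arguments are illustrative only
ast(
  mpg |>
    filter(cyl == 4) |>
    group_by(class) |>
    mutate(mean_hwy = mean(hwy))
)
```

In the resulting tree, mutate sits at the root, with group_by nested inside it and filter nested inside that, which is the reverse of the order in which the verbs are written.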