Sort numbered strings (or factor levels, or filenames) correctly even without zero-padding
It’s quite common to see lists of items (data, files, etc.) that are numbered, such as the hypothetical list of files below:
filenames <- c("file2.csv", "file1.csv", "file3.csv",
"file11.csv", "file10.csv", "file20.csv")
print(filenames)
[1] "file2.csv" "file1.csv" "file3.csv" "file11.csv" "file10.csv"
[6] "file20.csv"
If you want to sort these by number, you run into a problem because the filenames are strings: 1 is followed by 10, which is followed by 2, since 10 precedes 2 “alphabetically”:
filenames |> sort()
[1] "file1.csv" "file10.csv" "file11.csv" "file2.csv" "file20.csv"
[6] "file3.csv"
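As a small illustration of why this happens, the comparison below contrasts strings with numbers. The variable names are made up for this sketch; the point is only that strings are compared character by character, so the leading "1" decides the outcome before the "0" is ever looked at.

```r
# Strings are compared character by character: "1" sorts before "2",
# so "10" is considered smaller than "2".
string_cmp <- "10" < "2"

# Numbers are compared by value, so 10 is not smaller than 2.
numeric_cmp <- 10 < 2

print(c(string = string_cmp, numeric = numeric_cmp))
```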
One solution is to rename your items such that they are zero-padded. A kludge with stringr’s str_replace() and str_pad() can get the job done. Because of the leading zeros, sort() will get the result you expect:
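library(stringr)
# Replace the first run of digits in each name with a copy
# left-padded with zeros to a width of 2
padded <- str_replace(filenames, "[0-9]+", \(x) str_pad(x, 2, pad = "0"))
padded |> sort()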
[1] "file01.csv" "file02.csv" "file03.csv" "file10.csv" "file11.csv"
[6] "file20.csv"
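The padding width of 2 is hard-coded in the snippet above. One possible variation, sketched here with base R only, derives the width from the longest number in the data. It assumes every name contains at least one run of digits and pads only the first run; the names `m`, `digits` and `width` are chosen just for this example.

```r
filenames <- c("file2.csv", "file1.csv", "file3.csv",
               "file11.csv", "file10.csv", "file20.csv")

# Locate the first run of digits in each name
m <- regexpr("[0-9]+", filenames)
digits <- regmatches(filenames, m)

# Pad to the width of the longest number found (assumption: every name matched)
width <- max(nchar(digits))
padded <- filenames
regmatches(padded, m) <- sprintf(paste0("%0", width, "d"), as.integer(digits))

sort(padded)
```

With the six example filenames this should give the same zero-padded result as the stringr version, but it would also cope with a later "file100.csv" without editing the width by hand.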
Rather than renaming your items, you can use naturalsort::naturalsort(), which orders them in “human natural” order:
filenames |> naturalsort::naturalsort()
[1] "file1.csv" "file2.csv" "file3.csv" "file10.csv" "file11.csv"
[6] "file20.csv"
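If installing a package is not an option, a rough base-R stand-in is to extract the number and order by it. This is not how naturalsort works internally, just a simplified sketch: it looks only at the first run of digits and ignores the surrounding text, so it is suitable only when all names share the same prefix. The names `num` and `by_number` are illustrative.

```r
filenames <- c("file2.csv", "file1.csv", "file3.csv",
               "file11.csv", "file10.csv", "file20.csv")

# Pull out the first run of digits and convert it to an integer
num <- as.integer(sub("^[^0-9]*([0-9]+).*$", "\\1", filenames))

# Reorder the original strings by that number
by_number <- filenames[order(num)]
print(by_number)
```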
The naturalsort package also comes with the function naturalfactor(), which can reorder the levels of a factor in the same way, or turn an unordered character vector into a factor:
my_factor <- factor(c("level_1", "level_10", "level_2"))
naturalsort::naturalfactor(my_factor)
[1] level_1 level_10 level_2
Levels: level_1 < level_2 < level_10
c("level1", "level10", "level2") |> naturalsort::naturalfactor()
[1] level1 level10 level2
Levels: level1 < level2 < level10