Introduction

The SewerRat package implements an R client for the API of the same name. It allows users to register or deregister their own directories in the index for subsequent queries. Readers are referred to the SewerRat documentation for a description of the concepts; this guide focuses strictly on the usage of the SewerRat package. For demonstration purposes, we’ll set up a test instance of the API on our local machine:

library(SewerRat)
info <- startSewerRat()
info$url
## [1] "http://0.0.0.0:5114"

Registering directories

We assume that users of the SewerRat package and the SewerRat API itself are on the same shared filesystem. A user can instruct SewerRat to incorporate a particular directory into the search index, provided that the directory is world-readable and that the caller has write access to it. To illustrate, let’s mock up a directory of metadata files:

mydir <- tempfile()
dir.create(mydir)
write(file=file.path(mydir, "metadata.json"), '{ "name": "foo", "description": "bar" }')
dir.create(file.path(mydir, "stuff"))
write(file=file.path(mydir, "stuff", "metadata.json"), '{ "food": "barramundi" }')
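
Before registering, it may help to confirm that a directory meets the two requirements stated above. The following is a minimal sketch in base R and is not part of the SewerRat package: the helper name canRegister is hypothetical, the check assumes a Unix-like filesystem with standard permission bits, and it only approximates whatever checks the SewerRat API performs itself.

```r
# Hypothetical helper (not part of SewerRat): checks the two stated
# preconditions for registration on a Unix-like filesystem.
canRegister <- function(dir) {
    info <- file.info(dir)
    if (is.na(info$isdir) || !info$isdir) {
        return(FALSE)
    }
    # The "other" read bit (octal 004) indicates world-readability.
    world.readable <- bitwAnd(as.integer(info$mode), 4L) != 0L
    # file.access() returns 0 if the current user may write (mode=2).
    writable <- file.access(dir, mode=2) == 0
    world.readable && writable
}

# A separate scratch directory, so that the example is self-contained.
candidate <- tempfile()
dir.create(candidate)
Sys.chmod(candidate, mode="0755", use_umask=FALSE)
canRegister(candidate)

# Removing the world-readable bit makes the directory ineligible.
Sys.chmod(candidate, mode="0700", use_umask=FALSE)
canRegister(candidate)
```

A check like this only reports the state of the directory at the time of the call; the API remains the authority on whether a registration request is accepted.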

We can then register mydir with the register() function. The example below only indexes files inside mydir that are named metadata.json, though any number of base names can be supplied in names. We can also instruct SewerRat to skip indexing of a subdirectory within mydir by creating a .SewerRatignore file inside that subdirectory.

register(mydir, names="metadata.json", url=info$url)

Similarly, we can deregister this directory with deregister().
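
The selection rules described in this section can be previewed locally before registering. The sketch below is an emulation in base R under stated assumptions, not the API's actual logic: previewIndexed is a hypothetical helper, it assumes that a .SewerRatignore file also excludes everything nested beneath its subdirectory, and it ignores any other rules that the API may apply (for example, to symbolic links).

```r
# Hypothetical helper (not part of SewerRat): lists the files that would be
# picked up for the given base names, skipping ignored subdirectories.
previewIndexed <- function(dir, names) {
    all.files <- list.files(dir, recursive=TRUE, all.files=TRUE)

    # Subdirectories (relative to 'dir') that contain an ignore file.
    ignored <- dirname(all.files[basename(all.files) == ".SewerRatignore"])
    ignored <- setdiff(ignored, ".")

    candidates <- all.files[basename(all.files) %in% names]
    keep <- vapply(candidates, function(f) {
        # Compare with trailing slashes so that 'stuff' does not match 'stuff2'.
        !any(startsWith(paste0(dirname(f), "/"), paste0(ignored, "/")))
    }, logical(1), USE.NAMES=FALSE)

    sort(candidates[keep])
}

# Mock directory with the same layout as in the example above.
demo <- tempfile()
dir.create(file.path(demo, "stuff"), recursive=TRUE)
write(file=file.path(demo, "metadata.json"), '{ "name": "foo" }')
write(file=file.path(demo, "stuff", "metadata.json"), '{ "food": "barramundi" }')

previewIndexed(demo, names="metadata.json")

# Adding an ignore file drops the 'stuff' subdirectory from the preview.
file.create(file.path(demo, "stuff", ".SewerRatignore"))
previewIndexed(demo, names="metadata.json")
```

Passing a character vector with several entries to names in the helper mirrors the idea of supplying multiple base names to register().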

Searching the index

We use the query() function to perform free-text searches on the indexed metadata. This function does not require filesystem access and can be executed remotely.

lapply(query("foo", url=info$url), function(x) x$path)
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"
lapply(query("bar*", url=info$url), function(x) x$path) # partial match to 'bar...'
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"
## 
## [[2]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"
lapply(query("bar* AND foo", url=info$url), function(x) x$path) # boolean operations
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"
lapply(query("food:bar*", url=info$url), function(x) x$path) # match in the 'food' field
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"

We can also search on the file owner, path components, and modification time:

lapply(query(user=Sys.info()["user"], url=info$url), function(x) x$path) # created by myself
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"
## 
## [[2]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"
lapply(query(path="stuff", url=info$url), function(x) x$path) # path has 'stuff' in it
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"
lapply(query(from=Sys.time() - 3600, url=info$url), function(x) x$path) # created less than 1 hour ago
## [[1]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"
## 
## [[2]]
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"

We can also combine any of these fields, which are joined with a boolean AND in the query:

query("barramundi", path="stuff", url=info$url)
## [[1]]
## [[1]]$path
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json"
## 
## [[1]]$user
## [1] "root"
## 
## [[1]]$time
## [1] 1732572197
## 
## [[1]]$metadata
## [[1]]$metadata$food
## [1] "barramundi"
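
Each query result is a list with path, user, time and metadata elements, as printed above, so a set of results is easy to tabulate in base R. The sketch below uses a hand-built list that mirrors the printed output instead of calling the API; summarizeResults is a hypothetical helper, and it assumes that time holds seconds since the Unix epoch.

```r
# Hand-built stand-in for the output of query(), copied from the result above.
res <- list(
    list(
        path="/tmp/RtmpEIwdSd/file2e0350210f5/stuff/metadata.json",
        user="root",
        time=1732572197,
        metadata=list(food="barramundi")
    )
)

# Hypothetical helper (not part of SewerRat): one row per result.
summarizeResults <- function(res) {
    data.frame(
        path=vapply(res, function(x) x$path, character(1)),
        user=vapply(res, function(x) x$user, character(1)),
        # Assumes 'time' is in seconds since 1970-01-01 UTC.
        time=as.POSIXct(
            vapply(res, function(x) as.numeric(x$time), numeric(1)),
            origin="1970-01-01", tz="UTC"
        ),
        stringsAsFactors=FALSE
    )
}

df <- summarizeResults(res)
df$time
```

The metadata element is left out of the table because its fields differ between files; it can be inspected per result with, for example, res[[1]]$metadata.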

Retrieving files

Users of the SewerRat package are typically expected to be on the same filesystem as the SewerRat API, so the contents of files and directories can be inspected using standard methods. However, SewerRat also supports remote access, for example to list the files in a registered directory:

listFiles(mydir, url=info$url)
## [1] "metadata.json"       "stuff/metadata.json"

We can also retrieve the contents of files in any registered directory. Note that this can be applied to any file in the directory, not just the metadata files that were indexed.

readLines(retrieveFile(file.path(mydir, "metadata.json"), url=info$url))
## [1] "{ \"name\": \"foo\", \"description\": \"bar\" }"
# Or we can just retrieve the entire directory:
dirpath <- retrieveDirectory(mydir, url=info$url)
readLines(file.path(dirpath, "stuff", "metadata.json"))
## [1] "{ \"food\": \"barramundi\" }"

We can also directly retrieve individual metadata files, with extra details about the owner and modification time:

retrieveMetadata(file.path(mydir, "metadata.json"), url=info$url)
## $path
## [1] "/tmp/RtmpEIwdSd/file2e0350210f5/metadata.json"
## 
## $user
## [1] "root"
## 
## $time
## [1] 1732572197
## 
## $metadata
## $metadata$name
## [1] "foo"
## 
## $metadata$description
## [1] "bar"

Session information

## R Under development (unstable) (2024-11-20 r87352)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] SewerRat_0.3.2   BiocStyle_2.35.0
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.3           knitr_1.49          rlang_1.1.4        
##  [4] xfun_0.49           textshaping_0.4.0   jsonlite_1.8.9     
##  [7] glue_1.8.0          htmltools_0.5.8.1   ragg_1.3.3         
## [10] sass_0.4.9          rappdirs_0.3.3      rmarkdown_2.29     
## [13] evaluate_1.0.1      jquerylib_0.1.4     fastmap_1.2.0      
## [16] yaml_2.3.10         lifecycle_1.0.4     httr2_1.0.6        
## [19] bookdown_0.41       BiocManager_1.30.25 compiler_4.5.0     
## [22] fs_1.6.5            htmlwidgets_1.6.4   systemfonts_1.1.0  
## [25] digest_0.6.37       R6_2.5.1            curl_6.0.1         
## [28] magrittr_2.0.3      bslib_0.8.0         tools_4.5.0        
## [31] pkgdown_2.1.1       cachem_1.1.0        desc_1.4.3