Comments-from: Chengyu Han, Jānis Erdmanis, Kevin Bonham, Neven Sajko, Miles Cranmer, [your name here]

1. History

Julia's approach to file paths was largely inspired by Python …just before Pathlib was adopted. In the years since, the idea that a path type would benefit Julia has been articulated multiple times, in different ways.

  • In 2013 the path methods we know today were introduced.
  • In 2014, @stevengj made an issue proposing a mildly cursed partial workaround for the lack of a path type.
  • In 2016, FilePathsBase.jl was started.
  • In 2017 (just before Julia 1.0), Frames wrote a Julep advocating for a path type, but unfortunately it didn't go anywhere before Julia 1.0 was out.
  • In 2018 this was incidentally mentioned in a Discourse topic.
  • In 2020, a newcomer from Common Lisp opened a Discourse topic about missing a path type.
  • In 2020, an issue was opened in the main Julia repo on this.
  • In 2021 Jakob wrote a post that stuck in my mind examining flaws in Julia, including the lack of a path type in the language.
  • In 2022, ExpandingMan started working on FilePaths2.jl.
  • In 2024, I got fed up enough with this after the latest paper cut I experienced to write a gripe on Slack.

Julia's Slack only keeps 90 days of conversation history, but you can usually search for "path type" and find somebody running into paper cuts/headaches. Ignoring my recent gripe, doing this turns up a comment from Mosé, made in response to some platform-specific handling on a Julia PR that addressed the difference a trailing slash makes to some depot operations:

We should really have a proper path type, strings are simple bad for manipulating them 💯 ×2

The use of strings as paths also precludes (or at least complicates) support for filesystems beyond that of the current operating system. Virtual filesystems are increasingly recognised as a useful abstraction in programming languages, not just operating systems (see: io/fs in Go for example), but very few languages provide built-in support (Java is a notable exception with NIO2).

2. Motivation

While C and friends use char-vector types for strings, paths, and more, most modern high-level languages have settled on a dedicated path type, and for good reason. While familiar, the string-based approach conflates several distinct concepts: textual data, namespace locations, and filesystem resources. This places a significant burden on users and library authors to reason about correctness, safety, and platform-specific behaviour.

Users with experience with a modern language (such as Rust, Python, Java/Kotlin/Scala, Swift, or C++17) will be familiar with the value of a dedicated path type, and the importance of it being part of the base language/stdlib. Julia already has dedicated non-String types for regular expressions, substitution strings, and more. Filesystem paths merit the same treatment.

A first-class path type offers several concrete benefits:

  • Resolving the ambiguity of representing data as a string directly vs. a path to the data
  • Allowing for dispatch on text content/paths, and more generic functions
  • A platform-independent syntax and methods for working with paths
  • Fewer footguns/paper cuts, from reduced ambiguity and more rigorous handling
  • Support for virtual filesystems

This path type must exist in Julia's base, for two primary reasons:

  • Base itself makes extensive use of paths
  • A 3rd party library cannot provide the same level of ecosystem-wide coherence and consistency

Experience from other languages also reveals that paths should be considered together with the context they operate within. The filesystem uses paths as a means to resolve a reference to a resource, and paying careful attention to what this means allows us to generalise filesystem interaction to support capability-based access and virtual file systems.

3. Terminology

While the terms used to describe files, paths, and other filesystem-related entities and operations are standard, when splitting hairs (as this proposal does), it is important to be precise with our language. To that end, here are the specific concepts we will be interrogating:

  • A path is a description of a location within a namespace. It is a structured value composed of segments, and its meaning is defined only relative to a particular filesystem and (in general) a point of reference within that filesystem. A path does not identify a resource directly, nor does it convey authority to access one.
  • A handle is an authoritative reference to a specific resource. Handles are typically produced by resolving a path, and they provide the ability to interact with the referenced resource. Unlike paths, handles do not describe where a resource is located in the namespace; they refer directly to the resource itself. Handles are temporal in nature: they may become invalid over time (for example, when closed).
  • A filesystem is an organisation of resources together with rules for naming, resolving, and accessing those resources. Conceptually, a filesystem defines a namespace in which paths may be interpreted and resolved. Resolution establishes a relationship between paths and resources, but this relationship is not assumed to be stable across time or operations.
  • Resolution is the act of turning a path into a handle within the context of a filesystem. Resolution is an effectual operation: it may fail, and its result may depend on the state of the filesystem at the time it is performed. Making resolution explicit is central to distinguishing between operations on names and operations on resources.
  • A capability is the authority to perform a particular class of operations on a resource or within a filesystem. In this proposal, capabilities are represented explicitly by values (usually handles) and are not assumed implicitly through ambient state or global context. Treating filesystem access as capability-based allows code to be written that is safer, more composable, and more amenable to restriction or sandboxing.

4. Design complications

4.1. Locations and resources are distinct

In conversation, I've received a fair bit of pushback from multiple individuals on the normalisation I propose (see 5.3). The essential argument is that the nice, algebraic model of paths isn't able to fully abstract over the world of system-specific details. To give a few examples:

  • There's the symlink stuff from earlier
    • stat foo/bar is different to stat foo/bar/.
  • Some tools like rsync treat foo differently to foo/
  • /.. is / (root is essentially a fixed point on Linux)

I really dislike these complications, particularly because in accepting this messiness we abdicate the handling of it to the user, and thus make it easier to write naïvely buggy code.

Looking at this another way, even more pushback along these lines is deserved. There is an abundance of unquestioned issues of this nature with the current status quo. For example: it is currently not possible to write to a file and then move it without there being an opportunity for the file to be replaced entirely in between each step.

This kind of issue and much of the pushback I've received essentially stems from one core issue: we often like to think of paths as unique resource descriptors, when in fact they are unique location descriptors. This is a subtle but important distinction. It is responsible for a large class of bugs and vulnerabilities known as TOCTTOU (Time of Check to Time of Use). Essentially any time a path is reused with any degree of outside influence (over the path or filesystem), it is near trivial to swap out the file in between operations by constructing a deep directory nesting and monitoring the directory atimes (yes, really). The only system where we can truthfully say this is not an issue is one with no concurrency of any sort (including time-sharing).

This is in large part a consequence of the initial POSIX standard being path oriented, a limitation that is gradually being rectified with the addition of f<op> and <op>at calls, such as faccessat. These calls operate not on a path to the file, but a handle to the file itself: a file descriptor, or FD for short. File descriptors essentially sit in between a description of the location of a resource, and the data on disk. I am less familiar with the NT situation, but am led to believe that it has been ahead of *nix in supporting handle-based path operations.

Other programming languages have also recognised this issue, for example Python's Pathlib separates paths into pure and concrete paths, creating a clear split between an abstract conception of a path (as I've found myself attracted to thus far), and something that actually exists on the filesystem. I suspect we can go even further, and consider a scheme by which we provoke the user into obtaining a reference to the resource at a path when they want to work on it, and so avoid TOCTTOU-style issues and related messiness wholesale.

I conjecture that with the development of the file descriptor based API in POSIX 2008, Linux 2.6, and OpenBSD we have the capability to fulfill this ideal by building a path-like type that is oriented around file descriptors rather than path strings. We can make a more concrete path type than Pathlib's "concrete" paths. Arguably this isn't really a path any more so much as a handle to a filesystem-addressed resource. I'm not sure what best to call this, but regardless it seems exceptionally useful for writing safe filesystem-interacting code.
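The location/resource distinction can be observed with nothing but existing Base functions. The following is a minimal sketch, not part of the proposal: the function and file names are made up, and it assumes Posix semantics, where a file that is already open stays readable after its directory entry is replaced.

```julia
# Illustrative only: `read_despite_replacement` is a made-up name.
# Assumes Posix semantics, where replacing a directory entry does not
# affect handles that were already opened through it.
function read_despite_replacement()
    mktempdir() do dir
        path = joinpath(dir, "report.txt")
        imposter = joinpath(dir, "imposter.txt")
        write(path, "original")
        write(imposter, "imposter")
        open(path) do io                      # resolve the path once
            mv(imposter, path; force=true)    # the location now names another file
            via_handle = read(io, String)     # the handle still refers to the old resource
            via_path = read(path, String)     # re-resolving the path finds the new one
            (via_handle, via_path)
        end
    end
end

read_despite_replacement()
```

The two reads disagree even though the same `path` value was used throughout, which is the gap that TOCTTOU bugs live in.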

4.2. Filesystems are mutable and concurrent

To write

4.4. Empty segments

If one reads up on Linux Pathname Lookup, one may notice that while empty segments are generally invalid, with the appropriate flags it can be valid to interact with the empty path.

I cannot begin to imagine any legitimate use case for this, and so am inclined to pretend this edge case doesn't exist, particularly since there's no easy way to supply the necessary flag to Julia's (public) filesystem functions.

4.5. Virtual filesystems

Most languages bake in the assumption that you are working on the local filesystem. While there have been some forward-thinking efforts that untethered the notion of paths from the local machine (notably Plan9), even among modern languages, support for virtual filesystems from the foundations of the language up is rare.

Meanwhile, in the Linux, BSD, and MacOS operating systems virtual filesystems are increasingly recognised as a valuable abstraction for container runtimes, sandboxes, mixing remote and local filesystems, and more.

When support for virtual filesystems is not provided by the language itself, but instead by the ecosystem, even if there is a single de-facto package, the design of the basic path type and filesystem API is key to the cohesiveness of the final result. A class/trait based language can allow a custom type to be accepted, but requiring that all path-like objects be shoehorned into a string-based representation (as Python's os.PathLike and Rust's AsRef<Path> do) sharply limits what is possible. In particular, string-based designs make it very difficult to apply a capability model.

4.6. Preserving performance

4.6.1. Avoiding overhead when invoking Libuv path methods

Using a contiguous null terminated char array (whether a String, Memory{UInt8}, or Vector{UInt8}) for the internal representation of system paths makes it possible to pass the path representation directly to Libuv with no overhead.
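As a small illustration of the zero-overhead hand-off, the sketch below passes a `String` straight to C through `ccall`. The C library's `strlen` stands in for a Libuv path function here; that substitution is an assumption made purely to keep the example self-contained.

```julia
# `strlen` from the C library stands in for a Libuv path function.
path = "foo/bar"
n = ccall(:strlen, Csize_t, (Cstring,), path)   # String data is already NUL-terminated

# The Cstring conversion refuses strings containing embedded NUL bytes,
# which is one reason the null byte can never be part of a valid path.
rejected = try
    ccall(:strlen, Csize_t, (Cstring,), "foo\0bar")
    false
catch err
    err isa ArgumentError
end
```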

4.6.2. SubStrings as the type of path components

Since we've got good reason to want a single contiguous on-disk format, operations that fetch components of the path will either have to allocate a new object … or we can use a SubString. This is the approach FilePaths2.jl takes, and I like it.
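A quick sketch of the idea using plain strings (the proposal's path type is not involved here): `split` already hands back `SubString`s that point into the original string rather than copying each component.

```julia
path = "usr/local/share"
segments = split(path, '/')

# Each component is a view into `path`, not a freshly allocated String.
component_type = eltype(segments)
shares_storage = all(s -> parent(s) === path, segments)
```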

5. Design goals

5.1. High level path interface

While we want to end up with a Path type, it would be good to take a step back, consider what makes a "path" conceptually, and define an abstract type we can then specialise on.

I propose that in the abstract, a path is an ordered series of directions that takes you to a location.

From this, we can conceptualise a path as a list of direction segments, and arrive at a few fundamental operations:

  • root: the origin point
  • parent: the sequence of directions up to but excluding the most recent one
  • length: the number of directions in the path
  • iterate: give each direction of the path
  • basename: the most recent segment
  • joinpath: combine two sets of directions
  • children: the immediate next paths one may take (maybe?)

A path that includes a root is considered absolute, and other paths are relative.
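To make these operations concrete, here is a minimal sketch over a toy type. `SegmentPath` and `root` are hypothetical names invented for illustration, and `children` is left out since it needs a filesystem. Rejecting the join of an absolute right-hand path follows the resolution recorded in 6.4.1.

```julia
# Hypothetical toy type illustrating the abstract interface; not the proposed API.
struct SegmentPath
    absolute::Bool             # true when the path starts at a root
    segments::Vector{String}   # the ordered "directions"
end

Base.:(==)(a::SegmentPath, b::SegmentPath) =
    a.absolute == b.absolute && a.segments == b.segments

# root: the origin point, or `nothing` for a relative path
root(p::SegmentPath) = p.absolute ? SegmentPath(true, String[]) : nothing

# parent: every direction except the most recent one
Base.parent(p::SegmentPath) =
    isempty(p.segments) ? p : SegmentPath(p.absolute, p.segments[1:end-1])

# length / iterate: the directions themselves
Base.length(p::SegmentPath) = length(p.segments)
Base.eltype(::Type{SegmentPath}) = String
Base.iterate(p::SegmentPath, state...) = iterate(p.segments, state...)

# basename: the most recent segment
Base.basename(p::SegmentPath) = isempty(p.segments) ? "" : last(p.segments)

# joinpath: combine two sets of directions
function Base.joinpath(a::SegmentPath, b::SegmentPath)
    b.absolute && throw(ArgumentError("cannot join an absolute path onto another path"))
    SegmentPath(a.absolute, vcat(a.segments, b.segments))
end

p = SegmentPath(true, ["usr", "local", "share"])
q = joinpath(p, SegmentPath(false, ["julia"]))
```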

5.2. Make invalid paths unconstructable

There are some characters that may not appear in a path.

Posix (Linux/BSD/Mac)
the null byte (\0).
Windows
|, the null byte (\0), and ASCII codes \x01 ~ \x1f.

There are also some restrictions on filenames:

Posix
  • / and the null byte (\0).
  • Reserved file names: . and ..
Windows
This is a superset of the restrictions for Posix
  • Reserved characters: <, >, :, ", \, /, |, ?, and *.
  • The null byte (\0).
  • ASCII codes \x01 ~ \x1f.
  • Reserved names: CON, PRN, AUX, NUL, COM1–COM9, LPT1–LPT9, regardless of extension
  • Any filename ending with a space or .

We will also pretend that the empty path is never allowed (see 4.4).

It's fairly easy to apply the Posix path restrictions when constructing paths, but Windows is a bit of a pain, making me think perhaps it's not worth the effort.

Since the Windows restrictions are a (large) superset of the Posix restrictions, one approach that I'd like to explore is validating the Posix requirements are met during path construction, and then maybe checking for forms Windows wouldn't like in literal path construction (with the p"" macro) and emitting a warning.
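One way this two-tier check might look is sketched below. Every name here is hypothetical, and the rules are transcribed from the filename restrictions listed above rather than from any existing Base function. The strict Posix check would gate construction, while the Windows check would only feed a warning for literal paths.

```julia
# Hypothetical helper names; nothing here is an existing Base API.
const WINDOWS_RESERVED_CHARS = ('<', '>', ':', '"', '\\', '/', '|', '?', '*')
const WINDOWS_RESERVED_NAMES = Set(vcat(["CON", "PRN", "AUX", "NUL"],
                                        ["COM$i" for i in 1:9],
                                        ["LPT$i" for i in 1:9]))

# Posix rules: non-empty, no `/` or NUL, and neither `.` nor `..`.
isvalid_posix_segment(s::AbstractString) =
    !isempty(s) && s != "." && s != ".." && !any(c -> c == '/' || c == '\0', s)

# Windows rules, checked on top of the Posix ones.
function iswindows_friendly_segment(s::AbstractString)
    isvalid_posix_segment(s) || return false
    any(c -> c in WINDOWS_RESERVED_CHARS || c <= '\x1f', s) && return false
    last(s) in (' ', '.') && return false
    stem = uppercase(first(split(s, '.')))   # reserved names apply regardless of extension
    !(stem in WINDOWS_RESERVED_NAMES)
end
```

A segment such as `NUL.txt` passes the first check but not the second, which is the case where a literal-path warning would be emitted.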

5.3. Enforced standard form

5.3.1. Avoid representational ambiguities

Thinking of a path as a representation of a location, the existence of the . and .. pseudopath components complicates path considerations:

  • are foo/bar/.., foo/., and foo equal?
  • is foo/bar/ the same as foo/bar?
  • are foo\\bar and foo/bar equivalent on Windows?

These questions tend to fall under path normalisation, but by using a dedicated path type and path-specific operations I contend that we can decide on rules for a canonical representation of a path, and make it the only form that can be constructed. There is no need for path normalisation, as it is no longer possible to construct an abnormal path.

Frames also discusses the algebraic appeal of normalised paths in her blog post. It also feels like the more principled choice to me, and I suspect it makes it harder to fall into a few edge cases.

While I really want to pre-normalise .. components, it seems like I might have to give up on this front …despite the fact that the vast majority of the time this will be a nice simplification without any gotchas 😔
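As a rough sketch of what "only the canonical form is constructable" could mean for relative paths, the hypothetical helper below drops empty and `.` segments at construction time but, in line with the caveat above, leaves `..` alone. Root handling is deliberately omitted.

```julia
# Hypothetical helper: split a "/"-delimited relative path into canonical segments.
# Empty segments (from "//" or a trailing "/") and "." are dropped; ".." is kept,
# since collapsing it is not safe in the presence of symlinks.
function canonical_segments(str::AbstractString)
    [String(s) for s in split(str, '/') if !isempty(s) && s != "."]
end

a = canonical_segments("foo/bar/")
b = canonical_segments("foo/./bar")
c = canonical_segments("foo//bar")
d = canonical_segments("foo/bar/..")
```

With construction funnelled through a step like this, `foo/bar/`, `foo/./bar`, and `foo//bar` all become the same value, so no later normalisation pass is needed for them.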

5.4. Safe path interpolation

When interpreting externally provided content as a path, the existence of the pseudopath element introduces a risk of ending up in an unexpected directory. The intent of code like joinpath(workdir, subdirname) can be subverted (deliberately or accidentally) by providing a subdirname like /some/other/dir, ../stuff, the empty string, or even a null byte. This is most of the Path Traversal class of CVEs. When interpreting a string as a path segment, we can validate that it is a "normal" path segment and raise an error otherwise, preventing a surprising result from appearing with forms like p"path/$var/$name.txt".
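A minimal sketch of this validation on top of today's string-based `joinpath` follows. `safe_joinsegment` is a made-up name, and rejecting `\` everywhere is a conservative assumption, since it is a legal filename character on Posix.

```julia
# Hypothetical helper: join exactly one untrusted segment onto a directory.
function safe_joinsegment(dir::AbstractString, segment::AbstractString)
    if isempty(segment) || segment in (".", "..") ||
       any(c -> c in ('/', '\\', '\0'), segment)
        throw(ArgumentError("not a plain path segment: $(repr(segment))"))
    end
    joinpath(dir, segment)
end

# Helper for demonstration: does the given segment get rejected?
function isrejected(segment)
    try
        safe_joinsegment("workdir", segment)
        false
    catch err
        err isa ArgumentError
    end
end
```

With a check of this kind built into interpolation, `p"path/$var/$name.txt"` could never resolve outside of `path/`, whatever `var` contains.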

5.5. Convenient prefixes

5.5.1. The ~ home shorthand

The handling of ~ in paths in Julia tends to trip up people used to shell expansion, but there's a very good reason why Julia doesn't go ahead and interpret "~/dir" as /home/$USER/dir but requires expanduser: ~ is a valid path segment.

Without knowing the intent with which the ~ was written (or generated and passed around) it is not possible to reasonably decide whether it should be interpreted as a "~" segment or a reference to the home directory.

With a path macro, this changes. We can differentiate between a ~ that has been put literally at the start of a path, and a ~ that's come from elsewhere. This makes the convenient ~-home interpretation viable, without re-introducing the current issues. As a tradeoff expressing an initial ~ segment becomes less convenient, but given the relative frequency of home vs. "~" forms, this seems like a worthwhile tradeoff.
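The literal/non-literal distinction can be sketched with a string macro. This is an illustration under stated assumptions: `expand_home_literal` is a hypothetical helper, and the macro returns a plain `String` where the proposal would construct a path value.

```julia
# Hypothetical stand-in: returns a String, where the proposal would build a path value.
function expand_home_literal(literal::AbstractString)
    if literal == "~"
        homedir()
    elseif startswith(literal, "~/")
        joinpath(homedir(), literal[3:end])
    else
        String(literal)
    end
end

# Only text written literally inside p"..." ever reaches the expansion.
macro p_str(literal)
    :(expand_home_literal($literal))
end

from_literal = p"~/notes"      # leading ~ written literally: expanded to the home directory
from_elsewhere = "~/notes"     # an ordinary string from elsewhere is left untouched
```

Because the expansion only applies to the literal text handed to the macro, a `~` arriving through a variable would keep its meaning as an ordinary segment.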

5.5.2. Introducing @ project shorthand

I'm not 100% sold on this idea, but I'm interested and it seems worth exploring.

Within package and project code, it is common to see forms like joinpath(@__DIR__, "..", "..", "assets", "file.txt"). There are two major issues with this:

  1. Poor clarity of intent: this is an attempt to express a target location within a project, but the path is expressed relative to the current file (wherever it may be within the project) rather than the project itself. The fact the form is twisted as a result (with @__DIR__, "..", "..") only makes this less apparent.
  2. As a result of the poor clarity/expression, this form is slightly fragile: moving the source file around will break the reference, even if the target remains in the same location within the project.

Extending the "special literal prefix" handling to treat @ as a project-prefix, just as ~ is a user-prefix, can improve this situation. The choice of @ seems natural given the existing use of @-prefixed special paths in DEPOT_PATH and --project already.

Besides the two issues above, a @-prefix also provides an opportunity to improve the status quo with regard to relocatability. Enough Julia packages use @__DIR__ in paths to make relocatability a general issue (motivating RelocatableFolders.jl and julia/PR#55146). Implementing @ as a relocatable project-relative path (determined at compile-time) creates a form that is both more convenient and more robust, a "pit of success".

5.6. Uniform cross-platform support

5.6.1. Platform-specific path types

It seems likely useful to still be able to model paths of other platforms, and we can do this without compromising ergonomics fairly easily by defining <Platform>Path types and then having a Path for the current platform.

5.6.2. Cross-platform path construction

Posix exclusively uses the / delimiter, and Windows accepts \ (preferred) or /.

As such, we can reasonably settle on / as the in-Julia syntax for paths, and handle operating system dependent normalisation in the background. This makes it impossible to accidentally hardcode a particular platform's delimiters.
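A compact sketch of how 5.6.1 and 5.6.2 could fit together is given below. All names are hypothetical (`Demo*`, `separator`, `parsepath`, `render`), and only relative paths are handled, so drives and roots are left aside.

```julia
# Hypothetical per-platform path types for illustration.
abstract type DemoPlatformPath end

struct DemoPosixPath <: DemoPlatformPath
    segments::Vector{String}
end

struct DemoWindowsPath <: DemoPlatformPath
    segments::Vector{String}
end

separator(::Type{DemoPosixPath}) = '/'
separator(::Type{DemoWindowsPath}) = '\\'

# Paths are always written with `/` in source code...
parsepath(::Type{P}, str::AbstractString) where {P <: DemoPlatformPath} =
    P(String.(split(str, '/'; keepempty=false)))

# ...and rendered with the platform's own separator.
render(p::P) where {P <: DemoPlatformPath} = join(p.segments, separator(P))

# The path type for whichever platform the code is running on.
const DemoPath = Sys.iswindows() ? DemoWindowsPath : DemoPosixPath

posix = render(parsepath(DemoPosixPath, "foo/bar/baz.txt"))
windows = render(parsepath(DemoWindowsPath, "foo/bar/baz.txt"))
```

The same source text yields the right delimiter on each platform, and code that needs to reason about a foreign platform can still name its type explicitly.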

5.7. Handles as authoritative references

To write

5.8. Capability support through handles

To write

5.9. First-class Virtual Filesystems

5.9.1. Explicit filesystem identity

To write

5.9.2. Layered filesystem capabilities

To write

5.10. Making the safe path the happy/easy path

From all the investigation we've done so far, we know that:

  • An abstract path type allows for sensible and efficient path manipulation
  • A concrete fd-based path type allows for some TOCTTOU-safe filesystem operations
  • A specialised directory entry type allows for efficient readdir usage, and for other TOCTTOU-safe filesystem operations

Currently, we just use String for all of these purposes. This is "simple" in the sense that all of the inherent complexity is put off to the user of the API to think about. By contrast, this trio of system path types requires a little more upfront thinking, but this is paid for several times over in the reduction of edge cases that package developers and end users may hit.

6. Proposal

6.1. Type hierarchy

Arriving at, then determining how to satisfy, the key criteria takes careful thought. I shall now describe the third major iteration of the design. I will attempt to lay out a logical progression of ideas, but it is worth noting that the current design has not emerged from a clean linear thought process, but rather from imbibing a wide array of example models (see 7), trying various approaches, to simultaneously sketch the shape of a solution and distil the soup of ideas until a model that fit the shape emerged.

6.1.1. Key criteria

We want to have:

  1. Entirely separate types for paths and handles, since neither fits under the other
  2. Dedicated per-OS path types
  3. The ability to define new path and handle types, which "just work"
  4. A convenient and consistent way to go from paths to handles
  5. A dedicated filesystem type/interface, with paths and handles that include the filesystem
  6. A go-to Path type for use in argument type annotations, that: a) Encompasses both paths and handles b) Supports both local and virtual filesystems c) While allowing for simple usage
  7. The foundations for a capability-based access model
  8. Arbitrary layering / mix 'n match with different kinds of filesystems
  9. Type stability and efficiency

6.1.2. Handles and paths

To start with, let us consider what the most critical detail of the separate path and handle types required by 1 is. A handle is a wrapper around something and a path can be resolved to a handle for that same something. It follows that the principal type parameter of an AbstractHandle is what the something is, and that an AbstractPath should specify what kind of handle it may be resolved to.

julia
#
abstract type AbstractHandle end

abstract type AbstractPath{H <: AbstractHandle} end

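# A handle resolves to itself.
handle(h::AbstractHandle) = h
# Interface requirement (a comment, not runnable code): each concrete path type
# `P <: AbstractPath{H}` implements `handle(::P)`, returning an `H`.
Listing 1: The naïve handle and path abstract types, along with the handle interface.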

This simple design has the potential to complicate calling and writing filesystem code. We want to encourage packages to write handle-first code, since it avoids more filesystem corner-cases (along with a few other benefits described later); however, external programs and users will often want to provide a path. Asking package authors to write two versions of each function is untenable, even with handle making it easy:

julia
#
function dothing(x::Foo)
    # ...
end

dothing(x::AbstractPath{Foo}) = dothing(handle(x))
Listing 2: A brief sketch of a path/handle-consuming function, showing how, although handle makes it easy, two methods are required to support paths and handles.

Realistically, package authors will forget to implement at least one "flavour" and so the ecosystem will end up incompatibly split between path-based and handle-based APIs. We need a single unified way of supporting paths and handles. This is what criterion 6 is about.

We can create a convenient Path type alias that paths and handles of the same nature sit under. We could have const PathOrHandle{H} = Union{<:AbstractPath{H}, H} where {H <: AbstractHandle}, but I think it is worth considering whether there could be a sane supertype that AbstractPath and AbstractHandle could both sit under.
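For comparison, the union-alias route can be sketched on top of the Listing 1 types. The `Demo*` types, the `DEMO_FS` dictionary, and `readall` are hypothetical, and the alias is written in a slightly simplified spelling of the one given above.

```julia
# Listing 1 types, repeated so the sketch is self-contained.
abstract type AbstractHandle end
abstract type AbstractPath{H <: AbstractHandle} end
handle(h::AbstractHandle) = h

# Hypothetical handle/path pair backed by a Dict instead of a real filesystem.
struct DemoHandle <: AbstractHandle
    contents::String
end
struct DemoPath <: AbstractPath{DemoHandle}
    name::String
end
const DEMO_FS = Dict("greeting.txt" => "hello")
handle(p::DemoPath) = DemoHandle(DEMO_FS[p.name])

# The alias: either a handle of kind H, or a path that resolves to one.
const PathOrHandle{H <: AbstractHandle} = Union{AbstractPath{H}, H}

# A single method now serves both flavours of argument.
readall(x::PathOrHandle{DemoHandle}) = handle(x).contents

from_path = readall(DemoPath("greeting.txt"))
from_handle = readall(DemoHandle("hi"))
```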

Focusing on the key practical consideration of a path —what kind of handle it may produce— leads us to a simple but meaningful abstraction: a type that can be resolved to a certain kind of handle. Let us call this AbstractResolvable (instead of PathOrHandle). The interface is thus: an instance of AbstractResolvable{T} must be able to produce an AbstractHandle{T} when handle is called on it. To have AbstractHandle be both a subtype as well as part of the AbstractResolvable type parameter like this would be simpler if Julia supported a self-like keyword when defining type relations, or supported F-bounded polymorphism. Alas, Julia does not, and so we must be a little more verbose and repeat the handle type as a type parameter in the AbstractHandle/AbstractPath subtype declaration.

julia
#
abstract type _AbstractResolvable{H} end # Internal implementation detail

abstract type AbstractHandle{H} <: _AbstractResolvable{AbstractHandle{H}} end

const AbstractResolvable{H <: AbstractHandle} = _AbstractResolvable{AbstractHandle{H}}

abstract type AbstractPath{H} <: AbstractResolvable{H} end
Listing 3: An evolution of the naïve type design, placing paths and handles under a cohesive handle-based type hierarchy.
resolvable-path-handle.svg
Figure 1: Over each handle type \(H\), resolvables form a fibre: objects \(T \in\) AbstractResolvable{H} are equipped with \(\operatorname{handle}_T : T \to H\). The handle object itself plays the terminal role in this fibre, and AbstractPath{H} forms a path subfamily.

All together, this allows us to have a handle type Foo that can be resolved from a Bar path, where:

  • Foo is a subtype of AbstractHandle
  • Bar is a subtype of AbstractPath
  • Both Foo and Bar are subtypes of AbstractResolvable{Foo}
julia
#
struct Foo <: AbstractHandle{Foo} #= ... =# end
struct Bar <: AbstractPath{Foo} #= ... =# end

Foo <: AbstractHandle{Foo} <: AbstractResolvable{Foo}
Bar <: AbstractPath{Foo}   <: AbstractResolvable{Foo}
Listing 4: An example showing how a handle type (Foo) and related path type (Bar) go together under the revised handle/path model.

If we then formalise the filesystem interface (which we will do), and write generic fallback methods for AbstractPath arguments, package authors can write a single method with ::AbstractResolvable{<:AbstractFilesystemHandle} arguments and uniformly handle path and handle flavoured arguments.

To push package authors to use handles themselves, we can consider adding Base.depwarn messages within the path fallback methods.

6.1.3. Incorporating the filesystem

We can easily satisfy 6a and 8 by making the filesystem an explicit part of the paths and handles. To implement methods specific to a particular filesystem, it follows that the kind of filesystem must be a type parameter of the path/handle.

We could directly add a filesystem type to the abstract path and handle types, but I appreciate the simplicity of the current forms, and (in my view) doing so would make them less elegant. There is another route that I will take here: defining a handle subtype that opts-in to the filesystem interface and holds the filesystem kind as a type parameter.

julia
#
abstract type AbstractFilesystem end

abstract type AbstractFileHandle{F<:AbstractFilesystem, H} <: AbstractHandle{H} end
Listing 5: The definition of a top-level abstract filesystem type. This allows for a derived handle type that opts into the abstract filesystem interface.

It is expected that (in general) the filesystem instance will be carried around by its path/handle types as an explicit field. The local filesystem is one exception to this, as it can be simply expressed as a singleton type.

julia
#
struct LocalFilesystem <: AbstractFilesystem end
Listing 6: Since the local filesystem is solely the domain of the operating system, we only need a singleton type to be able to specify that a handle exists on the local filesystem.

This allows us to refer to any type that may be treated as a file with AbstractResolvable{<:AbstractFileHandle}, which we can alias with Path to help direct package authors to use it as the go-to type annotation for a type that can be used as a file.

6.1.4. Per-system path types

To use a path type on multiple systems there are fundamentally three distinct approaches we can take:

  a. Have a single concrete type that behaves as appropriate for the current platform. Modelling other platforms is not possible.
  b. Implement a type for each platform, and require users/package authors to explicitly reason about what the current platform is.
  c. Implement types for each platform, and have a current platform type.

The first is convenient, the second flexible, and the third both. We'll take the third approach (the same as Python's Pathlib, Racket, and FilePathsBase.jl).

With our use of handles, and their inclusion as a path type parameter there's something else to consider: foreign system paths can only be reasoned about abstractly, not converted to a filesystem handle (leaving remote connections aside). As such, an alias for whatever path type is appropriate for the current system doesn't quite fit. Instead, we can define one extra path type that's a thin wrapper around the platform-appropriate type.

julia
#
struct Unhandleable <: AbstractHandle{Unhandleable} end

struct PosixPath <: AbstractPath{Unhandleable}
    # ...
end

struct WindowsPath <: AbstractPath{Unhandleable}
    # ...
end

struct LocalFileHandle <: AbstractHandle{LocalFileHandle}
    # ...
end

struct LocalFilepath <: AbstractPath{LocalFileHandle}
    # Field type chosen at definition time for the platform Julia is running on.
    path::(@static Sys.iswindows() ? WindowsPath : PosixPath)
end
Listing 7: Per-platform path types that cannot be resolved to a filesystem handle, along with a local filesystem handle and path type.

6.1.5. Allowing for a capability model with relative types

Once you have a handle, it is easy to see how you can want to resolve paths relative to a handle, for instance moving a file between two directories (as handles). This naturally appears in a few contexts, such as when listing the contents of a directory (which are provided as file names relative to a directory handle).

It is also worth noting that this is a construction in which we are able to perform path-based operations not based on an implicit context (the current working directory) or an absolute path, but rather a "live" object that confers the ability to operate within it.

It should be no surprise that this model has been used as the basis for replacing the ambient authority in filesystem operations with explicit capabilities. We not only have the *at Posix operations, but also Capsicum system calls like cap_rights_limit which is applied to a file descriptor to restrict what can be done with it.

We can neatly fit this model under our existing paradigm as a new AbstractResolvable subtype.

julia
#
struct RelativePath{H <: AbstractHandle, P <: AbstractPath{H}} <: AbstractResolvable{H}
    parent::H
    path::P
end
Listing 8: A generic handle-relative path type, that will take a handle along with a path type which can be resolved to the same kind of handle, and so in combination is resolvable to the very same kind of handle.
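To show how the pieces might compose, here is a sketch that repeats the definitions from Listings 3 and 8 and adds a toy in-memory tree in place of a real filesystem. `NodeHandle`, `SegmentsPath`, and `contents` are hypothetical names introduced for illustration only.

```julia
# Type hierarchy as given in Listings 3 and 8.
abstract type _AbstractResolvable{H} end
abstract type AbstractHandle{H} <: _AbstractResolvable{AbstractHandle{H}} end
const AbstractResolvable{H <: AbstractHandle} = _AbstractResolvable{AbstractHandle{H}}
abstract type AbstractPath{H} <: AbstractResolvable{H} end

struct RelativePath{H <: AbstractHandle, P <: AbstractPath{H}} <: AbstractResolvable{H}
    parent::H
    path::P
end

# Hypothetical in-memory "filesystem": directories are Dicts, files are Strings.
struct NodeHandle <: AbstractHandle{NodeHandle}
    node::Any
end
struct SegmentsPath <: AbstractPath{NodeHandle}
    segments::Vector{String}
end

handle(h::AbstractHandle) = h

# Resolution walks down from the parent handle; there is no ambient root, so a
# bare SegmentsPath deliberately has no `handle` method in this sketch.
function handle(r::RelativePath{NodeHandle, SegmentsPath})
    node = r.parent.node
    for segment in r.path.segments
        node isa AbstractDict || error("`$segment` is not inside a directory")
        node = node[segment]
    end
    NodeHandle(node)
end

# One method serves handles and handle-relative paths alike.
contents(x::AbstractResolvable{NodeHandle}) = handle(x).node

tree = NodeHandle(Dict{String,Any}("docs" => Dict{String,Any}("readme" => "hello")))
relative = RelativePath(tree, SegmentsPath(["docs", "readme"]))
found = contents(relative)
```

Code holding only `tree` can reach what lies beneath it and nothing else, which is the capability property in miniature.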

6.1.6. The final hierarchy

type-hierachy.svg
Figure 2: Proposed type hierarchy for the paths, handles, and filesystems

6.2. Code

https://code.tecosaur.net/tec/julia-basic-paths

If you'd like to make a PR etc. this is also now mirrored to GitHub: https://github.com/tecosaur/julia-basic-paths

I'm happy to take feedback in any form you're willing to give it. If easy/possible I like receiving a .patch with inline comments 🙂

6.3. Non-breaking changes

Ideally we'd use a time-travel machine to shoehorn this into Julia 1.0, but the second best time to add a path type to Julia is now.

Avoiding breaking changes means we can't remove paper cuts like eachline(::String), but we can provide a better alternative, gradually adopt it, and push for it to become the status quo in the long term.

6.4. Unresolved questions

  • Is treating the drive + / as the path root on Windows good enough?
  • Should we take this opportunity to copy FilePathsBase.jl / FilePaths2.jl and provide more structured outputs to functions like uperm?
  • Can we get away with eagerly normalising .. and requiring realpath when you need to guard against symlink shenanigans?
  • Do we want to under-the-hood transform absolute Windows paths to verbatim-prefixed paths (\\?\), for long file name support?

6.4.1. Now resolved through community discussion

  • Should joining two absolute paths return the latter absolute path, or raise a runtime warning/error?
    • An error should be thrown

6.5. Mitigating Pain Points

While I like this set of design goals, they're ultimately a compromise between various concerns, and so produce some potential pain points. These should be mitigated as much as possible.

6.6. Comments

6.6.1. Windows UNC paths

Do we want to under-the-hood transform absolute Windows paths to verbatim-prefixed paths (\\?\), for long file name support? (See https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#win32-file-namespaces)

cyhan
NO, This problem should be solved when installing Julia, by setting a registry entry. https://github.com/JuliaLang/julia/issues/46450
Timothy
Sounds reasonable, I wonder how this might work with Julia static binaries though?
Jānis Erdmanis
Perhaps an alternative is to prepend `\\?\` if the path exceeds 260 characters until Windows slowly makes them default.

6.6.2. Windows path lengths

Windows: This is a superset of the restrictions for Path

Jānis Erdmanis
There is also the notorious 260-character limit, which could also be validated along with those restrictions
Timothy
Ah yes, I like this idea. It does depend on a registry setting too though, depending on how the Julia runtime is installed…
Jānis Erdmanis
On the host system, one could set a global variable CHECK_LONG_PATHS in the init block, which checks the registry entry to decide whether the path length limit should be enforced. I asked Claude and it suggested reg query "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled, which I confirmed works on my VM. On POSIX systems a warning could be thrown regardless, which could be disabled by explicitly setting CHECK_LONG_PATHS to false, with the check removed in the next decade.
Timothy
Maybe the way to handle this could be to have an on-startup check for Windows systems which then adds the verbatim prefix if needed? The thing I don't like about warnings/errors here is that when a long path is produced via interpolation it makes runtime errors unpredictable, and TBH I'm not a fan of runtime errors in the first place with stuff this basic/fundamental (outside of errors from the filesystem itself/invalid paths).
Jānis Erdmanis
I think you have a point of runtime errors being unpredictable. When developing AppBundler I did encounter cases where my Julia application would run just fine in one and fail in another directory just because of long path limit. Adding a prefix automatically for long paths seems a much better idea.
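
As a rough illustration of the "add the prefix automatically for long paths" idea from this thread, here is a sketch in Python. The helper name with_verbatim_prefix and the exact threshold handling are assumptions; it also ignores the fact that verbatim paths are not normalised by Windows (so . and .. would have to be removed beforehand), and it does not consult the LongPathsEnabled registry setting.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def with_verbatim_prefix(path: str, limit: int = MAX_PATH) -> str:
    # Hypothetical helper: only touch paths that would hit the limit.
    if path.startswith("\\\\?\\") or len(path) < limit:
        return path
    p = PureWindowsPath(path)
    if not p.is_absolute():
        raise ValueError("only absolute paths can be given a verbatim prefix")
    text = str(p)  # pathlib renders Windows paths with backslashes
    if text.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + text[2:]
    return "\\\\?\\" + text


print(with_verbatim_prefix("C:\\Users\\tec\\notes.txt"))  # unchanged, it is short
print(with_verbatim_prefix("C:\\" + "a" * 300)[:8])       # starts with \\?\C:\a
```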

6.6.3. Is / a drive root on Windows?

Is treating the drive + / as the path root on Windows good enough?

cyhan
This is called a mixed form in cygpath. I think it's good enough. Just need to provide some more conversion functions like cygpath
Timothy
Ta. Thanks for the second opinion.

6.6.4. Better to allow joining two absolute paths, or error?

Should joining two absolute paths return the latter absolute path, or raise a runtime warning/error?

cyhan
I prefer throwing errors. We can always loosen restrictions later, but it's hard to tighten them.
Timothy
Initially I had mixed feelings, but I'm increasingly coming around to the idea of making combining multiple absolute paths disallowed without explicit handling.

6.6.5. Special handling of . and ..

Can we get away with eagerly normalising .. and requiring realpath when you need to guard against symlink shenanigans?

Neven Sajko
As I said on Discourse, special handling for .. (or .) seems like a bad idea. Just treat it like any other file name. So no normalization, IMO.
Timothy
That's a nice and simple answer, but I remain tempted by this approach because of how it simplifies/eliminates headaches like what the parent directory of a/b/. or a/b/.. is along with similar path-related questions/operations. I'm not sure how best to establish how viable this approach is (or isn't).

6.6.6. Joinpath as a basic path operation

From this, we can conceptualise a path as a list of direction segments, and arrive at a few fundamental operations

Jānis Erdmanis
Shouldn't joinpath also be part of this list?
Timothy
Path concatenation might be worth adding to the generic API, I've just tried to start with the absolute bare minimum. Miles recently made a strong case for adding children in Discourse.
Jānis Erdmanis
I also like the idea of adding children. Even the old tape filesystems could do such operation efficiently as the metadata that listed all files was written in one place.

7. Prior art

Because it's a lot of work to go through all of these implementations, review them, and then write about them, I've made use of some LLMs to help with the write-up. You may well pick this up in the writing style from here on (as well as a few stylistic inconsistencies that aren't hard to notice, but that I am far from sufficiently motivated to correct).

7.1. Summary

7.1.1. Paths

Across modern languages and libraries, filesystem paths are treated as structured names, not as generic text. The common design goal is to make path manipulation correct-by-construction and cross-platform, while keeping filesystem effects explicit and auditable.

7.1.1.1. Paths as distinct value types

Most ecosystems converge on a dedicated path type (or at least a dedicated vocabulary surface) to prevent accidental mixing of "text" and "namespace locations", and to concentrate path semantics in one place. Even where paths remain strings (notably .NET and Node), there is a strong push toward path-specific APIs (Path.Combine, path.join) and "path-like" protocols to reduce ad-hoc string manipulation.

7.1.1.2. Lexical operations vs filesystem effects

A consistent boundary appears between lexical path operations (join, split, parent, extension, normalization) and effectual filesystem interaction. Some systems enforce this boundary by design (Rust keeps Path inert; filesystem operations live in std::fs), while others expose it through naming/API placement (C++17's lexical functions vs canonical, Python's "pure" vs "concrete" classes). The important lesson is to avoid implying that lexical rewriting is equivalent to semantic resolution, especially in the presence of symlinks.

7.1.1.3. Handles as the unit of authority

Where explicit "handles" exist (file descriptors, streams, FileHandle, ports), they are treated as the authoritative reference to a resource and the place where permissions/capabilities naturally attach. Paths do not become handles through "resolution"; instead, opening a path yields a handle, and path canonicalisation (realpath/canonical/resolve) typically yields only another path value.

7.1.1.4. Platform fidelity and encoding

Path models differ most in how faithfully they represent platform reality. Rust and Racket emphasise round-tripping arbitrary OS paths (including non-UTF8) via OS-string/byte representations, making string conversion fallible. Python and many high-level models assume Unicode strings but still acknowledge platform-specific semantics (POSIX vs Windows path rules). A robust design must be explicit about what is representable, and what conversions are lossy or fail.
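
A small Python sketch of what "fallible conversion" versus "lossless round-tripping" looks like in practice. The file name is made up, and the codec calls are used explicitly so that the result does not depend on the host; os.fsdecode applies the same "surrogateescape" handler on POSIX systems whose filesystem encoding is UTF-8.

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes: a legal POSIX file name, but not valid UTF-8

# Strict decoding is fallible, much like a Rust path failing to convert to &str.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "surrogateescape" carries the undecodable byte through as a lone surrogate,
# so the original bytes can be recovered exactly.
name = raw.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")

print(strict_ok)             # False
print(round_tripped == raw)  # True
```

The price of the lossless form is that name is no longer well-formed text: encoding it strictly (for display, JSON, and so on) fails.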

7.1.1.5. A small core plus high-leverage conveniences

Successful path APIs provide a compact, predictable core (join/parent/basename/extension/absolute/relative, segment iteration) plus a small set of high-leverage conveniences (with_extension, with_name, suffixes, strip_prefix, ancestors). This reduces bespoke string parsing and yields consistent behaviour across the ecosystem.

7.1.1.6. Ambient context is the default, but explicitly isolating context is valuable

Most mainstream standard libraries interpret paths in an ambient OS namespace (current working directory, mounted filesystems, process view). Where alternative contexts are needed, ecosystems either (a) provide explicit context objects (Java NIO.2 FileSystem, Go's fs.FS), or (b) introduce capability-flavoured directory handles as the root of resolution (WASI, cap-std, Zig's Dir). Even if not adopted everywhere, making "context" representable as a value enables sandboxing, testability, and predictable semantics for non-local backends.

7.1.1.7. Design takeaways
  • Treat paths as structured names with a coherent API; avoid "strings plus conventions" wherever possible.
  • Keep lexical manipulation separate from effectual filesystem interaction; do not conflate canonicalisation with handle acquisition.
  • Make handles the unit of authority and the primary return type of "open"-style operations.
  • Be explicit about platform semantics and encoding/round-tripping guarantees.
  • Provide a small core of primitives and a curated set of composable convenience transforms.
  • Keep ambient OS behaviour as the ergonomic default, while ensuring the design leaves room for explicit context/capability patterns where needed.

7.1.2. Virtual Filesystems

Across systems, a "virtual filesystem" abstraction is primarily about making filesystem context explicit and substitutable, so code can operate over different backends (OS, in-memory, archives, remote stores) and different views (rooted, overlaid, redirected) with minimal change. Designs vary mainly in what is made first-class (filesystem object vs path value vs handles), and in how strongly they treat authority/sandboxing as a core concern.

7.1.2.1. Filesystem objects as the unit of substitution

Many successful designs make the filesystem itself the injectable boundary: you pass an FS/FileSystem/afero.Fs/llvm::vfs::FileSystem value into code, and all operations go through that object. This centralises backend choice, credentials/configuration, and view-construction, and avoids hard-wiring the ambient OS namespace into libraries.

7.1.2.2. Minimal interfaces vs OS-shaped completeness

There's a recurring split between:

Minimal, capability-friendly cores
for example, Go io/fs's Open + optional extensions, which maximise implementability and composability, and
Broad, os-shaped interfaces
(Afero) that better support real applications (writes, renames, permissions) but are harder to implement consistently across heterogeneous backends.

This suggests a "small core + layered extensions" approach scales best.
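
The "small core + layered extensions" shape can be sketched in Python with typing.Protocol. This mimics the pattern of Go's io/fs (a required Open plus optional interfaces discovered at runtime); the names FS, ReadDirFS, MemFS, ListableMemFS and list_or_none are all hypothetical, and the sketch assumes Python 3.9 or later.

```python
import io
from typing import BinaryIO, Protocol, runtime_checkable


@runtime_checkable
class FS(Protocol):
    """The minimal core: every backend must be able to open a name."""
    def open(self, name: str) -> BinaryIO: ...


@runtime_checkable
class ReadDirFS(Protocol):
    """An optional extension that only some backends provide."""
    def readdir(self, name: str) -> list[str]: ...


class MemFS:
    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def open(self, name: str) -> BinaryIO:
        return io.BytesIO(self.files[name])


class ListableMemFS(MemFS):
    def readdir(self, name: str) -> list[str]:
        prefix = "" if name == "." else name + "/"
        children = {k[len(prefix):].split("/")[0]
                    for k in self.files if k.startswith(prefix)}
        return sorted(children)


def list_or_none(fs: FS, name: str):
    # Generic code probes for the extension instead of requiring it.
    return fs.readdir(name) if isinstance(fs, ReadDirFS) else None
```

A backend that cannot list directories still satisfies FS, and callers degrade gracefully instead of every backend having to fake an OS-shaped interface.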

7.1.2.3. Names are often stringly, even when context is explicit

Even sophisticated VFS systems frequently keep pathnames as strings (Go io/fs, Afero, LLVM VFS, Linux syscalls, WASI path parameters). The safety and clarity then come from (a) a strict path-name contract (Go), (b) scheme-in-string dispatch (fsspec), or (c) forcing resolution to be relative to explicit context/handles (WASI, openat-style APIs). A first-class Path value type is comparatively rare in VFS layers; it usually lives at the language level, not the VFS layer.
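
To give a feel for what a "strict path-name contract" means, here is the rule documented for Go's fs.ValidPath transliterated into Python: names are unrooted, slash-separated, and contain no empty, "." or ".." elements, with "." alone standing for the root. The function name valid_path is mine, and the UTF-8 validity check that Go also performs is omitted.

```python
def valid_path(name: str) -> bool:
    if name == ".":
        return True  # special case: the root of the filesystem
    # Every element must be non-empty and must not be "." or "..".
    # Leading/trailing slashes produce empty elements, so they are rejected too.
    return all(elem not in ("", ".", "..") for elem in name.split("/"))


for candidate in ["x/y/z", ".", "", "/x", "x/", "x/../y"]:
    print(repr(candidate), valid_path(candidate))
```

With a contract like this, a plain string is still "stringly", but every implementation can rely on the same small set of invariants.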

7.1.2.4. Layering and "view construction" are high-leverage primitives

Powerful VFS designs tend to support composition:

  • overlays/stacks (LLVM's OverlayFileSystem, Afero's CopyOnWriteFs)
  • rooted/sub views (Go's fs.Sub, os.DirFS, Afero's BasePathFs)
  • redirection/mapping (LLVM redirecting FS; fsspec protocol routing)

These primitives solve many practical problems (testing, reproducible builds, generated files, sandboxed plugins) without bespoke ad-hoc hooks.
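
A minimal Python sketch of two of these primitives, a rooted view and an overlay, composed over an in-memory backend. All class names are hypothetical and the sketch assumes names have already been validated (no .. elements), so SubFS is a view rather than a security boundary.

```python
import io


class MemFS:
    def __init__(self, files):
        self.files = dict(files)

    def open(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        return io.BytesIO(self.files[name])


class SubFS:
    """Rooted view: every name is looked up beneath `prefix`."""
    def __init__(self, fs, prefix):
        self.fs, self.prefix = fs, prefix

    def open(self, name):
        return self.fs.open(f"{self.prefix}/{name}")


class OverlayFS:
    """Stacked view: the first layer that has the name wins."""
    def __init__(self, *layers):
        self.layers = layers

    def open(self, name):
        for layer in self.layers:
            try:
                return layer.open(name)
            except FileNotFoundError:
                continue
        raise FileNotFoundError(name)


base = MemFS({"etc/app.toml": b"base", "etc/extra.toml": b"extra"})
patch = MemFS({"app.toml": b"patched"})
view = OverlayFS(patch, SubFS(base, "etc"))
print(view.open("app.toml").read())    # b'patched'
print(view.open("extra.toml").read())  # b'extra'
```

Code written against view neither knows nor cares how many layers sit underneath it, which is what makes these primitives useful for testing and for generated files.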

7.1.2.5. Two-phase model: resolve names, then operate on handles

Across OS-level and capability-flavoured designs, the most stable conceptual split is:

  • name lookup/resolution, which is effectual/stateful and may depend on mounts/symlinks/caches, producing
  • handles (file descriptors, fs.File, vfs::File, file-like objects), which serve as the authority-bearing resource reference.

7.1.2.6. Capability models are a distinct axis

Some VFS approaches treat sandboxing as an emergent pattern (pass a rooted FS object; don't call the ambient one), while others make it the default (WASI: only pre-opened directory capabilities; descriptor-relative operations; absolute paths disallowed). The lesson is that capability-oriented designs work best when the API shape forces context/authority to be explicit (directory-handle-centric *at operations, such as openat), rather than relying on convention.
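
For a concrete picture of a descriptor-relative operation, here is a POSIX-only Python sketch using the dir_fd parameter of os.open, which maps onto openat. The helper name read_at is hypothetical. Note that this only shows the API shape: plain openat does not stop a name like ../secret or an absolute name from escaping the directory, so a real capability model needs additional enforcement.

```python
import os
import tempfile


def read_at(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to an already-open directory handle (openat-style)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "wb") as f:
        f.write(b"hello")
    root = os.open(tmp, os.O_RDONLY)  # the directory handle carries the authority
    try:
        print(read_at(root, "notes.txt"))  # b'hello'
    finally:
        os.close(root)
```

The function signature never mentions the ambient filesystem: whoever holds root decides what read_at can see.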

7.1.2.7. Semantic variance is unavoidable and must be surfaced

When VFS spans non-POSIX backends (object stores, HTTP, archives), "filesystem-like" operations have unavoidable semantic differences (directories, atomicity, consistency, metadata fidelity). Systems either:

  • embrace a "common denominator" contract (Go io/fs read-only, strict names), or
  • provide a wide API but accept backend-specific sharp edges (Afero, fsspec).

A robust design should assume semantics are backend-dependent and avoid implying POSIX invariants unless guaranteed.

7.1.2.8. Design takeaways
  • Make filesystem context substitutable and injectable; avoid baking "the OS filesystem" into library code.
  • Prefer a small core interface with optional extensions; keep richer surfaces layered.
  • Treat resolution as effectual lookup that yields handles/metadata, not "better paths".
  • Provide composition primitives (rooting/sub, overlay, redirection) as first-class building blocks.
  • If capability/sandboxing matters, encode it in the API shape (handle-relative operations), not just documentation.
  • Be explicit about semantic guarantees; don't pretend disparate backends share POSIX behaviour.

7.2. Paths

7.2.1. Python's Pathlib

https://peps.python.org/pep-0428 https://peps.python.org/pep-0519
https://docs.python.org/3/library/pathlib.html

Python's pathlib is generally praised for offering an ergonomic way of handling filesystem paths. It makes paths first-class types, with a deliberate split between pure (lexical) path manipulation and concrete (I/O) paths that interact with the filesystem. Each kind of path is further divided into POSIX and Windows flavours, with a mostly-uniform interface.

python-pathlib.svg
Figure 3: Class and interface relationships in pathlib

Sample usage

Python
from pathlib import Path, PureWindowsPath
import os

# Pure, lexical construction (no I/O)
win = PureWindowsPath(r"C:\Users\TEC") / "project" / "data.csv"

# Concrete, effectful operations on the local OS filesystem
p = Path("data") / "results.csv"
text = p.read_text(encoding="utf-8")     # opens/reads the file (I/O)

# Explicit demotion for interop with APIs expecting a filesystem path representation
os_path = os.fspath(p)

7.2.1.1. Pure and Concrete paths

Pathlib separates out purely conceptual and filesystem-grounded paths as pure and concrete paths.

Operating on pure paths never touches the filesystem, while concrete paths add operations that do: checking for and resolving symlinks, querying existence, and other operations that are verified against the filesystem. This also makes the transition from operating on the path to interacting with the filesystem explicit. Note that this is not true path "resolution": a concrete path is still a string-based name, and no resource handle is obtained.

7.2.1.2. Posix and Windows paths

Pathlib provides per-platform path classes, and aliases Path/PurePath based on the current platform. This preserves OS-specific semantics (drives/UNC, separators, etc.), and allows for working with non-native paths when needed, without forcing all callers to write per-platform code.
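
A short demonstration of the per-flavour semantics, using only the pure classes so it behaves the same on any host. The path itself is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"C:\Users\tec\notes.txt"
win = PureWindowsPath(raw)
posix = PurePosixPath(raw)

print(win.drive, win.name)  # C: notes.txt
print(len(win.parts))       # 4  (drive+root, Users, tec, notes.txt)
print(len(posix.parts))     # 1  (backslashes are ordinary characters on POSIX)
print(win.as_posix())       # C:/Users/tec/notes.txt

# Windows comparisons are case-insensitive; POSIX comparisons are not.
print(PureWindowsPath("C:/Users/TEC") == PureWindowsPath("c:/users/tec"))  # True
print(PurePosixPath("/Users/TEC") == PurePosixPath("/users/tec"))          # False
```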

7.2.1.3. Not a string subclass

PEP 428 explicitly decides against deriving paths from str to avoid silent misuse.

7.2.1.4. Interoperability via path protocol

PEP 519 creates a path protocol (os.PathLike / __fspath__) so that "path objects" can be accepted across the stdlib. Path objects can be demoted to str/bytes for legacy APIs, while constructors can promote strings into path objects.
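
The protocol is small enough to show in full. In this sketch ProjectFile is a hypothetical path-like type; the only thing PEP 519 asks of it is an __fspath__ method returning str or bytes.

```python
import os


class ProjectFile:
    """Hypothetical path-like type; only __fspath__ is required by PEP 519."""
    def __init__(self, project: str, name: str):
        self.project, self.name = project, name

    def __fspath__(self) -> str:
        return os.path.join(self.project, self.name)


target = ProjectFile("demo", "README.md")

# Explicit demotion to a string...
print(os.fspath(target) == os.path.join("demo", "README.md"))  # True
# ...and implicit acceptance by stdlib functions that take paths.
print(os.path.basename(target))                                # README.md
```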

7.2.1.5. Solid path API

The Pathlib API provides the basics (parent, parts, joinpath, home), and also a decent collection of utilities on top:

  • suffix
  • suffixes
  • stem
  • with_name
  • with_stem
  • with_suffix
  • with_segments
  • from_uri
  • as_uri

Currently Julia covers the basics, but could probably do with some more convenience functions.
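
For reference, this is what a handful of those pathlib conveniences do on a sample (made-up) path. with_stem needs Python 3.9 or later.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/run-01/archive.tar.gz")

print(p.suffix)                # .gz
print(p.suffixes)              # ['.tar', '.gz']
print(p.stem)                  # archive.tar
print(p.with_suffix(".zip"))   # /data/run-01/archive.tar.zip
print(p.with_name("log.txt"))  # /data/run-01/log.txt
print(p.with_stem("backup"))   # /data/run-01/backup.gz
print(p.parent.name)           # run-01
```

Each of these replaces a small piece of string slicing that is easy to get subtly wrong (multi-part extensions being the classic case).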

7.2.1.6. Tradeoffs
Convenience and separation
while having path manipulation and filesystem interaction methods all on the same class would be convenient, it is seen as more important to split the two, and provide Path/PurePath aliases to make the split more manageable.

7.2.1.7. Limitations
Single ambient filesystem context
the host OS is always implicit
No capability model
there's no concept of authority, or restricted namespaces; access control is left to each calling context

7.2.2. C++17 <filesystem> library

https://en.cppreference.com/w/cpp/header/filesystem.html
https://en.cppreference.com/w/cpp/filesystem/path.html
https://en.cppreference.com/w/cpp/filesystem/path/lexically_normal
https://en.cppreference.com/w/cpp/filesystem/canonical.html
https://learn.microsoft.com/en-us/cpp/standard-library/path-class?view=msvc-170

C++17's filesystem library, based on the Boost library of the same name, introduces a dedicated std::filesystem::path value type for representing names in a filesystem namespace, plus a family of functions for effectual filesystem operations. The design intentionally keeps path manipulation largely lexical, while making "touch the filesystem" operations explicit via separate APIs.

cpp-filesystem.svg
Figure 4: Datatype relationships and key functions in C++'s std::filesystem

Sample usage

C++
#include <filesystem>
#include <fstream>
namespace fs = std::filesystem;

int main() {
  fs::path p = fs::path{"config"} / "app.toml";   // lexical: just builds a name

  fs::path normalized = p.lexically_normal();     // lexical: no filesystem access
  fs::path resolved   = fs::weakly_canonical(p);  // effectful: resolves existing prefix + symlinks

  std::ifstream in(resolved);                     // the handle is the stream/FD, not fs::path

  // Directory iteration is effectful, and throws if the directory does not exist
  for (const fs::directory_entry& e : fs::directory_iterator(resolved.parent_path())) {
    if (e.is_regular_file() && e.path().extension() == ".toml") {
      // ...
    }
  }
}

7.2.2.1. Natively stored paths with a generic view

Paths store the "pathname" in the native format, but allow viewing the path in a generic (POSIX) format too. There are explicit functions to go between the two forms.

7.2.2.2. Separate lexical and filesystem queries

Lexical operations (e.g. normalisation, relative path construction) are supported with a separate API from filesystem interaction (canonical / weakly_canonical).

7.2.2.3. Exceptions and error-valued returns

Most effectual operations come in two forms: an exception-throwing form, and an overload taking std::error_code& that reports failures out-of-band.

7.2.2.4. Tradeoffs
Lexical purity vs. meaningful normalisation
having both lexical and filesystem operations operate on the same type is convenient, but introduces a softer separation of intent, and means that knowledge about whether a path refers to a real filesystem resource or not must be thought about and managed by the programmer.
Interoperative convenience vs. sharp edges
implicit convertibility to/from std::basic_string makes adoption easy and incremental, but also blurs intent and creates more opportunities for surprises in cross-platform code.

7.2.2.5. Limitations
No capability model
the approach is only concerned with modelling path names/transformations, and not the use of paths.
Single ambient filesystem context
there's no mechanism (or room to insert one) for different kinds of filesystems. Libraries must create parallel APIs.

7.2.3. Rust std::path (Path and PathBuf)

https://doc.rust-lang.org/std/path/index.html
https://doc.rust-lang.org/std/path/struct.Path.html
https://doc.rust-lang.org/std/path/struct.PathBuf.html
https://doc.rust-lang.org/std/fs/index.html
https://rust-lang.github.io/rfcs/0474-path-reform.html

Rust provides Path (borrowed, unsized) and PathBuf (owned) as first-class path types, analogous to str/String, with representations designed to preserve platform-native path encodings and semantics. Filesystem effects are performed through std::fs, producing resource handles like std::fs::File and metadata, rather than "rematerialising" paths as handles.

rust-paths.svg
Figure 5: Key datatypes and traits involved in Rust's path API

Sample usage

Rust
use std::ffi::OsStr;
use std::fs;
use std::path::{Path, PathBuf};

fn main() -> std::io::Result<()> {
    let base: &Path = Path::new("data");
    let mut p: PathBuf = base.join("results.csv");  // lexical name construction
    p.set_extension("tsv");                         // mutate owned path buffer

    let _meta = fs::metadata(&p)?;                  // effectful query via std::fs
    for entry in fs::read_dir(base)? {              // std::fs takes P: AsRef<Path>
        let entry = entry?;
        if entry.path().extension() == Some(OsStr::new("tsv")) {
            // ...
        }
    }
    Ok(())
}

7.2.3.1. Borrowed/owned split

Rust models paths like strings: Path is a borrowed view used pervasively in APIs, while PathBuf is an owned, mutable buffer for constructing and editing paths efficiently (push, pop, set_extension, etc.). This makes path-heavy code ergonomic without forcing allocations at every boundary.

7.2.3.2. Wrapping around the native OS representation

Path/PathBuf are thin wrappers over OsStr/OsString, so they can represent platform-native paths that are not valid Unicode text. Conversions to &str are therefore fallible/optional, which forces callers to confront encoding rather than silently corrupting or rejecting valid OS paths.

7.2.3.3. Comprehensive API for structure and transformation

The standard API covers the common "shape of a pathname" operations: iteration by components, file_name / file_stem / extension, parent, prefix/suffix tests, strip_prefix, joining, and targeted edits (with_extension, with_file_name, plus mutable PathBuf setters). It is deliberately narrower than pathlib's high-level conveniences (no standard home()/tilde expansion, globbing, recursive walking, etc.), which are left to ecosystem crates as needed.

7.2.3.4. Hard separation between paths and filesystem operations

Rust keeps path values largely inert: opening a file, reading metadata, canonicalising, iterating directories, etc. is primarily done via std::fs free functions or types. This makes "touches the filesystem" sites stand out in code, and ensures handles are explicit (std::fs::File being the canonical OS-backed handle wrapper).

7.2.3.5. Interoperability via AsRef<Path> (promotion without a dedicated protocol type)

Most filesystem APIs accept P: AsRef<Path>, allowing callers to pass &Path, PathBuf, &str, String, and other path-like inputs without a bespoke path-protocol mechanism. This makes adoption incremental while still standardising the "real" path vocabulary type at API boundaries.

7.2.3.6. Tradeoffs
Ergonomics vs purity
keeping most effects in std::fs sharpens the "names vs resources" distinction, but also means fewer "one object does everything" conveniences compared with pathlib.
Correctness vs string convenience
non-Unicode-capable paths reduce footguns on real systems, but push more code toward OsStr/OsString-aware manipulation when doing text-like operations.

7.2.3.7. Limitations
Single ambient filesystem context
std assumes the host OS filesystem; multiple filesystem instances and VFS composition are not standardised in the core API.
No capability model in std
capability-oriented directory handles and sandbox-friendly "at-style" patterns are left to crates outside std (cap-std being a notable example).

7.2.4. Racket paths

https://docs.racket-lang.org/reference/pathutils.html
https://docs.racket-lang.org/reference/Manipulating_Paths.html
https://docs.racket-lang.org/reference/file-ports.html

Racket treats paths as a distinct datatype, while allowing most filesystem APIs to accept either a path value or a string that is promoted to a path. Path values preserve an underlying byte-oriented representation (so not all paths are losslessly representable as strings) and path-manipulation utilities can operate under Unix/Windows conventions independent of the host platform.

racket-paths.svg
Figure 6: Key types and functions involved in Racket's path support

Sample usage

Scheme
;; Path construction is lexical
(define p (build-path (current-directory) "data" "results.csv"))

;; "Pure-ish" normalization is available (no filesystem access)
(define p-lex (simplify-path p #f))

;; Filesystem-aware normalization can consult the filesystem (e.g. symlinks)
(define p-fs  (simplify-path p))

;; Path -> handle (port); the port is the resource capability (backed by an OS fd)
(define in (open-input-file p))
(define txt (port->string in))

;; Interop / presentation: string conversion can be lossy
(define shown (path->string p))   ; avoid shadowing Racket's built-in `display`

7.2.4.1. Dedicated path datatype with string interoperability

Filesystem procedures generally accept either a string or a path value, promoting strings via string->path, while procedures that produce filesystem paths return path values. This keeps "path-as-name" visible in values without forcing an all-at-once ecosystem migration away from strings.

7.2.4.2. Byte-oriented representation and lossy string rendering

Paths round-trip through byte strings more faithfully than through Unicode strings: path->string decodes using the platform's conventions and is explicitly documented as unsuitable for lossless "convert to string, tweak, convert back" workflows. This is a concrete example of separating display text from filesystem name representation.

7.2.4.3. Convention-aware paths (Unix vs Windows) independent of host

Racket tracks a path's convention ("kind") and makes many non-effectual procedures sensitive to that kind. Construction from bytes supports an explicit 'unix/'windows convention, enabling manipulation of (say) Windows paths on Unix when no filesystem access is required.

7.2.4.4. Cleansing, lexical normalization, and filesystem-aware simplification

Racket distinguishes several tiers of "make this path nicer":

Cleansing
many primitives cleanse inputs (e.g. redundant separators) before use; cleanse-path is explicitly non-effectual.
Filesystem-aware adjustment
resolve-path can dereference a single soft link, and simplify-path defaults to use-filesystem? = #t, potentially consulting the filesystem and accounting for soft links when eliminating .. to preserve referential meaning.
Pure mode
simplify-path with #f performs syntactic simplification without filesystem access (and can operate on paths for any platform).

7.2.4.5. Separation of names from resource handles

Racket's I/O APIs return ports (e.g. open-input-file) which are the capability-bearing handles used for reading/writing; paths remain names that are interpreted during the open. Ports are backed by OS-level descriptors underneath.

7.2.4.6. Efficiency-oriented decomposition utilities

The API includes primitives designed to avoid allocation patterns that arise from repeated splitting. For example, explode-path is documented to run in time proportional to the path length, unlike iterative split-path usage that allocates intermediate paths.

7.2.4.7. Tradeoffs
Interoperability convenience vs. strictness
accepting both strings and path values keeps APIs ergonomic, but weakens the type-level separation between "text" and "path name" compared to designs that require explicit promotion everywhere.
Convenient defaults vs. explicit context
some "pure" utilities still consult ambient process state (e.g. current-directory as a default base), which is pragmatic but less explicit than a fully capability-oriented model.

7.2.4.8. Limitations
Ambient filesystem context
paths do not carry an explicit filesystem object/context; operations are fundamentally anchored to the host OS filesystem semantics.
No first-class capability model for namespace restriction
while ports are handles, there is no standard "directory capability" pattern (à la openat-style APIs) that makes authority over a subtree explicit in function signatures.

7.2.5. Common Lisp filepaths library

https://github.com/fosskers/filepaths

filepaths is a "modern and consistent filepath manipulation" library for Common Lisp which consolidates scattered pathname utilities, fills in commonly-missed operations, and renames them to be more predictable. It operates purely on names (CL pathnames / namestrings) and intentionally does not probe the filesystem.

cl-fosskers-paths.svg
Figure 7: Key types and methods in fosskers' filepaths library

Sample usage

Lisp
;; Construct and transform purely lexically (no I/O)
(let* ((p (p:join #p"/home/you/code" "common-lisp" "hello.lisp"))
       (q (p:with-extension p "json")))
  (values
   p
   q
   (p:parent q)
   (p:components q)
   (p:to-string q)))

7.2.5.1. Standard pathname substrate

Common Lisp already has a first-class pathname object (with components like host/device/directory/name/type/version), and namestring is an implementation-defined textual rendering of a pathname. filepaths builds on this existing substrate rather than introducing a new path type.

7.2.5.2. Lexical-only API by design

filepaths explicitly focuses on structural/lexical operations (joins, component access, extension manipulation, structure tests) and avoids filesystem queries such as existence checks. This gives a sharp "names, not resources" boundary.

7.2.5.3. Predictable "modern" operations and naming

The library centers a set of operations that look very similar to what developers expect from newer path APIs: join, parent, with-name, with-parent, extension/with-extension/add-extension/drop-extension, and component conversion (components, from-list).

7.2.5.4. Accepts either pathname or string

Nearly every function accepts either a pathname or a string, and provides explicit "ensure"/conversion helpers (ensure-path, ensure-string, to-string, from-string) to normalize inputs/outputs. This is a pragmatic interoperability story in a Lisp ecosystem where APIs frequently accept "pathname designators".

7.2.5.5. Errors via conditions

For cases where nil is ambiguous, it signals dedicated conditions (e.g. empty-path, no-filename, root-no-parent), which aligns with CL's condition system for recoverable errors.

7.2.5.6. Tradeoffs
Interoperative convenience vs. type clarity
accepting both strings and path objects makes adoption easy, but allows "stringly-path" to remain pervasive, weakening the signaling value of path-typed APIs.
Leverages CL pathnames vs. inherited complexity
reusing pathname avoids ecosystem fragmentation, but also inherits long-standing complexity/quirks of CL pathname semantics and conversions.

7.2.5.7. Limitations
Not a filesystem abstraction
there is no VFS concept, no filesystem context object, and no capability/authority model—only lexical pathname manipulation.
Portability of textual syntax is constrained by CL
because namestring syntax is implementation-dependent (outside of logical pathnames), any string-based portability story depends on the underlying implementation's parsing/rendering choices.

7.2.6. Frames' algebraic path schema

https://www.oxinabox.net/2016/09/14/an-algebraic-structure-for-path-schema-take2.html

Frames proposes a minimal algebraic specification for "path schemas" (file paths, URLs, XPath, globs, etc.) by treating paths as structured names built from roots and relative components, and then separating that lexical structure from the effectual act of evaluating a path against a backing domain.

7.2.6.1. Minimal core: roots, a free monoid, and a faithful action

The model starts with a set of absolute roots \(A\), relative components \(R\), and a pathjoin operator ⋅ such that relative paths form a free monoid \(R^*\), and \(R^*\) acts faithfully on absolute roots, generating absolute paths \(A^*\). This aims to capture "paths as names" with as few primitives as possible, while still deriving most common operations.

I think that paths actually best fit an ordered monoid, since there's a sensible partial order.

7.2.6.2. Multiple roots as a first-class concern

The framework explicitly allows multiple absolute roots (e.g. POSIX "/" vs Windows drive roots and other namespaces), and notes that "zero absolute roots" is theoretically possible but makes evaluation ill-defined for "absolute" lookup.

7.2.6.3. Derived operations: parentdir, basename, root, parts, within

Given the core structure, the post derives familiar operations: parentdir and basename (tail/head), and then root, depth, parts, and within (a restricted "relative" that only works when one path is nested within the other). The emphasis is that these are lexical and don't require touching the backing system.
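
The core structure and a few derived operations are compact enough to sketch directly. This is my own Python rendering, not code from the post: relative paths are tuples of components (so concatenation is the free monoid operation), an absolute path is a root plus a relative path, and the exact error behaviour (for example, a root having no parent) is an assumption.

```python
from dataclasses import dataclass

Rel = tuple       # a relative path: a tuple of components, i.e. an element of R*
EMPTY: Rel = ()   # the identity element of the monoid


def join(a: Rel, b: Rel) -> Rel:
    return a + b


@dataclass(frozen=True)
class Abs:
    root: str         # an element of A, such as "/" or "C:"
    rel: Rel = EMPTY

    def __truediv__(self, r: Rel) -> "Abs":
        # The action of R* on absolute paths.
        return Abs(self.root, self.rel + r)


def parentdir(p: Abs) -> Abs:
    if not p.rel:
        raise ValueError("a root has no parent")
    return Abs(p.root, p.rel[:-1])


def basename(p: Abs) -> Rel:
    return p.rel[-1:]


def within(inner: Abs, outer: Abs) -> Rel:
    """The relative path from `outer` down to `inner`; only defined when nested."""
    n = len(outer.rel)
    if inner.root != outer.root or inner.rel[:n] != outer.rel:
        raise ValueError("paths are not nested")
    return inner.rel[n:]


p = Abs("/", ("home", "tec", "notes.txt"))
print(parentdir(p) / basename(p) == p)    # True
print(within(p, Abs("/", ("home",))))     # ('tec', 'notes.txt')
```

Nothing here consults a filesystem, which is the point: every operation is defined purely by the structure of the name.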

7.2.6.4. Evaluation is intentionally effectual and many-to-one

Resolution is modeled as an evaluation function \(e: A^* \to \mathcal{P}(D)\), mapping an absolute path name to a set of domain objects (to accommodate aliases, links, globs, XPath, etc.). The post distinguishes "MonoPath" schemas (0/1 object, like typical filesystem paths) from "MultiPath" schemas (0/many, like globs/XPath).

7.2.6.5. .. as a "pseudoparent" element and why it's hard

A key design lesson is that treating .. as an ordinary relative component breaks the free-monoid structure; instead the post introduces a special element (ϕ) defined via evaluation (intuitively, "append ϕ then evaluate equals evaluating the parent"). It then argues that POSIX .. semantics are not purely lexical in the presence of symlinks, motivating designs that avoid collapsing .. without filesystem access (and noting that some systems ban it outright).
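
The symlink problem is easy to reproduce. The following Python sketch (an illustration, not from the post; the directory names are arbitrary) builds a small tree with one symlink and compares lexical collapsing of `..` with filesystem-aware resolution. It assumes a platform where unprivileged symlink creation works, such as Linux or macOS.

```python
import os
import tempfile


def dotdot_demo():
    """Return (lexical, actual) parents of x/link, relative to a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        os.makedirs(os.path.join(tmp, "a", "b"))
        os.makedirs(os.path.join(tmp, "x"))
        # x/link is a symlink pointing at a/b
        os.symlink(os.path.join(tmp, "a", "b"), os.path.join(tmp, "x", "link"))

        p = os.path.join(tmp, "x", "link", "..")
        lexical = os.path.normpath(p)   # collapses ".." textually
        actual = os.path.realpath(p)    # follows the link, then takes the parent
        return os.path.relpath(lexical, tmp), os.path.relpath(actual, tmp)
```

The lexical answer is `x`, while the filesystem's answer is `a`: the two disagree as soon as a link is involved.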

7.2.6.6. Normalization and relative_to depend on ϕ

A normalization function norm is defined to remove ϕ where possible without changing evaluation, and relative_to(x,y) is defined using a common-prefix computation plus the necessary number of ϕ "up-steps". The post explicitly notes that proofs of the normalization/equivalence properties are non-trivial and not completed there.
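
A minimal Python sketch of the two definitions follows, under the assumption that ϕ behaves as a true parent so that lexical cancellation preserves evaluation (exactly the assumption that symlinks break on POSIX). Paths are tuples of components relative to a shared root, and the argument order of `relative_to` (the path leading from `y` to `x`) is a guess at the post's convention rather than a quotation of it.

```python
PHI = "ϕ"  # stand-in for the pseudoparent element


def norm(rel):
    """Cancel each ϕ against the ordinary component before it, where one exists."""
    out = []
    for c in rel:
        if c == PHI and out and out[-1] != PHI:
            out.pop()
        else:
            out.append(c)
    return tuple(out)


def relative_to(x, y):
    """A relative path r with norm(y + r) == x, for x and y under the same root."""
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1                      # length of the common prefix
    ups = (PHI,) * (len(y) - k)     # climb from y to the common prefix
    return ups + x[k:]              # then descend to x
```

Leading ϕ elements that have nothing to cancel against are kept, which is the "where possible" in the definition of norm.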

7.2.6.7. Optional extensions: canonical names and directory-vs-file paths

Two extensions are sketched: (1) "canonical name" schemas where evaluation is injective (one object ↔ one name), and (2) splitting file paths vs directory paths to restrict which joins are permitted—at the cost of losing a simple monoid over all relative paths (while preserving a free monoid for directory-relative paths).

7.2.6.8. Tradeoffs

  • The high level of generality (covering URLs/XPath/globs/filesystems uniformly) clarifies what is fundamental versus conventional, but it can be too abstract to settle concrete API questions without additional constraints.
  • Introducing ϕ enables familiar operations like relative_to, but it forces a careful split between lexical structure and effectual semantics, highlighting exactly where "stringy" normalization becomes unsound.

7.2.6.9. Limitations

  • This is a theoretical specification rather than an implementation guide; many operational details (errors, permissions, encodings, etc.) are out of scope.
  • ϕ/.. cannot be made fully POSIX-faithful without consulting the filesystem (symlink interaction), and some schemas (notably multipaths like globs) may not admit any ϕ-like element at all.
  • The "evaluation" function models name → object(s), but does not model handles/capabilities; it stops at identifying resources rather than representing authority to operate on them.

7.2.7. Zig's path APIs

https://ziglang.org/documentation/master/std/#std.fs
https://ziglang.org/documentation/master/std/#std.fs.Dir
https://ziglang.org/documentation/master/std/#std.fs.path

Zig largely treats paths as byte slices ([]const u8) plus a set of lexical utilities (std.fs.path). Effectual filesystem operations are separated into std.fs, and are designed to be used primarily via open directory handles (std.fs.Dir) and relative paths, rather than "stringly" absolute-path APIs.

zig-paths.svg
Figure 8: Key types and methods in Zig's filesystem path API

Sample usage

zig
// Lexical composition (allocates for convenience)
const rel = try std.fs.path.join(allocator, &.{ "reports", "2026-01.csv" });
defer allocator.free(rel);

// Directory-handle-centric I/O (File/Dir are the actual resource handles)
var root = try std.fs.cwd().openDir("data", .{ .iterate = true });
defer root.close();

var f = try root.openFile(rel, .{ .mode = .read_only });
defer f.close();
7.2.7.1. Paths are functions over slices, not a first-class "Path" type

Rather than a dedicated path value type, Zig centralises path manipulation as functions that operate on []const u8 and expose platform-specific constants (e.g. directory separator and PATH-list delimiter).

7.2.7.2. Dir-centric I/O (directory handles as capability-like authority)

Filesystem operations are intended to be performed relative to a Dir handle (an OS-backed resource), which can reduce TOCTOU hazards and makes "what subtree do we have authority over?" more explicit in APIs and call graphs. The presence of *Absolute convenience functions is increasingly treated as legacy/avoidable, since many are thin wrappers over cwd().* calls.
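
For readers more familiar with Python, the same "open relative to a directory handle" pattern can be approximated with the `dir_fd` parameter of `os.open`. This is an analogy rather than a translation of Zig's API; `read_relative` and `demo` are names invented for the sketch, and `dir_fd` is only available on platforms that support it (typically Linux and macOS, not Windows).

```python
import os
import tempfile


def read_relative(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to an already-open directory descriptor (openat-style)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:   # fdopen takes ownership of fd and closes it
        return f.read()


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "2026-01.csv"), "wb") as f:
            f.write(b"a,b\n1,2\n")
        root = os.open(tmp, os.O_RDONLY)   # the directory handle
        try:
            # The lookup is anchored at `root`, not at the process CWD.
            return read_relative(root, "2026-01.csv")
        finally:
            os.close(root)
```

Note that a directory descriptor on its own does not stop a name such as `../other` from escaping the subtree; it only fixes the starting point of the lookup.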

7.2.7.3. Explicit Windows-wide variants alongside UTF-8/byte-slice APIs

Many APIs accept []const u8, but Windows-specific variants exist that take wide strings (WTF-16), reflecting the platform's native calling conventions and making encoding/interoperability concerns explicit at the API boundary.

7.2.7.4. Tradeoffs

  • Simplicity and performance: treating paths as slices avoids object overhead and keeps path manipulation lightweight, but provides less structural guidance than a dedicated Path type.
  • Handle-first ergonomics: Dir-relative operations sharpen authority and robustness, but can feel heavier for simple scripts and push more users toward "plumbing" directory handles through APIs.
  • Platform realism: exposing wide-string Windows variants improves correctness/interoperability, but increases surface area and multiplies "which encoding do I use?" decisions.

7.2.7.5. Limitations

  • Single ambient filesystem context: cwd() is the default anchor for many operations, and the model assumes the host OS filesystem semantics as the baseline context.
  • No datatype distinction between paths and strings: paths remain names (byte sequences); "handle-ness" only appears once a Dir/File is opened.

7.2.8. Node.js path and fs

https://nodejs.org/api/path.html
https://nodejs.org/api/fs.html
https://nodejs.org/api/url.html

Node splits lexical pathname manipulation (path) from effectual filesystem interaction (fs), but paths themselves are primarily represented as plain strings (with some fs APIs also accepting Buffer or file: URLs). The net effect is a clear separation of "build a name" vs "touch the filesystem", without introducing a dedicated path value type.

nodejs-paths.svg
Figure 9: Key classes and methods in Node.js' filesystem API

Sample usage

Javascript
import path from "node:path";
import fs from "node:fs";
import url from "node:url";

// Lexical path operations (no I/O)
const p = path.join("data", "results.csv");
const parts = path.parse(p);          // { dir, base, name, ext, ... }
const abs = path.resolve(p);          // still just a string

// Effectful operations (I/O) yield handles
const fh = await fs.promises.open(p, "r");
try {
    const text = await fh.readFile({ encoding: "utf8" });
} finally {
    await fh.close();
}

// URL interop when needed
const fileUrl = url.pathToFileURL(abs);
const p2 = url.fileURLToPath(fileUrl);
7.2.8.1. Stringly paths and "PathLike" inputs

path operates over strings and returns strings. In fs, string paths are interpreted as UTF-8 sequences naming absolute/relative filenames, and relative paths are resolved against process.cwd(). Many fs APIs also accept Buffer and (for file:) WHATWG URL objects, which makes interoperability convenient but keeps the "path as text" boundary relatively soft.

7.2.8.2. Platform-specific semantics with explicit posix/win32 variants

The default path behavior follows the host platform, while path.posix and path.win32 provide explicit access to the POSIX and Windows parsing/joining rules regardless of the host. This allows cross-platform manipulation without changing the underlying representation (still strings).

7.2.8.3. Handles as file descriptors and FileHandle

Resource access is mediated by OS-backed handles: the promises API exposes a FileHandle object with explicit close(), while other APIs expose numeric file descriptors. This matches the "handle is authority" story operationally, even though it is not reflected in a capability-oriented path resolution model (paths remain globally interpretable strings).

7.2.8.4. Lexical normalization vs filesystem-aware canonicalization

Node's path functions are purely lexical; canonicalization and symlink resolution live in fs (e.g. realpath). This preserves a practical "names vs effects" separation, but with no distinct type-level boundary between lexical and resolved forms.

7.2.8.5. Tradeoffs

  • Simplicity and interoperability: strings keep the surface area small and integrate naturally with the JS ecosystem, but do not prevent accidental mixing of "text" and "path".
  • Cross-platform control vs ergonomics: explicit path.posix/path.win32 enables portable tooling, but correctness remains a caller responsibility because everything is still representationally a string.
  • Flexible inputs vs clarity: accepting Buffer/URL is pragmatic, but further blurs the conceptual model (multiple "path-like" carriers with different invariants).

7.2.8.6. Limitations

  • Single ambient filesystem context: fs operations implicitly target the process-visible OS filesystem, with relative paths interpreted via process.cwd().
  • No capability model: while handles exist (FileHandle/fd), the dominant API surface remains ambient (global path resolution rather than resolution relative to an explicit directory capability).
  • No first-class path value: the model cannot leverage types to encode invariants (absolute vs relative, platform flavour, lexical vs canonical) beyond convention and runtime checks.

7.2.9. .NET's System.IO

https://learn.microsoft.com/en-us/dotnet/api/system.io.path?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.combine?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.join?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getfullpath?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.file.openhandle?view=net-10.0
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/file-providers?view=aspnetcore-10.0

.NET's core System.IO APIs treat paths primarily as string values, with lexical manipulation provided by the static Path utility class and effectual operations provided by File/Directory and stream/handle types. Stable authority over a resource is represented by streams/OS-handle wrappers created by opening a path, rather than by the path value itself.

dotnet-paths.svg
Figure 10: Key classes and methods in .NET's filesystem API

Sample usage

csharp
var p    = Path.Combine("data", "results.csv");  // lexical construction
var full = Path.GetFullPath(p);                  // resolves against current directory

var text = File.ReadAllText(full);               // effectful: opens + reads

using var h = File.OpenHandle(full);             // explicit OS-handle wrapper
using var s = new FileStream(h, FileAccess.Read);
7.2.9.1. Path as a purely-lexical utility surface

System.IO.Path is a "path algebra" of string-in/string-out helpers (join, split, query), while filesystem effects live in other types. This keeps naming operations distinct at the API level, but does not create a first-class path value type.

7.2.9.2. Joining semantics: Combine vs Join and rooted segments

Path.Combine follows OS-like semantics: if any argument after the first is rooted, earlier components are discarded. Path.Join concatenates more mechanically: it never discards earlier arguments and preserves duplicate separators, with normalization typically deferred to GetFullPath.
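
The difference is easy to see in miniature. Python's `posixpath.join` happens to share the Combine-style "a rooted argument discards what came before" rule, so it can stand in for Combine here; `mechanical_join` is a hypothetical helper written for this sketch to approximate Join-style concatenation, not a port of the .NET implementation.

```python
import posixpath


def mechanical_join(*parts: str) -> str:
    """Join-style concatenation: keep every part, add "/" only where none is present."""
    out = ""
    for part in parts:
        if not part:
            continue
        if out and not (out.endswith("/") or part.startswith("/")):
            out += "/"
        out += part
    return out


def compare(first: str, second: str):
    """Return (combine_style, join_style) results for the same two inputs."""
    return posixpath.join(first, second), mechanical_join(first, second)
```

With inputs `"data"` and `"/etc/passwd"`, the Combine-style result is `/etc/passwd` while the Join-style result stays under `data`. This is why rootedness arriving unexpectedly from user input matters.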

7.2.9.3. Ambient resolution via current directory (and drive rules on Windows)

Path.GetFullPath resolves relative inputs using ambient process state (current directory; and, on Windows, drive-relative conventions). A base-path overload exists to avoid depending on ambient state when determinism matters.

7.2.9.4. Handles are explicit (and increasingly first-class)

The canonical "resource handle" is a stream (e.g. FileStream), and modern .NET also exposes File.OpenHandle returning a SafeFileHandle directly, making the "name → handle" boundary explicit when desired.

7.2.9.5. VFS-style abstractions exist, but outside System.IO

While System.IO itself is OS-filesystem-oriented, .NET includes other, scoped abstractions for non-physical or restricted views—most notably ASP.NET Core's IFileProvider (with physical, embedded, and composite providers). This is not the general-purpose System.IO model, and is intentionally narrower in scope.

7.2.9.6. Tradeoffs

  • Stringly paths: maximizes interoperability and keeps the core surface small, but conflates text with namespace locations and limits type-driven correctness and dispatch.
  • Convenient OS-like joins: Combine's rooted-path rule is pragmatic, but can produce surprising results if rootedness appears unexpectedly in inputs.
  • Exceptions as the primary error channel: simplifies common-case use, but makes "probe" patterns more awkward than explicit error-valued returns.

7.2.9.7. Limitations

  • No first-class path value type in the BCL: path semantics remain "strings with conventions", despite a rich helper API.
  • No first-class filesystem-context object in System.IO: there is no standard FileSystem value you pass around to select a backend or restrict authority; alternate filesystem views are provided via separate subsystems (e.g. IFileProvider) rather than integrated into the core path+fs model.

7.2.10. FilePathsBase.jl

https://github.com/rofinn/FilePathsBase.jl
https://rofinn.github.io/FilePathsBase.jl/stable

FilePathsBase.jl is a Julia ecosystem attempt at a first-class path vocabulary type (AbstractPath) with platform-aware concrete path types (e.g. POSIX/Windows) and broad integration with Julia's existing filesystem functions via method extensions. It treats paths as structured values (not strings), while still ultimately operating against the ambient host filesystem for the default "system path" types. It gets a lot of things right, but suffers from a few fatal flaws (such as type instability).

filepathsbase.svg
Figure 11: The type hierarchy in FilePathsBase.jl

Sample usage

julia
using FilePathsBase
using FilePathsBase: /               # the / join operator is not exported by default

p = p"data" / "results.csv"          # path literal + /-join

write(p, "hello\n")                  # effectful filesystem use via methods/extensions
txt = read(p, String)

ext = extension(p)                   # "csv"
dir = parent(p)                      # p"data"

here = @__PATH__                     # path-valued analogue of @__FILE__
7.2.10.1. AbstractPath as a vocabulary type (not AbstractString)

A central design choice is making paths distinct from strings, rather than a string subtype. This forces explicit conversions at boundaries and reduces accidental misuse (e.g. treating text as a filesystem name or vice-versa). The package also preserves * for string concatenation and uses / for path joining.
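
Python's pathlib makes the same choice and offers a convenient point of comparison (it is shown here as an analogy, not as a description of FilePathsBase's internals). The helper `concat_error` is written for this sketch only.

```python
from pathlib import PurePosixPath

# "/" joins path components; the result is a path value, not a str.
p = PurePosixPath("data") / "results.csv"


def concat_error() -> str:
    """String concatenation with a path is rejected instead of silently accepted."""
    try:
        "prefix-" + p
    except TypeError as err:
        return type(err).__name__
    return "no error"


# Crossing the boundary back to text is an explicit conversion.
as_text = str(p)
```

Because the path is not a string subtype, accidental mixing fails loudly, and the conversion back to text is visible in the code.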

7.2.10.2. Platform-specific path types with a "system path" default

FilePathsBase provides separate path types for differing platform semantics (POSIX vs Windows) and a "system" default intended to match the running platform, analogous to other ecosystems' platform-dispatched path aliases.

7.2.10.3. Structured representation and a "path interface"

Paths are modelled as structured values (conceptually segments plus platform rules). The docs define an interface for implementing new AbstractPath subtypes, including operations to access components and to support common filesystem behaviours. This interface is intended to allow alternate path kinds beyond the local OS, even if the default types map to the ambient host filesystem.

7.2.10.4. Integration via method extension over Base.Filesystem

Rather than introducing a new filesystem context object, the package primarily integrates by extending existing Base.Filesystem-style operations to accept AbstractPath. This makes the API feel "native" when it's in scope, but relies on broad method coverage and consistent adoption.

7.2.10.5. Tradeoffs

  • Ergonomics vs. "ambient by default": the default system path types implicitly target the process-visible host filesystem, which is convenient but keeps the core model tied to ambient context.
  • Operator overloading vs. local surprises: using / for joins is ergonomic, but it must coexist with division and with user expectations about what operators mean in Julia codebases.
  • External package vs. ecosystem coherence: because this lives outside Base, interoperability depends on adoption and on the completeness of method extensions across Base/stdlibs and third-party packages.

7.2.10.6. Limitations

  • Not a Base-level abstraction: it cannot enforce a ubiquitous "paths are paths" vocabulary across the ecosystem, so friction remains at API boundaries (conversion, missing overloads, mixed conventions).
  • No explicit capability or filesystem context object: authority and context remain largely implicit (host filesystem, process CWD), rather than being represented as explicit handles/capabilities.
  • Type instability: storing segments as a Tuple{Vararg{String}} (which is not a concrete type, since the tuple length is left unspecified) makes access/modification O(1), but makes any operation that requires the full path O(depth), which isn't great.

7.2.11. FilePaths2.jl

https://gitlab.com/ExpandingMan/FilePaths2.jl

An experimental rethink of typed paths in Julia from 2023, motivated by making path objects usable for non-local backends (notably S3) without implicitly triggering expensive "update" operations. It treats every path as a tree node (via the AbstractTrees interface) and imposes strict rules on what is inferable from path strings and when remote calls are allowed.

filepaths2.svg
Figure 12: The type hierarchy and interfaces in FilePaths2.jl
7.2.11.1. All paths are tree nodes

All path types are required to implement the AbstractTrees.jl interface, so that generic traversal utilities (e.g. walkpath) can be expressed in terms of standard tree algorithms rather than re-implemented per path type. This frames "a path is a node in a namespace tree" as a semantic guarantee, not just a metaphor.

This is complicated by symlinks, which are not addressed in the current design of FilePaths2.jl.

7.2.11.2. Strict semantics about what can be inferred from strings

FilePaths2 leans on the fact that "complete" paths can be described by strings from a root, and uses a PathSpec wrapper to support purely-parsed operations like determining parents and testing ancestry/antecedence relationships. A key consequence is that path strings used to construct path objects must be absolute, or must refer to an existing resource so the absolute path can be inferred.

7.2.11.3. Disallowing relative paths as a semantic constraint

Relative paths are treated as a "pun": they depend on ambient global state (e.g. a current directory) which may be ill-defined or nonsensical for remote backends, and they weaken what can be inferred (e.g. parent relationships). Relative forms may still exist as constructors or views, but the path object model aims to be absolute/anchored.

7.2.11.4. Explicit control of remote calls

To avoid accidental network traffic and to make the "reasoning footprint" of operations clear, only a small set of functions are permitted to perform remote calls, and these must accept an update keyword (even if update=false can still error when the object lacks enough information). AbstractTrees.children, readdir, walkpath, ispath, isfile, and isdir are listed (with an explicit note the set may be incomplete).

7.2.11.5. Treating key-value stores as trees by construction

For S3-like systems (key-value stores that "masquerade" as filesystems), the design asks: "what tree can be constructed from only what the API provides?". The proof-of-concept infers tree structure solely from key strings, which yields meaningful definitions for questions like isdir (e.g. "is this a strict ancestor of some leaf?") while acknowledging constraints (e.g. no truly empty directories).
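
The inference rule can be sketched in a few lines of Python. This is an illustration of the idea only (it is not FilePaths2's code); the key list is made-up sample data and the function names simply mirror the queries mentioned above.

```python
def parts(key: str):
    """Split a key into components, ignoring empty segments."""
    return tuple(p for p in key.split("/") if p)


def isfile(keys, path: str) -> bool:
    """A "file" is a path that is itself one of the stored keys."""
    return any(parts(k) == parts(path) for k in keys)


def isdir(keys, path: str) -> bool:
    """A "directory" is any strict ancestor of at least one key."""
    target = parts(path)
    n = len(target)
    return any(len(parts(k)) > n and parts(k)[:n] == target for k in keys)


def readdir(keys, path: str):
    """Immediate children, inferred purely from key prefixes."""
    target = parts(path)
    n = len(target)
    return sorted({parts(k)[n] for k in keys
                   if len(parts(k)) > n and parts(k)[:n] == target})


# Hypothetical bucket contents.
KEYS = ["data/2024/a.csv", "data/2024/b.csv", "data/readme.md"]
```

A prefix with no keys beneath it is simply not a directory under this rule, which is the "no truly empty directories" constraint in concrete form.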

7.2.11.6. Tradeoffs

  • Stronger invariants vs. friction: requiring absolute/anchored paths and restricting inference rules improves semantic clarity (especially for remote backends), but rejects common local patterns (relative paths) and may require more explicit anchoring in user code.
  • Cost transparency vs. ergonomic opacity: concentrating remote calls in a small set of "update-permitted" functions makes costs auditable, but can surprise users when seemingly simple queries error unless update=true is provided.
  • Tree-first abstraction vs. backend mismatch: forcing a tree model onto S3 enables generic tooling, but necessarily bakes in approximations (e.g. "directories" inferred from key prefixes) that are not native to the store.

7.2.11.7. Limitations

  • Incomplete: the project is a prototype, not a complete alternative to FilePathsBase.jl.

7.3. Virtual Filesystems

vfs-legend.svg
Figure 13: Legend for node colouring used in VFS structure diagrams

7.3.1. Java's NIO.2

https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/file/Path.html
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/file/FileSystem.html
https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/nio/file/spi/FileSystemProvider.html
https://jcp.org/en/jsr/detail?id=203
https://docs.oracle.com/javase/8/docs/technotes/guides/io/fsp/zipfilesystemprovider.html

Java NIO.2 makes filesystem context explicit: a Path is a name that belongs to a particular FileSystem, and filesystem implementations are pluggable via a provider SPI. Effectual operations are largely performed through Files/channels/streams, so "resolution" typically yields a handle (channel/stream), while "canonicalisation" yields another Path value.

java-nio2.svg
Figure 14: Structure of the Java NIO.2 VFS

Sample usage

Java
import java.nio.file.*;
import java.nio.channels.SeekableByteChannel;
import java.net.URI;
import java.util.Map;

Path p = Path.of("config", "app.toml");          // name on the default filesystem

// Opening yields a handle (channel/stream), not a "resolved path handle"
try (SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ)) {
    // read from ch...
}

// Built-in virtual filesystem: treat a zip/jar as its own FileSystem
URI zipUri = URI.create("jar:file:/tmp/data.zip");
try (FileSystem zipfs = FileSystems.newFileSystem(zipUri, Map.of("create", "true"))) {
    Path inZip = zipfs.getPath("/reports/2025.csv");
    Files.copy(inZip, Path.of("/tmp/out.csv"), StandardCopyOption.REPLACE_EXISTING);
}
7.3.1.1. Path is bound to a filesystem instance

A Path is not just syntax; it is associated with a specific FileSystem (via Path.getFileSystem()), and path operations are defined relative to that filesystem's rules. This enables multiple coexisting filesystem namespaces in one process without needing a "scheme-in-string" convention.

7.3.1.2. Filesystem as a pluggable service-provider interface

NIO.2 standardizes a provider model (FileSystemProvider) so new filesystems can be registered and constructed (commonly via URIs and environment maps). This is the key VFS "insertion point": alternate backends can provide their own Path and FileSystem implementations while reusing the same surface API.

7.3.1.3. Separation of names from effectual operations (mostly)

The dominant pattern is: Path represents names; effectual operations live in Files and in opened handles (streams/channels). Even "real path" operations like toRealPath are effectual but still return a Path (a canonical name), not a resource handle.

7.3.1.4. Built-in "zipfs" as concrete VFS prior art

The ZIP filesystem provider demonstrates the model end-to-end: each zip/JAR is a distinct FileSystem, and you operate inside it with ordinary Path and Files calls (copy, move, attributes), with provider-specific configuration carried via the environment map.

7.3.1.5. Capability-flavoured hooks exist, but are not the default

Java includes SecureDirectoryStream, which supports directory-relative operations designed to reduce TOCTOU races (an "openat-like" capability pattern), but most everyday APIs still center the ambient default filesystem and directory.

7.3.1.6. Limitations

  • Ambient defaults remain central: Path.of(...) / Paths.get(...) and many idioms assume the default filesystem and process-wide working directory, so the "explicit FS" model is often bypassed in typical code.
  • Semantics leak across providers: cross-filesystem operations exist, but metadata, atomicity, link handling, and permission models can vary by provider, so portability can require extra care beyond just using Path.

7.3.1.7. Tradeoffs

  • Power vs. complexity: the SPI enables serious extensibility (archives, custom backends), but introduces lifecycle concerns (FileSystem can be Closeable) and a larger conceptual surface area than "OS filesystem only".
  • "Path carries context" vs. ergonomic ambient usage: binding Path to FileSystem is a clean model for multiple backends, but the ecosystem still needs strong conventions to avoid silently falling back to the default filesystem when a capability/context-driven style is desired.

7.3.2. Python's fsspec

https://filesystem-spec.readthedocs.io/en/latest

fsspec is a de-facto standard interface for "filesystem-like" backends in the Python data ecosystem (local FS, S3/GCS/Azure, HTTP, archives, in-memory). It is intentionally filesystem-centric: paths are usually plain strings (often URI-like), and filesystem instances provide a uniform API (open, ls, rm, cp, …) which returns Python file-like objects for actual I/O.

fsspec.svg
Figure 15: Structure of the fsspec VFS

Sample usage

Python
import fsspec

# Scheme-in-string dispatch: build a filesystem instance from a protocol
fs = fsspec.filesystem("s3", anon=False)     # requires s3fs installed

# Uniform filesystem operations
paths = fs.glob("s3://my-bucket/data/*.parquet")

# Open returns a file-like object (handle-ish in Python terms)
with fs.open(paths[0], "rb") as f:
    chunk = f.read(1024)

# Convenience entrypoint that parses the URL and creates the FS implicitly
with fsspec.open("s3://my-bucket/data/file.csv", mode="rt", encoding="utf-8") as f:
    text = f.readline()
7.3.2.1. Protocol-in-string dispatch and a central registry

The main ergonomic trick is that filesystem choice is often encoded in the string via a protocol (e.g. s3://…, gcs://…), and fsspec uses a registry to map protocol → filesystem implementation, so callers can stay backend-agnostic. This makes it easy for libraries to accept "a path" and let fsspec pick the right backend.
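
The dispatch mechanism itself is small. The sketch below uses only the standard library and is not fsspec's actual code: `REGISTRY`, `split_protocol`, `filesystem_for`, and the two placeholder classes are all names invented for the illustration.

```python
class LocalFS:
    """Placeholder backend for plain local paths."""


class MemoryFS:
    """Placeholder backend for an in-memory store."""


# protocol -> filesystem implementation
REGISTRY = {"file": LocalFS, "memory": MemoryFS}


def split_protocol(url: str, default: str = "file"):
    """Split "proto://rest" into (proto, rest); bare paths get the default protocol."""
    if "://" in url:
        protocol, rest = url.split("://", 1)
        return protocol, rest
    return default, url


def filesystem_for(url: str):
    """Pick a backend from the protocol prefix and return (instance, path)."""
    protocol, path = split_protocol(url)
    try:
        cls = REGISTRY[protocol]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
    return cls(), path
```

A library that accepts "a path" only needs to call something like `filesystem_for`; new backends become available by adding registry entries, without the library changing.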

7.3.2.2. Filesystem instances are the context boundary

Rather than having a structured Path object carry context, fsspec typically places context on the filesystem instance: credentials, configuration, caching policy, async settings, etc. This makes the FS object the natural injection point for testability and "restricted views" (by handing out a preconfigured FS instance).

7.3.2.3. OpenFile delays engaging resources

High-level entrypoints like fsspec.open / open_files produce an OpenFile wrapper where the actual low-level file object is created only when entering a context manager. This supports serialization/distribution patterns (common in distributed compute) without holding live connections in the object being passed around.
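
The deferred-open idea can be shown without fsspec. `LazyOpen` below is a hypothetical stand-in written for this sketch (it wraps the built-in `open` on local files only): it records what to open, and only acquires the resource inside a `with` block.

```python
class LazyOpen:
    """Describe a file to open later; no OS resource is held until __enter__."""

    def __init__(self, path: str, mode: str = "rb"):
        self.path = path
        self.mode = mode
        self._file = None

    def __enter__(self):
        self._file = open(self.path, self.mode)   # the resource is acquired here
        return self._file

    def __exit__(self, *exc):
        self._file.close()
        self._file = None
        return False

    # Only the description is serialised, never a live handle.
    def __getstate__(self):
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        self.path = state["path"]
        self.mode = state["mode"]
        self._file = None
```

Since the state consists of just a path and a mode, such an object can be pickled and shipped to another process, which then opens the file locally when it enters the context manager.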

7.3.2.4. A minimal common denominator with extensible "optional" features

At its core, fsspec defines an abstract filesystem base (AbstractFileSystem) and expects implementations to provide a coherent set of file and directory operations. Beyond the core, it adds higher-level conveniences (globbing, mapping interfaces, caching layers, wrappers such as compression/text-mode handling) that can be applied across many backends.

7.3.2.5. Limitations
Stringly-typed names
paths are usually plain strings/URIs rather than structured path values, so lexical path manipulation and cross-platform semantics largely remain conventions or backend-specific behaviour.
Semantic variance across backends
a uniform API can mask important differences (e.g. "directories" on object stores, eventual consistency, atomicity guarantees), so correctness sometimes requires backend knowledge.
No built-in capability model
while passing filesystem objects can approximate contextual restriction, the design does not make "authority" an explicit, typed part of the interface in the way WASI/cap-std style APIs do.
7.3.2.6. Tradeoffs
Very low adoption barrier vs. weaker invariants
scheme-based dispatch + duck-typed file-like objects makes integration easy for the ecosystem, but gives up the stronger guarantees a first-class Path value type and explicit resolution model can provide.
One surface API vs. leaky abstraction
fsspec succeeds by making diverse storage look "filesystem-ish", but the more "non-filesystem" a backend is, the more likely users encounter surprising edge cases that the API cannot fully abstract away.

7.3.3. Go's io/fs

https://pkg.go.dev/io/fs
https://pkg.go.dev/os#DirFS
https://pkg.go.dev/embed

Go's io/fs defines a minimal, read-only filesystem abstraction (fs.FS) plus a small suite of generic consumers (fs.ReadFile, fs.WalkDir, templating ParseFS, http.FS, etc.). It's designed for dependency injection and "VFS by interface", while keeping effects explicit: Open(name) returns an fs.File (a closeable resource/handle).

go-io-fs.svg
Figure 16: Structure of the io/fs VFS

Sample usage

go
package main

import (
        "fmt"
        "io/fs"
        "log"
        "os"
        "path/filepath"
)

func main() {
        // Root a view of the OS filesystem at a directory.
        fsys := os.DirFS(filepath.Join(".", "assets"))

        // Generic, FS-agnostic consumption.
        b, err := fs.ReadFile(fsys, "config/app.toml")
        if err != nil {
                log.Fatal(err)
        }
        fmt.Println(string(b))

        // Walk the tree (portable slash-separated names).
        err = fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
                if err != nil {
                        return err
                }
                if !d.IsDir() {
                        fmt.Println(p)
                }
                return nil
        })
        if err != nil {
                log.Fatal(err)
        }
}
7.3.3.1. Minimal core interface, optional "capability" interfaces

fs.FS is intentionally just Open(name) (fs.File, error), with optional extension interfaces (ReadFileFS, ReadDirFS, StatFS, etc.) that consumers can detect for more efficient implementations. The result is a low barrier to implementing a filesystem backend, while still allowing richer behaviour when available.

7.3.3.2. A portable path-name contract

Unlike OS-native paths, io/fs path names are defined to be unrooted, UTF-8, and slash-separated, and to forbid ".." and "." elements, empty elements, and leading/trailing slashes (the one exception is the name "." on its own, which denotes the root). This standardises traversal semantics across platforms and backends, and makes consumers simpler and more predictable.
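
The contract is compact enough to check in a few lines. The function below is a Python sketch of the rules as described above, modelled on what fs.ValidPath checks; `valid_path` is a name chosen for this example.

```python
def valid_path(name: str) -> bool:
    """Check a name against the io/fs-style contract described above."""
    try:
        name.encode("utf-8")            # must be representable as valid UTF-8
    except UnicodeEncodeError:
        return False
    if name == ".":                     # the root is spelled "."
        return True
    # Every slash-separated element must be non-empty and not "." or "..".
    # Empty elements also cover leading slashes, trailing slashes, and "//".
    return all(elem not in ("", ".", "..") for elem in name.split("/"))
```

Rejecting these forms up front means a backend never has to decide what `a/../b` or `/etc` should mean inside its own namespace.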

7.3.3.3. Context objects as an injection point

os.DirFS(dir) produces an fs.FS rooted at a directory, letting libraries accept "a filesystem" rather than reaching for ambient OS state. This is a practical, lightweight step toward capability-style design (authority comes from which FS value you were given), even though it's not a full security boundary by default.

7.3.3.4. Sub-filesystems and compositional wrappers

fs.Sub(fsys, dir) derives a subtree view. If the backend supports SubFS, it can implement this efficiently; otherwise fs.Sub provides a wrapper that rewrites names by prefixing dir. This encourages layering (e.g., "rooted view", "overlay view") without requiring a large inheritance hierarchy.
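
The wrapper approach translates directly to other languages. Here is a Python sketch of the same shape, with a dictionary-backed filesystem as the backend; `DictFS`, `SubFS`, and `sub` are illustrative names, and detecting a native `sub` method with `getattr` is this sketch's stand-in for Go's optional-interface check.

```python
import io


class DictFS:
    """Minimal backend: a mapping from slash-separated names to bytes."""

    def __init__(self, files):
        self.files = dict(files)

    def open(self, name: str):
        try:
            return io.BytesIO(self.files[name])
        except KeyError:
            raise FileNotFoundError(name) from None


class SubFS:
    """Generic fallback: a subtree view that rewrites names by prefixing."""

    def __init__(self, fsys, prefix: str):
        self.fsys = fsys
        self.prefix = prefix

    def open(self, name: str):
        full = self.prefix if name == "." else f"{self.prefix}/{name}"
        return self.fsys.open(full)


def sub(fsys, prefix: str):
    """Prefer the backend's own sub() when it has one, else wrap it."""
    if prefix == ".":
        return fsys
    native = getattr(fsys, "sub", None)
    if callable(native):
        return native(prefix)
    return SubFS(fsys, prefix)
```

Consumers of the view only ever see names relative to the subtree, so the same code works whether it was handed the whole filesystem or a slice of it.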

7.3.3.5. "Producers" and "consumers" across the standard library

Go 1.16 explicitly positioned io/fs as a shared boundary: producers include embed.FS, zip.Reader, and os.DirFS; consumers include http.FS and ParseFS for templates. That standard-library buy-in is a large part of why the abstraction is widely usable.

7.3.3.6. Limitations

  • Read-only core: the central abstraction does not model mutation (create/write/rename), so write-capable VFS designs need additional interfaces or a parallel API surface.
  • Stringly path names: the interface standardises name syntax, but doesn't introduce a first-class path value type; correctness relies on conventions plus fs.ValidPath.
  • Not automatically a sandbox: os.DirFS explicitly warns it is not a chroot-style security mechanism (symlinks can escape), and even the meaning of a relative DirFS("prefix") can be affected by later Chdir.

7.3.3.7. Tradeoffs

  • Portability vs expressiveness: forbidding rooted paths and .. makes traversal safer and backend-neutral, but diverges from "native path" expectations and pushes some concerns to adapters at the boundary.
  • Minimalism vs completeness: the tiny Open-only core made adoption easy (lots of implementers), but leaves substantial surface area (writing, atomic renames, permissions, links) outside the unifying abstraction.

7.3.4. Go's afero

https://github.com/spf13/afero
https://pkg.go.dev/github.com/spf13/afero

Afero is a filesystem abstraction library for Go that provides an afero.Fs interface as a drop-in, os-shaped API, plus a collection of concrete backends (OS, in-memory, archives, remote) and composable wrappers (copy-on-write overlays, caching layers, base-path "jails", etc.).

go-afero.svg
Figure 17: Structure of the afero VFS

Sample usage

```go
package main

import (
	"log"

	"github.com/spf13/afero"
)

// LoadConfig reads path from whichever filesystem the caller injects.
func LoadConfig(fs afero.Fs, path string) ([]byte, error) {
	return afero.ReadFile(fs, path)
}

func main() {
	// Production: OS-backed FS.
	osfs := afero.NewOsFs()

	// "Jail" the app to a subtree (library-level chroot).
	appfs := afero.NewBasePathFs(osfs, "/var/myapp")

	// Overlay writes into memory (reads fall through to base).
	sandbox := afero.NewCopyOnWriteFs(afero.NewReadOnlyFs(appfs), afero.NewMemMapFs())

	if err := afero.WriteFile(sandbox, "config.json", []byte(`{"feature":true}`), 0o644); err != nil {
		log.Fatal(err)
	}
	// Check the read error instead of discarding it.
	b, err := LoadConfig(sandbox, "config.json")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("loaded %d bytes", len(b))
}
```

7.3.4.1. Filesystem-as-parameter, not global

The central move is to pass an afero.Fs into application/library code instead of calling os.* directly, so the caller controls the backing store (OS in production; MemMapFs in tests; wrappers in sandboxes).

7.3.4.2. OS-shaped, read/write interface

Afero intentionally mirrors the standard os surface, aiming for incremental adoption and broad coverage of common filesystem mutations (create/write/rename/remove), not just read-only access.

7.3.4.3. Composition as a first-class design tool

Afero leans hard into wrapping and layering filesystems to construct behaviour: CopyOnWriteFs for overlays, CacheOnReadFs for read caching, BasePathFs for subtree restriction, plus ReadOnlyFs and other helpers.

7.3.4.4. Interoperability with io/fs

Afero positions itself as complementary to Go's io/fs: you can wrap Afero backends to satisfy fs.FS (e.g. via afero.NewIOFS), and generally use io/fs where you only need read-only standard-library semantics.

7.3.4.5. Limitations
Stringly "paths"
call sites still pass string names; Afero doesn't introduce a structured path value type, so lexical/path semantics remain the caller's responsibility (and may differ across backends).
Library-level confinement
wrappers like BasePathFs provide a "jail" within the abstraction, but don't prevent code from bypassing the model by calling os.* directly (so this is not an enforcement mechanism by itself).
Backend fidelity varies
many "non-POSIX" backends exist (cloud/streaming/etc.) and may not support the full set of POSIX-ish expectations (permissions, seeking, listings, etc.) uniformly.
7.3.4.6. Tradeoffs
Broad interface vs implementability
compared to io/fs's deliberately minimal surface, Afero's richer, mutable API is more demanding for backend implementers, but it is the reason Afero supports write-heavy apps, richer tests, and layered behaviours.
Powerful composition vs semantic sharp edges
overlay/caching layers are extremely useful, but can blur or distort semantics (e.g., error modes, metadata, atomicity) relative to a single "real" filesystem, especially across heterogeneous backends.

7.3.5. LLVM VFS

https://llvm.org/doxygen/classllvm_1_1vfs_1_1FileSystem.html
https://llvm.org/doxygen/VirtualFileSystem_8h.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1OverlayFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1RedirectingFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1InMemoryFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1File.html

LLVM's VFS is a filesystem-context abstraction used throughout the LLVM/Clang tooling stack to decouple clients from the host OS filesystem. Instead of introducing a dedicated path type, it keeps "paths" as strings and makes the filesystem itself an injectable object whose operations resolve names into metadata and file handles—enabling in-memory files, overlays, and redirecting "views" of a tree for toolchain use-cases.

llvm-vfs.svg
Figure 18: Structure of the LLVM VFS
7.3.5.1. Filesystem object as the primary abstraction

llvm::vfs::FileSystem is an abstract interface representing a namespace plus resolution rules (status, directory iteration, opening files). This makes the filesystem the unit of substitution (real FS vs overlays vs in-memory), rather than trying to encode "which filesystem?" into the path value itself.

7.3.5.2. Clear separation of names, metadata, and handles

The VFS API treats "paths" as names (string parameters), with explicit, effectful operations producing either metadata (Status) or resource handles (vfs::File). The vfs::File object is the authoritative handle for subsequent reads/buffering/close, not the input path.

7.3.5.3. Composable layering via OverlayFileSystem

OverlayFileSystem composes multiple FileSystem instances into a stacked view, allowing higher layers to shadow files/directories from lower layers (a common compiler/tooling need for "overlaying" headers, SDKs, generated files, etc.).

7.3.5.4. Declarative remapping via RedirectingFileSystem

RedirectingFileSystem supports building a VFS from a declarative description (LLVM's VFS overlay YAML format), enabling external tools to describe a logical tree that maps onto one or more physical locations—useful for build systems and reproducible compilation environments.

7.3.5.5. Testability and determinism via InMemoryFileSystem

InMemoryFileSystem offers a fully synthetic implementation for unit tests and tooling pipelines; it can host files from buffers and participate in overlays, letting clients run "as if" a filesystem existed without touching disk.

7.3.5.6. Error-first, non-exception API surface

Operations typically return ErrorOr<T> / std::error_code-style results rather than exceptions, matching LLVM's broader style and making failure paths explicit at call sites. This is useful for tooling where "filesystem may be virtual / partial / inconsistent" is expected.

7.3.5.7. Tradeoffs
Excellent for toolchains, less "language-level"
treating the filesystem as the substitution point (rather than paths) is extremely effective for compilers and build tooling, but it leaves path correctness/safety largely as a convention rather than something enforced by a first-class Path type.
7.3.5.8. Limitations
Stringly-typed paths
there is no dedicated path value with enforced invariants; callers can still accidentally mix "text" and "path name", and path parsing/normalisation lives outside the VFS abstraction.
Not a capability model
while passing a FileSystem object is context injection, it does not (by itself) guarantee authority restriction the way directory-handle capability systems do; whether an FS is "sandboxed" depends on the specific implementation provided.

7.3.6. Linux VFS

https://docs.kernel.org/filesystems/vfs.html
https://www.kernel.org/doc/html/latest/filesystems/path-lookup.html
https://www.infradead.org/~mchehab/kernel_docs/filesystems/path-walking.html

Linux's Virtual Filesystem (VFS) is the kernel's common interface layer that lets many different filesystem implementations share one API and one global namespace. Conceptually, it cleanly separates pathname lookup (turning a name into a located object in a mounted namespace) from operations on an opened object (reading/writing via a handle).

linux-vfs.svg
Figure 19: Structure of (part of) the Linux VFS
7.3.6.1. Pathname lookup yields kernel objects, not "a better path string"

Inside the kernel, walking a pathname produces a resolved location in the mounted namespace, represented as a (dentry, vfsmount) pair ("a path"). The lookup process may need to instantiate dentries along the walk and load the corresponding inodes as needed.

7.3.6.2. Dentry/inode caches make "resolution" explicitly stateful

Dentries are RAM-only objects used to cache namespace structure; the dentry cache is intentionally an incomplete, evictable view of the full namespace. Consequently, name resolution is inherently effectful and state-dependent even before you "open" anything.

7.3.6.3. Handle-centric operations after open

Once a file is opened, subsequent operations are performed through the handle (file descriptor → kernel struct file) and dispatch through per-object operation tables. This is the VFS's core split: name lookup is one phase; I/O and metadata operations are another.

7.3.6.4. Mounting defines the namespace context

The meaning of a pathname is defined relative to the process's view of the mount tree: lookup proceeds through mountpoints as it walks components, so "the same string" can denote different resources in different mount namespaces.

7.3.6.5. Tradeoffs
Performance vs complexity
aggressive caching (dcache/icache) makes lookups fast, but increases conceptual complexity and makes "resolution" visibly dependent on kernel state.
Generic interface vs uniform semantics
VFS unifies how operations are invoked, but filesystems can still differ in edge semantics (atomicity, metadata fidelity, special files), so higher layers must be cautious about assuming POSIX-uniform behaviour.
7.3.6.6. Limitations
Not a user-facing VFS API
this is primarily an internal kernel architecture; user code sees syscalls, not the VFS object model directly.
Stringly pathnames at the boundary
pathnames are still passed as strings to syscalls; the kernel's strong separation doesn't automatically give the language a first-class path value type.

7.3.7. Plan 9 and the 9P protocol

https://9p.io/sys/doc/9.html
https://9fans.github.io/plan9port/man/man9/intro.html
https://9p.io/plan9/about.html
https://9p.io/magic/man2html/1/bind
https://css.csail.mit.edu/6.824/2014/papers/plan9.pdf

Plan 9 pushes "filesystem as universal interface" to the extreme: most resources are presented as files, and each process has its own mutable namespace assembled by mounting and binding servers. 9P is the uniform protocol used to traverse names (paths) and obtain authoritative references (fids) to resources hosted by local or remote filesystem servers.

plan9p.svg
Figure 20: Structure of the 9P protocol
7.3.7.1. Per-process namespaces as an explicit "filesystem context"

A Plan 9 namespace is not a global OS singleton: processes inherit a namespace, can modify it (mount/bind), and those changes affect only that process (and its children, depending on how they're spawned). This makes the "context in which paths are interpreted" concrete and composable.

7.3.7.2. Resolution is a protocol-level operation producing a handle

In 9P, you don't "resolve a path into a better path string"; you walk from an existing handle (fid) through a list of path segments, producing a new fid bound to the target object (subject to permissions). That fid is the authoritative reference you later open, read, write, and clunk.

7.3.7.3. "Everything is a file" becomes "many services are fileservers"

Because access is standardised through 9P operations, diverse resources can be surfaced as file trees served by user-space components (editors, networking stacks, IPC services, synthetic /proc-like views, etc.), and then mounted into a process's namespace as needed.

7.3.7.4. Union directories and bind semantics make overlaying a first-class primitive

bind can replace, prepend, or append trees, creating union directories where lookups search a list of members and creation semantics are controlled by flags (notably -c). This provides a principled "overlay FS" story at the namespace layer, rather than as a special filesystem feature.

7.3.7.5. Capability flavour via "start handle + relative walk"

A fid acts like a capability: once you have it, operations are scoped to that object; and crucially, walking is relative to a fid you already possess (e.g., the fid returned by attach). This lines up closely with modern handle-relative APIs (openat, WASI) where authority is mediated by directory handles rather than ambient absolute paths.

7.3.7.6. Tradeoffs
Powerful composition vs mental overhead
per-process, user-assembled namespaces enable elegant redirection/overlay/sandbox patterns, but they shift complexity from "global system configuration" into application/runtime composition.
Uniform protocol vs performance/semantics variance
a single file protocol makes extension easy, but remote/fileserver-backed resources can have very different latency and behaviour from local disk, and clients must expect resolution and metadata to be more "distributed-system-like."
7.3.7.7. Limitations
Not a language-level path model
Plan 9's core innovation is at the OS/protocol layer; it doesn't, by itself, provide a first-class path value type with lexical operations the way modern language stdlibs do.
Portability/interoperability friction
outside Plan 9 (or plan9port/9P clients), the model doesn't map 1:1 onto mainstream OS APIs without adapters, and semantics can surprise POSIX-native tooling.

7.3.8. WASI

https://wa.dev/wasi:filesystem
https://github.com/WebAssembly/wasi-filesystem
https://blog.sunfishcode.online/capabilities-and-filesystems

WASI's filesystem design is explicitly capability-oriented: a module is given pre-opened directory descriptors (capabilities), and all path-based operations are performed relative to one of those descriptors, rather than via an ambient global filesystem namespace. This makes sandboxing the default shape of the API, and pushes "what can this code access?" into explicit values rather than implicit process state.

wasi.svg
Figure 21: Structure of the WASI VFS
7.3.8.1. Pre-opened directories as the root of authority

Rather than allowing a module to name arbitrary absolute paths, WASI starts execution with a set of directory handles supplied by the host/runtime ("pre-opens"). These handles define the only namespaces the module can traverse via path lookup, and they serve as the natural unit for granting or withholding filesystem authority.

7.3.8.2. Descriptor-oriented *at API surface

Filesystem operations are largely phrased as methods on a descriptor (file or directory), with "-at" operations accepting a relative path, mirroring POSIX openat / renameat / unlinkat patterns. This keeps "resolution" (name lookup relative to a base) explicit in the call shape and makes it composable for sandboxing.

7.3.8.3. Path resolution rules enforce sandbox boundaries

Path-taking operations are designed to prevent escaping the granted namespace: absolute/rooted paths are not permitted in key operations (and some operations explicitly reject symlink contents that are absolute/rooted). This shifts a large class of "path traversal" risks from application-level string handling into the runtime's enforced resolution logic.
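
The lexical part of such a rule can be sketched as follows. This is an illustration only: resolveRelative is a hypothetical helper, not part of WASI or any runtime, and it deliberately ignores symlinks, which a real runtime has to handle while walking components against the base descriptor.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveRelative normalises rel lexically and rejects anything that is
// absolute or that would climb above the base directory.
func resolveRelative(rel string) (string, error) {
	if rel == "" {
		return "", errors.New("empty path")
	}
	if strings.HasPrefix(rel, "/") {
		return "", errors.New("absolute paths are not permitted")
	}
	var stack []string
	for _, part := range strings.Split(rel, "/") {
		switch part {
		case "", ".":
			continue // no-op components
		case "..":
			if len(stack) == 0 {
				return "", errors.New("path escapes the base directory")
			}
			stack = stack[:len(stack)-1]
		default:
			stack = append(stack, part)
		}
	}
	if len(stack) == 0 {
		return ".", nil
	}
	return strings.Join(stack, "/"), nil
}

func main() {
	for _, p := range []string{"data/../config.toml", "../etc/passwd", "/etc/passwd"} {
		resolved, err := resolveRelative(p)
		fmt.Printf("%q -> %q, err=%v\n", p, resolved, err)
	}
}
```

Only the first path resolves (to "config.toml"); the other two are rejected before any filesystem access would take place.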

7.3.8.4. Rights/permissions model evolution

Earlier WASI iterations used a fine-grained "rights" bitmask model; the (archived) wasi-filesystem interface notes moving away from explicit rights and toward a simpler "mode/flags" approach, with access control largely enforced by the host and by the descriptor's mutability constraints. This is a notable trade: simpler, more portable APIs, but less standardised, inspectable authority at the interface level.

7.3.8.5. Tradeoffs
Security-first ergonomics
requiring a base directory descriptor (and rejecting absolute paths) improves auditability and sandboxing, but increases friction when porting "ambient POSIX" code that assumes a process-wide current directory and global namespace.
Authority is clear, but coarse at the interface boundary
shifting detailed permissioning into the host/runtime simplifies the spec and implementations, but reduces the degree to which a module can introspect or statically reason about precise granted capabilities beyond "which pre-opens do I have, and are they mutable?".
7.3.8.6. Limitations
Host-dependent semantics
the API is primarily intended for "host filesystem" access and does not try to fully normalise away OS differences; some behaviours are explicitly host/filesystem dependent.
String paths rather than structured path values
path parameters are WIT string values (not a first-class structured path type), so the interface doesn't itself provide a path-manipulation model, only resolution and operations.

7.3.9. Capsicum and CloudABI

capsicum-vfs.svg
Figure 22: Structure of the Capsicum VFS
cloud-abi.svg
Figure 23: Structure of the CloudABI VFS

7.3.10. Rust's cap-std

rust-cap-std.svg
Figure 24: Structure of Rust's cap-std VFS

7.3.11. POSIX *at functions

posix-at.svg
Figure 25: Interaction of key POSIX *at methods with the filesystem

Date: 2026-01-28 (Rev 3.1)

Author: TEC

Created: 2026-01-28 Wed 01:41
