Doc. no.: P0317R1
Date: 2016-10-15
Reply to: Beman Dawes <bdawes at acm dot org>
Audience: Library

Directory Entry Caching for Filesystem (R1)

Fixing issue 2663, Enable efficient retrieval of file size from directory_entry
and issue 2677, directory_entry::status is not allowed to be cached ...

This paper provides a solution to the problem of efficiently caching state information obtained during directory iteration.

The proposal has been implemented. Guidance is provided for users on how to use or not use cached information as desired, without exposing to the user whether information is actually cached. The proposal allows the user to write code that is fully portable between implementations that cache or do not cache.

Background

Directory iteration in real-world operating systems always returns directory state information containing at least the file name. POSIX has an option to also return file status, but not all popular distributions implement this. Windows always returns file status, file size, and last modification date. Accessing this additional state from the directory entry is much more efficient than re-accessing the file system to obtain it. Users know this and expect the standard library filesystem to deliver the same efficiency.
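To illustrate the kind of efficiency users expect, the following sketch sums the sizes of the regular files in a directory using directory_entry observers. It assumes a C++17 standard library, whose std::filesystem::directory_entry provides is_regular_file() and file_size() members. The function names and the scratch directory name are hypothetical. Whether any attribute is actually served from state obtained during iteration is implementation-dependent, so the code is written to be correct either way.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Sums the sizes of the entries directly inside dir that resolve to regular
// files. directory_entry::is_regular_file() follows symbolic links, so a link
// to a regular file is counted too. Whether these observers are answered from
// state cached during iteration or by querying the file system again is up to
// the implementation; the code is correct either way.
std::uintmax_t total_regular_file_size(const fs::path& dir)
{
    std::uintmax_t total = 0;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        std::error_code ec;
        if (entry.is_regular_file(ec)) {
            const std::uintmax_t size = entry.file_size(ec);
            if (!ec)
                total += size;  // skip files that vanished or could not be read
        }
    }
    return total;
}

// Usage check: a scratch directory with two small files and one subdirectory.
void check_total_regular_file_size()
{
    const fs::path dir = fs::temp_directory_path() / "p0317_size_example";
    fs::remove_all(dir);
    fs::create_directory(dir);
    { std::ofstream out(dir / "a.txt"); out << "hello"; }  // 5 bytes
    { std::ofstream out(dir / "b.txt"); out << "abc"; }    // 3 bytes
    fs::create_directory(dir / "sub");                     // not counted
    assert(total_regular_file_size(dir) == 8);
    fs::remove_all(dir);
}
```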

The initial filesystem TS proposal and the Boost Filesystem implementation limited the additional state stored by class directory_entry to the regular status and symlink status, since these were the only additional elements common to several operating systems. The caching of this additional information was described using mutable exposition-only data members. The LWG removed the mutable members and associated caching wording because mutable members are problematic and race-prone in multi-threaded environments. The original design also exposed too many implementation details and was not easily extensible to additional cached information such as file size and last write timestamp.

When directory_entry caching was removed from the TS, the LWG promised to revisit the issue when Filesystem was added to the standard. Two issues were subsequently filed that remind us of that promise.

LWG issue 2663, Enable efficient retrieval of file size from directory_entry, requests that caching be extended to file size on Windows.

LWG issue 2677, directory_entry::status is not allowed to be cached as a quality-of-implementation issue, requests the reinstatement of permission for implementations to cache directory entry information.

The Boost Filesystem library has implemented directory entry caching for many years. That implementation has been updated to conform to this proposal, and is passing its test suite. It will ship to users later this year.

Design decisions

Add cache refresh() functions

The original TS design conflated the observer functions that access cached state information with refreshing the cached state. Since refreshing the cached state is non-const, the cached member data had to be mutable and that was unacceptable. This proposal provides separate non-const refresh functions that are called by all other non-const functions that modify the stored path, ensuring cache integrity. The refresh functions can be called by users if desired to refresh stale cached data. With the separation of refresh and observer functionality, the observer functions become truly const.
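The separation of refresh and observer functionality can be sketched with a small hypothetical class, cached_entry, which is not the proposed directory_entry wording. Cached values are ordinary data members written only by the non-const refresh() and by the path-changing modifier that calls it, while the const observers merely read them. Error handling is deliberately simplified, and the sketch assumes a C++17 <filesystem>.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical class illustrating the design choice only. Cached state is
// ordinary (non-mutable) member data, written only by non-const functions.
class cached_entry {
public:
    explicit cached_entry(const fs::path& p) : path_(p) { refresh(); }

    // A modifier that changes the stored path refreshes the cache, so the
    // cached values always belong to path_.
    void assign(const fs::path& p)
    {
        path_ = p;
        refresh();
    }

    // Non-const: re-reads the attributes from the file system. For brevity,
    // failures are recorded as a not_found/none status and size 0 rather
    // than reported.
    void refresh()
    {
        std::error_code ec;
        status_ = fs::status(path_, ec);
        size_ = 0;
        if (fs::is_regular_file(status_)) {
            const std::uintmax_t n = fs::file_size(path_, ec);
            if (!ec)
                size_ = n;
        }
    }

    // Const observers: they only read the cache and never touch the file system.
    const fs::path& path() const noexcept { return path_; }
    fs::file_status status() const noexcept { return status_; }
    std::uintmax_t file_size() const noexcept { return size_; }

private:
    fs::path path_;
    fs::file_status status_;
    std::uintmax_t size_ = 0;
};

// Usage check: observers work through a const reference, and a stale cache
// is brought up to date only by an explicit, non-const refresh().
void check_cached_entry()
{
    const fs::path p = fs::temp_directory_path() / "p0317_cache_example.txt";
    fs::remove(p);
    cached_entry e(p);
    const cached_entry& view = e;
    assert(view.status().type() == fs::file_type::not_found);

    { std::ofstream out(p); out << "12345"; }
    assert(view.file_size() == 0);  // the cache is now stale
    e.refresh();                    // non-const: not callable through view
    assert(fs::is_regular_file(view.status()));
    assert(view.file_size() == 5);
    fs::remove(p);
}
```

Because refresh() is non-const, a call through the const reference view would not compile; that is the property that lets the observers be truly const without mutable members.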

Provide observer functions for future needs

During LEWG discussions in Oulu, Geoffrey Romer suggested providing observer functions for attributes beyond those currently supported by real-world file systems. This has the effect of future-proofing directory_entry as file systems evolve. It also sparked the realization that the is_* family of filesystem query functions could be supported efficiently in a user-convenient way.
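The is_* family can be supported cheaply because each of those queries is a function of a single file_status. The hypothetical status_observers class below, assuming a C++17 <filesystem>, answers several of them from one cached file_status through the standard non-member overloads such as std::filesystem::is_directory(file_status), so no additional file system access occurs.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical sketch: given one cached file_status, the is_* queries are
// answered by the standard non-member fs::is_* overloads that take a
// file_status, with no further file system access. Only a subset is shown.
class status_observers {
public:
    explicit status_observers(fs::file_status cached) noexcept : st_(cached) {}

    bool exists() const noexcept { return fs::exists(st_); }
    bool is_directory() const noexcept { return fs::is_directory(st_); }
    bool is_regular_file() const noexcept { return fs::is_regular_file(st_); }
    bool is_symlink() const noexcept { return fs::is_symlink(st_); }
    bool is_fifo() const noexcept { return fs::is_fifo(st_); }
    bool is_other() const noexcept { return fs::is_other(st_); }

private:
    fs::file_status st_;  // for example, obtained during directory iteration
};

// Usage check on a hand-built status; no file system access is needed.
void check_status_observers()
{
    const status_observers fifo{fs::file_status{fs::file_type::fifo}};
    assert(fifo.exists() && fifo.is_fifo() && fifo.is_other());
    assert(!fifo.is_directory() && !fifo.is_regular_file());
}
```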

Revision History

R1 - pre-Issaquah:

Changes requested in Oulu

Changes requested in Chicago

R0 - pre-Oulu: Initial proposal providing two possible approaches to directory entry caching.

Acknowledgements

Thanks to the LEWG and LWG for their many corrections and helpful suggestions. Thanks to Daniel Krügler for his pre-Chicago wording review. Special thanks to Geoffrey Romer for suggesting observer functions to meet future needs, resolving a major concern.

Proposed wording

Changes are relative to N4606.

27.10.12 Class directory_entry [class.directory_entry]

namespace std::filesystem {
  class directory_entry {
  public:
    // constructors and destructor
    directory_entry() noexcept = default;
    directory_entry(const directory_entry&) = default;
    directory_entry(directory_entry&&) noexcept = default;
    explicit directory_entry(const path& p);
    ~directory_entry();

    // assignments
    directory_entry& operator=(const directory_entry&) = default;
    directory_entry& operator=(directory_entry&&) noexcept = default;

    // modifiers
    void assign(const path& p);
    void replace_filename(const path& p);

    // observers
    const path& path() const noexcept;
    operator const path&() const noexcept;
    file_status status() const;
    file_status status(error_code& ec) const noexcept;
    file_status symlink_status() const;
    file_status symlink_status(error_code& ec) const noexcept;

    bool operator< (const directory_entry& rhs) const noexcept;
    bool operator==(const directory_entry& rhs) const noexcept;
    bool operator!=(const directory_entry& rhs) const noexcept;
    bool operator<=(const directory_entry& rhs) const noexcept;
    bool operator> (const directory_entry& rhs) const noexcept;
    bool operator>=(const directory_entry& rhs) const noexcept;
  private:
    path   pathobject; // exposition only
  };

}

A directory_entry object stores a path object.


27.10.12.1 directory_entry constructors [directory_entry.cons]

explicit directory_entry(const path& p);

Effects: Constructs an object of type directory_entry.

Postcondition: path() == p.

27.10.12.2 directory_entry modifiers [directory_entry.mods]

void assign(const path& p);

Postcondition: path() == p.

void replace_filename(const path& p);

Postcondition: path() == x.parent_path() / p where x is the value of path() before the function is called.
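The replace_filename postcondition can be checked directly against a C++17 std::filesystem::directory_entry. The sketch assumes a writable temporary directory; the helper names and the scratch directory name are hypothetical. Real files are created so that the example does not depend on how an implementation treats a path that does not exist.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Applies replace_filename to a C++17 std::filesystem::directory_entry and
// reports whether path() == x.parent_path() / p holds afterwards, where x is
// the stored path before the call.
bool replace_filename_postcondition_holds(fs::directory_entry& e, const fs::path& p)
{
    const fs::path x = e.path();  // value of path() before the call
    e.replace_filename(p);
    return e.path() == x.parent_path() / p;
}

// Usage check in a scratch directory that contains old.txt and new.txt.
void check_replace_filename()
{
    const fs::path dir = fs::temp_directory_path() / "p0317_replace_example";
    fs::remove_all(dir);
    fs::create_directory(dir);
    { std::ofstream out(dir / "old.txt"); }                 // empty file
    { std::ofstream out(dir / "new.txt"); out << "data"; }  // 4 bytes

    fs::directory_entry e(dir / "old.txt");
    assert(replace_filename_postcondition_holds(e, "new.txt"));
    assert(e.path() == dir / "new.txt");
    assert(e.file_size() == 4);  // observers now describe new.txt
    fs::remove_all(dir);
}
```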


27.10.12.3 directory_entry observers [directory_entry.obs]

const path& path() const noexcept;
operator const path&() const noexcept;

Returns: pathobject

file_status status() const;
file_status status(error_code& ec) const noexcept;

Returns: status(path()) or status(path(), ec), respectively.

Throws: As specified in Error reporting ([fs.err.report]).

file_status symlink_status() const;
file_status symlink_status(error_code& ec) const noexcept;

Returns: symlink_status(path()) or symlink_status(path(), ec), respectively.

Throws: As specified in Error reporting ([fs.err.report]).
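The difference between status() and symlink_status() matters for entries that name symbolic links. The sketch below assumes a C++17 standard library and a platform on which the process may create symbolic links in the temporary directory; the function and directory names are hypothetical. It checks that status() follows a link while symlink_status() does not, in agreement with the non-member functions they are specified in terms of.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// For an entry naming a symbolic link, status() follows the link while
// symlink_status() describes the link itself, matching fs::status(path())
// and fs::symlink_status(path()) respectively.
void check_entry_status()
{
    const fs::path dir = fs::temp_directory_path() / "p0317_status_example";
    fs::remove_all(dir);
    fs::create_directory(dir);
    { std::ofstream out(dir / "target.txt"); out << "x"; }
    fs::create_symlink(dir / "target.txt", dir / "link");

    const fs::directory_entry e(dir / "link");
    assert(e.status().type() == fs::file_type::regular);
    assert(e.symlink_status().type() == fs::file_type::symlink);
    assert(e.status().type() == fs::status(e.path()).type());
    assert(e.symlink_status().type() == fs::symlink_status(e.path()).type());

    // The error_code overload reports failure through ec instead of throwing.
    std::error_code ec;
    const fs::file_status s = e.status(ec);
    assert(!ec && fs::is_regular_file(s));

    // With the target gone, status() reports not_found for the dangling link,
    // while symlink_status() still sees the link.
    fs::remove(dir / "target.txt");
    const fs::directory_entry dangling(dir / "link");
    assert(dangling.status().type() == fs::file_type::not_found);
    assert(dangling.symlink_status().type() == fs::file_type::symlink);

    fs::remove_all(dir);
}
```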

bool operator==(const directory_entry& rhs) const noexcept;

Returns: pathobject == rhs.pathobject.

bool operator!=(const directory_entry& rhs) const noexcept;

Returns: pathobject != rhs.pathobject.

bool operator< (const directory_entry& rhs) const noexcept;

Returns: pathobject < rhs.pathobject.

bool operator<=(const directory_entry& rhs) const noexcept;

Returns: pathobject <= rhs.pathobject.

bool operator> (const directory_entry& rhs) const noexcept;

Returns: pathobject > rhs.pathobject.

bool operator>=(const directory_entry& rhs) const noexcept;

Returns: pathobject >= rhs.pathobject.
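Because every comparison is defined on pathobject alone, sorting entries orders them exactly as their paths would be ordered, which is a convenient way to impose a deterministic order on the unspecified order of directory iteration. A sketch, assuming a C++17 standard library; the helper names and the scratch directory name are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// directory_iterator yields entries in an unspecified order. Since the
// directory_entry comparison operators compare only the stored paths,
// sorting orders the entries by path; any cached attributes play no part.
std::vector<fs::directory_entry> sorted_entries(const fs::path& dir)
{
    std::vector<fs::directory_entry> entries;
    for (const fs::directory_entry& e : fs::directory_iterator(dir))
        entries.push_back(e);
    std::sort(entries.begin(), entries.end());  // uses directory_entry::operator<
    return entries;
}

// Usage check: three files created out of order come back sorted by path.
void check_sorted_entries()
{
    const fs::path dir = fs::temp_directory_path() / "p0317_sort_example";
    fs::remove_all(dir);
    fs::create_directory(dir);
    for (const char* name : {"c.txt", "a.txt", "b.txt"}) {
        std::ofstream out(dir / name);  // creates an empty file
    }
    const std::vector<fs::directory_entry> v = sorted_entries(dir);
    assert(v.size() == 3);
    assert(v[0].path().filename() == "a.txt");
    assert(v[1].path().filename() == "b.txt");
    assert(v[2].path().filename() == "c.txt");
    assert(v[0] == fs::directory_entry(dir / "a.txt"));  // equal paths, equal entries
    fs::remove_all(dir);
}
```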

27.10.13 Class directory_iterator [class.directory_iterator]

An object of type directory_iterator provides an iterator for a sequence of directory_entry elements representing the files in a directory. [ Note: For iteration into sub-directories, see class recursive_directory_iterator (27.10.14). —end note ]

Insert a new paragraph before the current note that begins at paragraph 9:


Note for implementors

For Windows, directory_entry information is obtained during directory iteration by calling FindFirstFile, FindFirstFileEx, or FindNextFile, and can be refreshed by calling GetFileInformationByHandle or GetFileAttributesEx.

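On some POSIX-like systems, such as Linux and some BSD-based distributions, struct dirent may provide an additional field named d_type, "making it possible to avoid the expense of calling lstat"; on Linux, glibc versions since 2.19 may support it. The macro _DIRENT_HAVE_D_TYPE, which is defined by glibc rather than specified by POSIX, indicates that d_type is present. Even then, some file systems return DT_UNKNOWN in d_type, so lstat is still needed as a fallback.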
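A minimal sketch of how an implementation might use d_type during iteration, falling back to lstat when the field is unavailable or holds DT_UNKNOWN. It is written in C++ against the POSIX <dirent.h>, <sys/stat.h>, <fcntl.h>, and <unistd.h> interfaces. The kind enumeration and the helper functions are hypothetical; d_type is consulted only when both _DIRENT_HAVE_D_TYPE and DT_UNKNOWN are defined, and the usage check assumes a writable /tmp.

```cpp
#include <cassert>
#include <dirent.h>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Coarse file kinds; a hypothetical stand-in for std::filesystem::file_type.
enum class kind { directory, regular, symlink, other, unknown };

// Classifies one member of dir, preferring d_type and falling back to lstat()
// when d_type is unavailable or the file system reports DT_UNKNOWN.
kind classify(const std::string& dir, const dirent* d)
{
#if defined(_DIRENT_HAVE_D_TYPE) && defined(DT_UNKNOWN)
    switch (d->d_type) {
    case DT_DIR: return kind::directory;
    case DT_REG: return kind::regular;
    case DT_LNK: return kind::symlink;
    case DT_UNKNOWN: break;  // no answer from the file system; ask lstat()
    default: return kind::other;
    }
#endif
    struct stat st;
    const std::string full = dir + "/" + d->d_name;
    if (::lstat(full.c_str(), &st) != 0)
        return kind::unknown;
    if (S_ISDIR(st.st_mode)) return kind::directory;
    if (S_ISREG(st.st_mode)) return kind::regular;
    if (S_ISLNK(st.st_mode)) return kind::symlink;
    return kind::other;
}

// Lists the members of dir (excluding "." and "..") with their kinds.
std::vector<std::pair<std::string, kind>> list_directory(const std::string& dir)
{
    std::vector<std::pair<std::string, kind>> out;
    DIR* dp = ::opendir(dir.c_str());
    if (dp == nullptr)
        return out;
    while (const dirent* d = ::readdir(dp)) {
        const std::string name = d->d_name;
        if (name != "." && name != "..")
            out.emplace_back(name, classify(dir, d));
    }
    ::closedir(dp);
    return out;
}

// Usage check in a scratch directory holding one file and one subdirectory.
void check_list_directory()
{
    const std::string dir = "/tmp/p0317_dirent_example";
    ::mkdir(dir.c_str(), 0700);
    ::mkdir((dir + "/sub").c_str(), 0700);
    const int fd = ::open((dir + "/file.txt").c_str(), O_CREAT | O_WRONLY, 0600);
    assert(fd >= 0);
    ::close(fd);

    const auto members = list_directory(dir);
    assert(members.size() == 2);
    for (const auto& [name, k] : members) {
        if (name == "sub") {
            assert(k == kind::directory);
        } else {
            assert(name == "file.txt" && k == kind::regular);
        }
    }

    ::unlink((dir + "/file.txt").c_str());
    ::rmdir((dir + "/sub").c_str());
    ::rmdir(dir.c_str());
}
```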

References

Issue 2663, Enable efficient retrieval of file size from directory_entry,
cplusplus.github.io/LWG/lwg-active.html#2663

Issue 2677, directory_entry::status is not allowed to be cached as a quality-of-implementation issue,
cplusplus.github.io/LWG/lwg-active.html#2677

N4582, Working Draft, Standard for Programming Language C++, 2016,
www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4582.pdf

N4100, Programming Languages — C++ — File System Technical Specification, 2014,
www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4100.pdf

Boost Filesystem Library, V3, 2015,
www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/index.htm