vmtouch - the Virtual Memory Toucher
Portable file system cache diagnostics and control
vmtouch is a tool for learning about and controlling the file system cache of Unix and Unix-like systems. It is BSD-licensed, so you can basically do whatever you want with it.
Quick install guide:
$ git clone https://github.com/hoytech/vmtouch.git
$ cd vmtouch
$ make
$ sudo make install
What is it good for?
- Discovering which files your OS is caching
- Telling the OS to cache or evict certain files or regions of files
- Locking files into memory so the OS won't evict them
- Preserving the virtual memory profile when failing over servers
- Keeping a "hot-standby" file-server
- Plotting filesystem cache usage over time
- Maintaining "soft quotas" of cache usage
- Speeding up batch/cron jobs
- And much more...
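As an illustration of the "plotting filesystem cache usage over time" item, here is a minimal sketch of a sampling loop. It is not part of vmtouch: the helper names `resident_pct` and `sample_cache` are made up for this example, and it assumes the summary contains a `Resident Pages:` line in the format shown in the examples on this page. `date +%s` is widely available but not guaranteed on every system.

```shell
#!/bin/sh
# Hypothetical helpers, not part of vmtouch.

# Read vmtouch output on stdin and print the residency percentage
# taken from the "Resident Pages:" line, without the trailing "%".
resident_pct() {
    awk '/Resident Pages:/ { sub(/%$/, "", $NF); print $NF }'
}

# Append "epoch-seconds percentage" for a path to a log file,
# once every INTERVAL seconds, until interrupted.
# Usage: sample_cache /bin/ 60 cache.log
sample_cache() {
    target=$1
    interval=$2
    logfile=$3
    while :; do
        pct=$(vmtouch "$target" | resident_pct)
        printf '%s %s\n' "$(date +%s)" "$pct" >> "$logfile"
        sleep "$interval"
    done
}
```

The resulting two-column log can be fed to whatever plotting tool you prefer.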
Support
To complement the open source community, Hoytech offers services related to vmtouch:
- Advanced feature development
- Support contracts
- Training sessions
Please contact Doug Hoyte for more information.
Examples
Example 1
How much of the /bin/ directory is currently in cache?
$ vmtouch /bin/
Files: 92
Directories: 1
Resident Pages: 348/1307 1M/5M 26.6%
Elapsed: 0.003426 seconds
Example 2
How much of big-dataset.txt is currently in memory?
$ vmtouch -v big-dataset.txt
big-dataset.txt
[                                                            ] 0/42116
Files: 1
Directories: 0
Resident Pages: 0/42116 0/164M 0%
Elapsed: 0.005182 seconds
None of it. Now let's bring part of it into memory with tail:
$ tail -n 10000 big-dataset.txt > /dev/null
Now how much?
$ vmtouch -v big-dataset.txt
big-dataset.txt
[                                                    oOOOOOOO] 4950/42116
Files: 1
Directories: 0
Resident Pages: 4950/42116 19M/164M 11.8%
Elapsed: 0.006706 seconds
vmtouch tells us that 4950 pages at the end of the file are now resident in memory.
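The sizes and percentage in the summary follow from the page counts. As a rough check, the arithmetic can be sketched as below. This assumes 4096-byte pages (an assumption; `getconf PAGESIZE` reports the real value on your system) and that "M" means mebibytes rounded down, which matches the figures shown here. The helper names are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers in vmtouch's summary.

# pages_to_mib PAGES [PAGESIZE]: size of PAGES pages in whole MiB, rounded down.
# PAGESIZE defaults to the system page size.
pages_to_mib() {
    pages=$1
    pagesize=${2:-$(getconf PAGESIZE)}
    echo $(( pages * pagesize / 1024 / 1024 ))
}

# resident_percent RESIDENT TOTAL: percentage with one decimal place.
resident_percent() {
    awk -v r="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * r / t }'
}

pages_to_mib 4950 4096        # prints 19
pages_to_mib 42116 4096       # prints 164
resident_percent 4950 42116   # prints 11.8
```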
Example 3
Let's touch the rest of big-dataset.txt and bring it into memory (pressing enter a few times to illustrate the animated progress bar you will see on your terminal):
$ vmtouch -vt big-dataset.txt
big-dataset.txt
[OOo                                                 oOOOOOOO] 6887/42116
[OOOOOOOOo                                           oOOOOOOO] 10631/42116
[OOOOOOOOOOOOOOo                                     oOOOOOOO] 15351/42116
[OOOOOOOOOOOOOOOOOOOOOo                              oOOOOOOO] 19719/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOo                        oOOOOOOO] 24183/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                  oOOOOOOO] 28615/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo              oOOOOOOO] 31415/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo      oOOOOOOO] 36775/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo  oOOOOOOO] 39431/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 42116/42116
Files: 1
Directories: 0
Touched Pages: 42116 (164M)
Elapsed: 12.107 seconds
Example 4
We have 3 big datasets, a.txt, b.txt, and c.txt, but only 2 of them will fit in memory at once. If we have a.txt and b.txt in memory but would now like to work with b.txt and c.txt, we could just start loading c.txt, but then the system would evict pages from both a.txt (which is what we want) and b.txt (which is not).
So let's give the system a hint and evict a.txt from memory, making room for c.txt:
$ vmtouch -ve a.txt
Evicting a.txt
Files: 1
Directories: 0
Evicted Pages: 42116 (164M)
Elapsed: 0.076824 seconds
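The evict-then-load step can be wrapped in a small function. This is only a sketch: `swap_cached` is a made-up name, and it uses nothing beyond the evict (`-e`) and touch (`-t`) options that appear in the examples on this page.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of vmtouch:
# evict one file from the cache, then touch another to load it.
# The load only runs if the evict command succeeds.
swap_cached() {
    evict=$1
    load=$2
    vmtouch -e "$evict" && vmtouch -t "$load"
}

# Usage: swap_cached a.txt c.txt
```

Evicting first means the system does not have to guess which pages to drop while c.txt is being read.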
Example 5
Daemonise and lock all files in a directory into physical memory:
$ vmtouch -dl /var/www/htdocs/critical/
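To confirm that the files really are in memory, the directory can be queried again with plain vmtouch, as in Example 1, once the daemon has had time to load them. Below is a minimal sketch of such a check. It assumes the `Resident Pages:` summary format shown in the earlier examples, and `fully_resident` is a hypothetical helper, not a vmtouch feature.

```shell
#!/bin/sh
# Hypothetical helper: read vmtouch output on stdin and succeed only if
# the "Resident Pages:" line reports every page resident (e.g. 1307/1307).
fully_resident() {
    awk '/Resident Pages:/ { split($3, n, "/"); ok = (n[1] == n[2]) }
         END { exit (ok ? 0 : 1) }'
}

# Usage:
#   vmtouch /var/www/htdocs/critical/ | fully_resident && echo "all pages resident"
```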
What other people are saying
People have found lots of uses for vmtouch over the years. Here are a few links in no particular order:
Articles
- FEATURED: Hosting Advice: Interview with Doug about vmtouch
- Admin magazine: Performance Tuning Dojo: Tune-Up
- Techniques for Warming Up a MongoDB Secondary
- Linux Memory Usage
- What a C programmer should know about memory
- Playlists at Spotify - Using Cassandra to store version controlled objects (slide 32)
- Understanding and optimizing Memory utilization
- admon.org
- thewebdev.de
- Tune Up Paging with vmtouch
- Linux Cached Memory
- Of how much of a file is in RAM
- Manipulating the kernel's page cache with vmtouch
- Memory management in Linux kernel (slide 16)
- Supercomputing on the cheap with Parallella
- System Design and Big Data, chapter 6
- Lucene @ Yelp (slide 16)
- tuxdiary: vmtouch: portable file cache analyzer
Real-world sightings
- Linux kernel mailing list: zcache: Support zero-filled pages more efficiently
- comp.db.sqlite.general: Strange eviction from Linux page cache
- Emacs speed up 1000%
- Jolla Review: Some Rough Edges, But This Linux Smartphone Shows Promise (vmtouch deployed on maemo phones?)
- ceph-users: Ceph SSD array with Intel DC S3500's
- proxmox forums: CPU Performance Degradtion
- Argonne National Laboratory's Advanced Photon Source
- Elastic Search: Dealing with OS page cache evictions?
- Data-center deploy using torrent and mlock()
- Making best use of 512mb Pi with tmpfs
- redis-db: Issue with Redis replication while transferring rdb file from master to slave
- mongodb-user: Oplog Memory Consumption
- CentOS bugtracker: oom killer kills process rather than freeing cache
- LMDB mailing list
- Used to optimize ethereum sports betting
Discussion about Instagram's usage of vmtouch:
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Instagram Architecture: 14 Million Users, Terabytes Of Photos, 100s Of Instances, Dozens Of Technologies
- The Instagram Architecture Facebook Bought For A Cool Billion Dollars
- parse_vmtouch.py (script used by Instagram)
Stack Overflow and friends
- Does the Linux filesystem cache files efficiently?
- Postgresql doesn't use memory for caching
- MongoDB, NUMA hardware, page faults
- Know programs in cache
- Is it possible to list the files that are cached?
- Tell the linux kernel to put a file in the disk cache?
- Securely wipe an entire Linux server with itself
- Caching/preloading files on Linux into RAM
- Why drop caches in Linux?
- Clear / Flush cached memory
- limit filesystem cache size for specific files under linux
- Memory mapping files for a blazing fast webserver on Linux
- Performance difference between ramfs and tmpfs
- How do I lock a growing directory in memory?
- How do I vmtouch a directory (not the files it contains)? (good question, I don't know of a userspace way to do this)
- MySQL queries are 10 to 100 times slower after OS reboot
- How can one examine what files are in Linux's page cache?
Other tools
- linux-ftools
- cachemaster (inspired by vmtouch)
- pcstat
- fmlock
- nocache
- ureadahead
There are also lots of mentions on Twitter using the #vmtouch hashtag.
Have another link? Please let me know!
Author
vmtouch is copyright (c) 2009-2017 Doug Hoyte and contributors.
Contributors are listed in CHANGES.