It was with some interest that I found news of an upcoming TortoiseSVN release this morning. In particular, one entry in the release notes made me very happy:

An option to disable the cache process completely and either fetch the status only for the currently visible folder or not show any overlays at all.

This means that finally, I will be reinstalling TortoiseSVN on my workstation. For an explanation of why this matters so much, I need to go into a little more background on my work environment and how the cache process works.

I used TortoiseCVS almost exclusively for 3 years while working on our ActiveX product written in VB6. During this time I grew to love the little tool, and even helped out the developers with bug reports and feature suggestions.

When we made the switch to SVN a little over 12 months ago, I was elated to find a sister project that looked almost identical, although I later discovered that it had merely branched off somewhere early in the life of TortoiseCVS and that the two projects had some notable differences of opinion.

The biggest one for me is the “cache process”. TortoiseCVS doesn’t use one; it simply loads the sandbox information as required. I guess the TortoiseSVN guys decided this was too slow and implemented a cache, without thinking about the consequences. Not long after I installed TortoiseSVN, I noticed my system slowing to a crawl, particularly when loading Eclipse. So I loaded up FileMon and had a look at what was causing all the disk access.

I was horrified to discover that every time Eclipse accessed a file, the TortoiseSVN cache process looked for files in the .svn data subfolder of the file’s directory. I later confirmed that this is by design, and happens every time any file on the system is accessed. This is understandable from a technical standpoint, because any file change could be related to a sandbox on the system. In reality, it results in requesting files that don’t exist 90-95% of the time. Furthermore, a few times this prevented me from removing a USB drive because the cache never releases its reference to the root folder of every drive.
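
To make the pattern concrete, here is a rough Python sketch of the kind of lookup I saw in FileMon. This is not TortoiseSVN's actual code; the function name `count_svn_probes` and the hit/miss counting are my own invention for illustration. All it assumes is what is described above: for each accessed file, something checks for a .svn folder in that file's directory.

```python
import os


def count_svn_probes(accessed_files):
    """Hypothetical illustration: probe for a .svn folder next to each file.

    Returns (hits, misses), where a miss is a probe for a .svn folder
    that does not exist, i.e. wasted disk activity.
    """
    hits = 0
    misses = 0
    for path in accessed_files:
        directory = os.path.dirname(os.path.abspath(path))
        admin_dir = os.path.join(directory, ".svn")
        if os.path.isdir(admin_dir):
            hits += 1
        else:
            misses += 1
    return hits, misses
```

Feed it a list of paths from outside any sandbox and every probe is a miss, which is roughly what happens on a system where most file accesses have nothing to do with SVN.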

For most people, the general system slowdown that results from such behaviour probably isn’t that bad. And for your average project using SVN, the benefits of a cache probably outweigh the extra disk activity. Not for me.

My Eclipse workspace is currently 265,422 files totalling almost 2.2 GB (3 GB on disk). This is 6 copies of the codebase (2 products, 3 version branches and 1 clean checkout for when I need a pristine sandbox) along with a few other products and various other side projects. This total includes all the SVN data, which matters because, due to what was probably a bug at the time, the TSVN cache looked for a .svn folder even inside .svn folders when Eclipse accessed a file there.

All this unnecessary disk activity prevented TortoiseSVN from being a viable solution on our large projects, and I’m glad they’ve finally gone back to the TortoiseCVS method (even if it’s only an option). The downside to this new version is that it uses SVN version 1.4, and the sandbox changes aren’t backwards compatible, so I can’t install it until Subclipse does their upgrade. I’m eagerly awaiting that day so I can finally use a decent SVN tool outside of Eclipse again! 😀