Skip to main content

Safer file copying in Python

  • Posted

Using scripting and the command line can be a two-edged sword. They’re very powerful tools, but they make it easy to shoot yourself in the foot. They assume that you know what you’re doing, even if it might be dangerous.

An example I often run in to is moving or copying files. If you try to copy over an existing file, these tools will often scribble over the old file without any warning. For example, here’s the description for shutil.copyfile:

shutil.copyfile(src, dst, *, follow_symlinks=True)

Copy the contents of the file named src to a file named dst and return dst. […] If dst already exists, it will be replaced.

The user is responsible for checking that the command won’t be destructive, not the utility.

Modern GUIs are a bit friendlier. For example, in the OS X Finder, if you try to copy over an existing file, you get this warning:

An item named “myfile.txt” already exists in this location. Do you want to replace it with the one you’re moving?

Keep Both / Stop / Replace

If you choose “Keep Both”, then the Finder leaves the original file intact, and picks a new name for your copy. (In this case, it would use myfile 2.txt, myfile 3.txt, and so on.) I use this option when I want to be cautious. When I’m working with something precious, like my photo collection, I’d rather create too many copies than risk accidentally deleting something.

I wanted to replicate that functionality in Python, as a drop-in for the copyfile() and move() methods of the shutil module. I couldn’t find existing code to do this, so I wrote my own script instead.

I came up with a few rules for my “safe”1 move/copy function:

I’ve uploaded my script as a Gist. It provides two functions: move() and copyfile(), which mimic the shutil equivalents and return the name of the actual destination.

Here’s how I import it:

try:
    from safeutil import move, copyfile
except ImportError:
    from shutil import move, copyfile

so if I try to run a script that imports this module on a system where it isn’t installed, things will still continue to work (albeit a little less safe).

I asked for advice on an early version on the Code Review Stack Exchange. My thanks to 200_success and Günther Noack for their useful feedback.

I’ve been using these functions in a number of different places. I’ll try to write up a few of them in the coming weeks.

  1. I use the word “safe” because this function is totally non-destructive. It’s also partially inspired by Donald Knuth, who coined the term “literate programming” to make people feel bad about writing non-literate, or “illiterate” programs. Likewise, why would I want to use an “unsafe” function when I have a safe alternative? ↩︎