I tend to work on the road a lot. Recently I needed to upload a large file on a network with file transfer limits. Split to the rescue. When I mentioned this on Twitter, one of my friends asked how this worked, so I thought I would share it with everyone.
Split is a Unix command that, as the name suggests, splits files into multiple parts. It is essentially the opposite of the cat command and is often used in conjunction with cat: split breaks a file into pieces, and cat joins the pieces back together. It has two primary uses: splitting a large data file into smaller samples for testing or sharing the workload, and splitting a file into smaller pieces for ease of transfer. In addition to letting you work around file transfer size limits, splitting a file into smaller pieces can be an effective way to keep transferring files over a network that keeps going up and down, since a dropped connection only costs you one piece. While torrents or a download manager are probably a better solution to these problems, split provides a quick ad-hoc method. If you have to email a large file for some reason, it is still probably your best option.
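To make the split/cat relationship concrete, here is a minimal sketch of a round trip. The file names (original.txt, restored.txt) and the part_ prefix are made up for illustration, and the sample file is generated with seq rather than being a real upload.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical 2500-line sample file.
seq 1 2500 > original.txt

# Split it into pieces (1000 lines each by default) named part_aa, part_ab, part_ac.
split original.txt part_

# The shell expands part_* in sorted order, so cat joins the pieces in sequence.
cat part_* > restored.txt

# cmp prints nothing and exits 0 when the two files are identical.
cmp original.txt restored.txt && echo "files match"
```

On the receiving end of a transfer, only the cat step is needed; comparing checksums of the original and the rebuilt file is a reasonable sanity check.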
While split was originally a Unix command, it is, naturally, also available on OS X. It is also available for Windows as part of many different packages that add Unix commands: Cygwin, msysgit (from within gitsh), UWIN, the GNU tools, etc.
Split is easy to use and has a number of features to make it more useful. At its simplest, you can just enter the command:
split myfile
and split will create a set of 1000-line files named xaa, xab, and so on, up to xzz if needed. If you want more recognizable names for your output files, you can add a prefix to the end of the command line, e.g.
split myfile name
will cause the files to be named nameaa through namezz. If you want longer suffixes to identify the different parts, you can add the -a # flag to set the length of the suffixes (the default is 2). And if you'd rather use numbers to identify the different files, you can add the -d flag, provided your version of split supports it (GNU split does).
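The naming options above can be tried side by side. This is only a sketch: myfile is a generated 2500-line sample, the prefixes long and num are arbitrary, and the -d line assumes GNU split or another version that has that flag.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical 2500-line input, so each command below yields three pieces.
seq 1 2500 > myfile

# Default prefix and suffix length: xaa, xab, xac
split myfile

# Custom prefix: nameaa, nameab, nameac
split myfile name

# Three-character suffixes: longaaa, longaab, longaac
split -a 3 myfile long

# Numeric suffixes (assumes a split with -d, such as GNU split): num00, num01, num02
split -d myfile num

ls
```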
To make split more useful, there are options to set the size of the files produced. If you have text data, you can split it by number of lines with -l #. If you want to split files by both lines and size, you can use the -C size flag, which puts as many complete lines into each file as will fit without exceeding the given size. For any kind of data, you can just set a file size using -b size. This is the option I used to stay under the upload limits. The hardest part about split may be setting the file size. By default the size is in bytes (and in fact, you can also use the long option --bytes=size).
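Here is a rough sketch comparing the three size options on the same input. The file data.txt and the 4000-byte limit are made up for illustration (a real upload limit would be far larger), and -C is assumed to be available, as it is in GNU split.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical text file: the numbers 1 to 2500, one per line (11393 bytes).
seq 1 2500 > data.txt

# By line count: five files of 500 lines each (lines_aa .. lines_ae).
split -l 500 data.txt lines_

# By byte count: pieces of exactly 4000 bytes, except the last; lines may be cut in two.
split -b 4000 data.txt bytes_

# By size, but only breaking between lines (assumes GNU split): every piece is at most 4000 bytes.
split -C 4000 data.txt chunk_

# Compare the resulting sizes.
wc -c bytes_* chunk_*
```

The -b pieces are the ones to use for binary files; -C only makes sense when each piece needs to remain valid line-oriented text on its own.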
Split also accepts a multitude of size suffixes: kB for thousands of bytes (decimal kilobytes), K for 1024 bytes (actual kilobytes), MB for decimal megabytes, M for real megabytes, GB for decimal gigabytes, and G for real gigabytes. (You can also specify terabytes (TB/T), petabytes, exabytes, etc., but if your files are that large, I'll let you check the man page to figure it out.)
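The difference between the decimal and binary suffixes is easiest to see on a small file. This sketch assumes GNU split, which accepts the two-letter decimal suffixes; blob.bin and the prefixes bin_ and dec_ are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical 3000-byte binary file filled with zero bytes.
head -c 3000 /dev/zero > blob.bin

# K means 1024 bytes: pieces of 1024, 1024, and 952 bytes.
split -b 1K blob.bin bin_

# kB means 1000 bytes (assumes GNU split): three pieces of 1000 bytes each.
split -b 1kB blob.bin dec_

# Show the size of every piece.
wc -c bin_* dec_*
```

When an upload limit is quoted in megabytes without saying which kind, the decimal suffix (MB) gives the smaller pieces and is the safer choice.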