Tuesday, August 12, 2008

File Manipulation on Windows Servers from Linux

This is more of a handy tip than anything earth-shattering, but yesterday I was faced with the task of grabbing all files with a particular extension from a nested directory structure, moving them all into a single directory, and renaming them with a different extension. I also had to be careful to preserve the original timestamp of the file.


The files reside on a Windows server, and needless to say the thought of remoting into the Windows server and spending the afternoon drilling into nested directories, sorting by file type, and manually moving and renaming the files didn't appeal to me.


One of the great things about Linux is how powerful the shell is. Let me preface this by saying I'm not a DOS expert, so maybe there's a way to do this in DOS (or PowerShell, which I've never tried), but I knew I could probably accomplish the entire task with a couple of commands in a bash shell.


Step 1 was to mount the Windows server drive:


sudo mount -t cifs //server.dns.or.ip/sharename /mount/point -o user=username,password=password


Note that "/mount/point" is the local directory where you want to mount the share. I tend to use something like /media/servername-driveletter because mounting everything in /media is easy to remember, and on some distros this will also cause the drive to show up on your desktop (though this doesn't happen on Kubuntu).
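With the password given on the command line as above, it is visible to anyone who lists processes while mount runs. One alternative is the credentials= option of mount.cifs, which reads the user name and password from a file. A minimal sketch, assuming bash; the helper name write_cifs_credentials and the path ~/.smbcredentials are made up for illustration:

```shell
# write_cifs_credentials FILE USER PASSWORD
# Hypothetical helper: writes a credentials file in the username=/password=
# format that mount.cifs reads through its credentials= option.
write_cifs_credentials() {
    local file=$1 user=$2 password=$3
    # Create (or empty) the file and restrict it to the owner before any
    # secret is written to it.
    ( umask 077 && : > "$file" ) || return 1
    chmod 600 "$file" || return 1
    # printf is a shell builtin, so the password never shows up as the
    # argument of a separate process.
    printf 'username=%s\npassword=%s\n' "$user" "$password" >> "$file"
}

# Usage, with the same placeholders as the mount command above:
#   write_cifs_credentials "$HOME/.smbcredentials" username password
#   sudo mount -t cifs //server.dns.or.ip/sharename /mount/point \
#       -o credentials="$HOME/.smbcredentials"
```

Typing the password as an argument interactively still leaves it in the shell history, so creating the file once in an editor works just as well.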


With the drive mounted, I navigated to the top level of the (rather nasty) nested directory structure and ran the following:


find ./ -name "*.fileextension" | xargs -i mv {} /mount/point/destinationdirectory


This traverses the directory structure, finds all the files with the extension I needed to move, and pipes the list into the move command. The "xargs -i mv {}" bit basically says "get the arguments for the command you're about to execute from standard input (which is the list of file names produced by the find command) and replace {} with each line of that input." (In GNU xargs, -i is the older, now-deprecated spelling of -I {}.) Then of course /mount/point/destinationdirectory is the directory into which I want to move the files.
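File names containing quotes or newlines can trip up a plain find | xargs pipeline. A minimal sketch of a more defensive variant, assuming bash and a find/xargs pair that supports -print0 and -0 (GNU and BSD both do); the function name collect_files is hypothetical:

```shell
# collect_files SRC PATTERN DEST
# Hypothetical helper: moves every regular file under SRC whose name matches
# PATTERN into DEST. Assumes DEST lies outside SRC.
collect_files() {
    local src=$1 pattern=$2 dest=$3
    mkdir -p "$dest" || return 1
    # -print0 and -0 separate names with NUL bytes, so spaces, quotes and
    # newlines in file names pass through unchanged.
    find "$src" -type f -name "$pattern" -print0 |
        xargs -0 -I {} mv -- {} "$dest"/
}

# Usage:
#   collect_files . "*.fileextension" /mount/point/destinationdirectory
```

Note that files with the same name in different subdirectories would overwrite each other in the destination, exactly as with the original command.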


A note if you want to use copy (cp) instead of move (mv)--this does NOT retain the original timestamp. The cp command has a -p option that preserves the original timestamp, but this did not work for me when I was mapped to a Windows share. Apparently this is because I'm executing the command as one user on Linux and that user doesn't have permission to change the timestamp on the Windows side. If you were logged in with the same user name on both sides maybe this would work, but I didn't try it.
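If you do try cp -p against a share, it is easy to check whether the timestamp survived. A minimal sketch, assuming bash; the helper name same_mtime is hypothetical, and it relies only on the -nt and -ot file tests:

```shell
# same_mtime A B
# Hypothetical helper: succeeds when A and B have the same modification
# time, i.e. neither file is newer than the other.
same_mtime() {
    [ -e "$1" ] && [ -e "$2" ] || return 2
    ! [ "$1" -nt "$2" ] && ! [ "$1" -ot "$2" ]
}

# Usage:
#   cp -p original copy
#   same_mtime original copy || echo "timestamp was not preserved"
```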


So with the move completed, I just needed to rename all the files with a new file extension, or in my case I was actually just removing a second file extension since the files were named in the format "filename.ext1.ext2" and I just wanted to remove the ".ext2" part.


After navigating to the directory into which I moved all my files, that was another one-liner in the terminal:


rename -v 's/\.ext2$//' *.ext2


The rename command (a Perl script that ships with Debian-based distros such as Ubuntu, not part of bash itself) renames multiple files using a Perl regular expression as the criteria for the rename operation. In this case I just wanted to lop off the .ext2 bit, and apply that to all files with the .ext2 extension. The -v option is for "verbose" so I could watch what it was doing while it did it, and if you're nervous about what might happen, you can use the -n option to have it show you what it would do with your command but not actually do it.
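On systems where the Perl-based rename is not installed, the same suffix removal can be done with a plain loop. A minimal sketch, assuming bash; the function name strip_suffix and its optional dry-run argument are made up here to mirror the -n behavior described above:

```shell
# strip_suffix DIR SUFFIX [dry-run]
# Hypothetical helper: removes SUFFIX from the name of every file in DIR
# that ends with it. With "dry-run" it only prints what it would do.
strip_suffix() {
    local dir=$1 suffix=$2 mode=${3:-}
    local f
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ "$mode" = "dry-run" ]; then
            printf '%s -> %s\n' "$f" "${f%"$suffix"}"
        else
            # ${f%"$suffix"} drops the literal suffix from the end of the name.
            mv -v -- "$f" "${f%"$suffix"}"
        fi
    done
}

# Usage:
#   strip_suffix . .ext2 dry-run   # preview
#   strip_suffix . .ext2           # rename for real
```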


So a bit of research and help from a Linux guru friend, and the drudgery of file moving and renaming was reduced to two commands in a bash shell. With some clever piping I probably could have even done this in one line.
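For what it's worth, one way the move and the rename could be folded into a single pass is to let find hand the matches to a small inline script. This is a sketch under assumptions rather than what I actually ran; it assumes bash for the outer function, and the name collect_and_strip is hypothetical:

```shell
# collect_and_strip SRC SUFFIX DEST
# Hypothetical helper: moves every regular file under SRC whose name ends in
# SUFFIX into DEST, dropping SUFFIX from the name on the way.
# Assumes DEST lies outside SRC.
collect_and_strip() {
    local src=$1 suffix=$2 dest=$3
    mkdir -p "$dest" || return 1
    # The inline sh script receives DEST and SUFFIX as its first two
    # arguments, followed by the matching paths supplied by find.
    find "$src" -type f -name "*$suffix" -exec sh -c '
        dest=$1 suffix=$2; shift 2
        for path in "$@"; do
            name=${path##*/}                        # strip the directory part
            mv -- "$path" "$dest/${name%"$suffix"}"
        done
    ' sh "$dest" "$suffix" {} +
}

# Usage:
#   collect_and_strip . .ext2 /mount/point/destinationdirectory
```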


I suppose my point with all of this is when I'm faced with little tasks such as this one, I try to take the time (unless I absolutely can't) to find a way to accomplish the task elegantly and in a way I can use again, as opposed to blindly saying "there goes the afternoon," shutting off my brain, and dragging files around in a GUI. Not only does this make me more productive, but I learn something in the process, and it's something I can use and alter time and again in the future to make boring tasks a lot less work.


Comments



There is a pretty easy way to do this in Windows using pure GUI (and there are ways to do this in DOS as well.)


If you're ever in a situation where you've got to use the Windows GUI, then just do a Windows "Search..." (right-click on folder, choose "Search...") and use the extension as the filter.


It will then return every match in that directory tree. You can then select all the files, right-click and select "Cut".


Last, just paste them into the directory you want.


I've used this technique to quickly clean up .tmp files from a folder and occasionally to remove all the .svn folders from a directory.





The danger of mounting samba/cifs shares with the username and the password on the command line is that during the mount operation the process listing will show the mount command AND the username and password just as they're written on the CLI. Not a huge risk, but worth noting that on a multiuser system any other user who does a process listing can intentionally or accidentally read the credentials.





@Dan--thanks for the Windows tip. That covers the move part at least, and I assume there's some relatively easy file rename command in DOS that would keep you from having to do the file renaming file by file.


@Steven--good point; I guess I wouldn't want to be in a situation where someone I didn't trust had access to the system ;-), but that's definitely worth pointing out. If you're doing this from a Linux server that may have multiple people with access, you'll want to be aware of this.





FYI, the xargs command passes the file list from the find command along as command-line arguments, and the system caps the combined size of arguments and environment. xargs normally splits its input into batches to stay under that cap, but very long paths or file names containing quotes or newlines can still make a plain find | xargs pipeline fail. A better choice is to use the -exec switch on your find command.


Here's an awesome reference of find goodness:


http://www.athabascau.ca/html/depts/compserv/webunit/HOWTO/find.htm


I guarantee there's a pile of useful stuff find can do that Windows GUI search cannot.
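A minimal sketch of the -exec form of the move from the post, assuming bash; the function name move_matches is hypothetical:

```shell
# move_matches SRC PATTERN DEST
# Hypothetical helper: moves every regular file under SRC whose name matches
# PATTERN into DEST. Assumes DEST lies outside SRC.
move_matches() {
    local src=$1 pattern=$2 dest=$3
    mkdir -p "$dest" || return 1
    # find runs mv once per match and substitutes the path for {} itself,
    # so no file names travel through a pipe.
    find "$src" -type f -name "$pattern" -exec mv -- {} "$dest"/ \;
}

# Usage:
#   move_matches . "*.fileextension" /mount/point/destinationdirectory
```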





Thanks Jason--good info. So much Linux to learn, so little time. ;-)


