File Manipulation on Windows Servers from Linux

This is more of a handy tip than anything earth-shattering, but yesterday I was faced with the task of grabbing all files with a particular extension from a nested directory structure, moving them all into a single directory, and renaming them with a different extension. I also had to be careful to preserve the original timestamp of the file.


The files reside on a Windows server, and needless to say the thought of remoting into the Windows server and spending the afternoon drilling into nested directories, sorting by file type, and manually moving and renaming the files didn't appeal to me.


One of the great things about Linux is how powerful the shell is. Let me preface this by saying I'm not a DOS expert, so maybe there's a way to do this in DOS (or PowerShell, which I've never tried), but I knew I could probably accomplish this entire task in a couple of commands in a bash shell.


Step 1 was to mount the Windows server drive:


sudo mount -t cifs //server.dns.or.ip/sharename /mount/point -o user=username,password=password


Note that "/mount/point" is the local directory where you want to mount the share. I tend to use something like /media/servername-driveletter because mounting everything in /media is easy to remember, and on some distros this will also cause the drive to show up on your desktop (though this doesn't happen on Kubuntu).
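If you'd rather not type the password as part of the mount command, mount.cifs also accepts a credentials= option that reads it from a file. Here's a minimal sketch, assuming that option is available on your system; the file location and the placeholder username and password are just for illustration:

```shell
# Hypothetical location for the credentials file; override with CREDS_FILE.
creds="${CREDS_FILE:-$HOME/.smbcredentials}"

# Write the placeholder username and password with owner-only permissions.
( umask 077; printf 'username=%s\npassword=%s\n' 'username' 'password' > "$creds" )
chmod 600 "$creds"   # also tightens the file if it already existed

# The mount command then points at the file instead of carrying the secrets:
#   sudo mount -t cifs //server.dns.or.ip/sharename /mount/point -o credentials="$creds"
```

The mount line is left as a comment because it needs root and a reachable server.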


With the drive mounted, I navigated to the top level of the (rather nasty) nested directory structure and ran the following:


find ./ -name "*.fileextension" | xargs -I {} mv {} /mount/point/destinationdirectory


This traverses the directory structure, finds all the files with the extension I needed to move, and pipes the list into the move command. The "xargs -I {} mv {}" bit basically says "read the file names from standard input (the list produced by the find command) and run mv once per name, replacing {} with that name." (Older examples use "xargs -i", which GNU xargs still accepts but marks as deprecated in favor of -I.) Then of course /mount/point/destinationdirectory is the directory into which I want to move the files.
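File names containing quotes, backslashes, or newlines can trip up a plain find | xargs pipeline. Here's a minimal sketch of a null-delimited variant, assuming GNU find and xargs; the scratch directories and file names are made up so it can be tried without touching a real share:

```shell
# Scratch tree and destination standing in for the mounted share (placeholders).
top="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$top/a/b" "$top/c"
touch "$top/a/b/report one.fileextension" "$top/c/notes.fileextension" "$top/c/skip.txt"

cd "$top" || exit 1

# -print0 and -0 separate names with NUL bytes, so odd characters survive intact.
# -I {} runs one mv per file, substituting the file name for {}.
find . -type f -name "*.fileextension" -print0 |
  xargs -0 -I {} mv {} "$dest"

ls "$dest"
```

Keeping the destination outside the tree being searched avoids find stumbling over files it has already moved.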


A note if you want to use copy (cp) instead of move (mv)--this does NOT retain the original timestamp. The cp command has a -p option that preserves the original timestamp, but this did not work for me when I was mapped to a Windows share. Apparently this is because I'm executing the command as one user on Linux and that user doesn't have permission to change the timestamp on the Windows side. If you were logged in with the same user name on both sides maybe this would work, but I didn't try it.
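A quick way to check whether a timestamp survived is to compare modification times before and after. This is a rough sketch on a local scratch directory, assuming GNU coreutils (touch -d and stat -c); on a mounted share the cp -p result may differ, as described above:

```shell
work="$(mktemp -d)"

# Give the sample file an old modification time so differences are obvious.
touch -d '2009-01-15 10:30:00' "$work/original.ext1.ext2"

cp    "$work/original.ext1.ext2" "$work/plain-copy"       # gets the current time
cp -p "$work/original.ext1.ext2" "$work/preserved-copy"   # tries to keep the original time

# stat -c %Y prints the modification time as seconds since the epoch.
orig_mtime="$(stat -c %Y "$work/original.ext1.ext2")"
plain_mtime="$(stat -c %Y "$work/plain-copy")"
kept_mtime="$(stat -c %Y "$work/preserved-copy")"

printf '%s original\n%s plain copy\n%s cp -p copy\n' "$orig_mtime" "$plain_mtime" "$kept_mtime"
```

If the first and third numbers match, the timestamp was preserved.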


So with the move completed, I just needed to rename all the files with a new file extension, or in my case I was actually just removing a second file extension, since the files were named in the format "filename.ext1.ext2" and I just wanted to remove the ".ext2" part.


After navigating to the directory into which I moved all my files, that was another one-liner in the terminal:


rename -v 's/\.ext2$//' *.ext2


The rename command here is the Perl-based one that ships with Debian-style distros such as Ubuntu (it's a separate program, not part of bash), and it renames multiple files using a Perl regular expression as the criteria for the rename operation. In this case I just wanted to lop off the .ext2 bit, and apply that to all files with the .ext2 extension. The backslash makes the dot match a literal period rather than any character. The -v option is for "verbose" so I could watch what it was doing while it did it, and if you're nervous about what might happen, you can use the -n option to have it show you what it would do with your command but not actually do it.
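On a system where the Perl-based rename isn't installed, a plain shell loop can do the same job. This is a minimal sketch rather than what I actually ran; the scratch directory and file names are placeholders:

```shell
work="$(mktemp -d)"
touch "$work/filename.ext1.ext2" "$work/other.ext1.ext2" "$work/keep.ext1"
cd "$work" || exit 1

for f in *.ext2; do
  [ -e "$f" ] || continue        # nothing matched, so the pattern stayed literal
  echo "$f -> ${f%.ext2}"        # ${f%.ext2} strips a trailing .ext2
  mv -- "$f" "${f%.ext2}"
done
```

Commenting out the mv line turns this into a dry run, much like rename's -n option.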


So a bit of research and help from a Linux guru friend, and the drudgery of file moving and renaming was reduced to two commands in a bash shell. With some clever piping I probably could have even done this in one line.
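For what it's worth, one way the move and the rename might be combined into a single find command is sketched below. It's an untested-on-a-share idea rather than what I ran, and the src and dest directories are scratch placeholders:

```shell
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/a/b" "$src/c"
touch "$src/a/b/one.ext1.ext2" "$src/c/two.ext1.ext2"

# find hands batches of matching paths to a small inline shell script.
# The script takes the destination as its first argument, then moves each
# file there under its base name with the trailing .ext2 stripped.
find "$src" -type f -name '*.ext2' -exec sh -c '
  dest=$1; shift
  for f; do
    mv -- "$f" "$dest/$(basename "${f%.ext2}")"
  done
' sh "$dest" {} +

ls "$dest"
```

One caveat: two files with the same name in different subdirectories would collide in the destination, so adding -n to mv (don't overwrite) might be a sensible precaution.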


I suppose my point with all of this is when I'm faced with little tasks such as this one, I try to take the time (unless I absolutely can't) to find a way to accomplish the task elegantly and in a way I can use again, as opposed to blindly saying "there goes the afternoon," shutting off my brain, and dragging files around in a GUI. Not only does this make me more productive, but I learn something in the process, and it's something I can use and alter time and again in the future to make boring tasks a lot less work.


Comments



There is a pretty easy way to do this in Windows using pure GUI (and there are ways to do this in DOS as well.)


If you're ever in a situation where you've got to use the Windows GUI, then just do a Windows "Search..." (right-click on folder, choose "Search...") and use the extension as the filter.


It will then return every match in that directory tree. You can then select all the files, right-click and select "Cut".


Last, just paste them into the directory you want.


I've used this technique to quickly clean up .tmp files from a folder and occasionally to remove all the .svn folders from a directory.





The danger of mounting samba/cifs shares with the username and the password on the command line is that during the mount operation the process listing will show the mount command AND show the username and password just as they're written on the CLI. Not a huge risk, but worth noting that on a multiuser system any other user who does a process listing can intentionally or accidentally read the credentials.





@Dan--thanks for the Windows tip. That covers the move part at least, and I assume there's some relatively easy file rename command in DOS that would keep you from having to do the file renaming file by file.


@Steven--good point; I guess I wouldn't want to be in a situation where someone I didn't trust had access to the system ;-), but that's definitely worth pointing out. If you're doing this from a Linux server that may have multiple people with access, you'll want to be aware of this.





FYI, the xargs command passes the file list from the find command along as command-line arguments, and the system caps the combined size of arguments and environment variables. xargs is meant to split the work into batches that stay under that cap, but it can still trip over file names containing quotes, backslashes, or newlines. A simpler choice is to use the -exec switch on your find command.


Here's an awesome reference of find goodness:


http://www.athabascau.ca/html/depts/compserv/webunit/HOWTO/find.htm


I guarantee there's a pile of useful stuff find can do that Windows GUI search cannot.
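The -exec approach suggested above might look something like this. It's a minimal sketch using scratch directories and placeholder file names, not a command from the original post:

```shell
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/x/y"
touch "$src/x/y/a.fileextension" "$src/x/b.fileextension" "$src/x/c.txt"

# One mv per file: {} is replaced by each matching path and \; ends the command.
find "$src" -type f -name "*.fileextension" -exec mv {} "$dest" \;

# With GNU mv, files can instead be moved in batches using -t and a trailing +:
#   find "$src" -type f -name "*.fileextension" -exec mv -t "$dest" {} +

ls "$dest"
```

Because find passes each path straight to mv, there's no intermediate list for special characters in file names to break.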





Thanks Jason--good info. So much Linux to learn, so little time. ;-)


