May 09 2010

Files, or rather too many files, are the problem of the digital age. Hard drive space is very cheap now, and it is very easy to amass a whole bunch of information, much more than you will ever need. I have a collection of several terabytes of assorted files, with no good method to organize the madness. (A terabyte is 1,000 gigabytes, or 1 million megabytes.) When I need a file, it is a chore to look for it all over the place. We need a better solution to this digital problem.

This problem has so many angles that I will take it step by step, stating all the problems associated with having too much information in this digital age. This is not a problem unique to me, since it is the typical problem of any family with computers, digital cameras, MP3 players, and video recorders. All these devices produce digital content and therefore a lot of files. The more digital devices you have, the more content you have to sort out. The same problem shows up in a small business with multiple emails, documents, PowerPoint presentations, and worksheets, among other things. Every computer in the small business produces a lot of files, and when you extrapolate this to the large enterprise, which has computers in the hundreds or thousands, the problem becomes colossal.

Returning to my problem, let us start with the files I have and the types of files I need to store. To state my case, let me start with my pictures.

Filetypes: ( .jpg, .gif, .raw, .dng, .tiff )

When I take a picture I will usually take three or four, just in case one is not good enough. The days of being stingy with your 12 or 24 shot camera roll are over, and you can shoot to your heart's content. With the availability of cheap 16GB SD cards you can take all the pictures you want, even at the highest resolution. You will probably run through several camera batteries before you fill one card. Then you bring it home, sync it all to your computer, and have 200 to 300 assorted pictures per event. The truth is that you will probably never sort them, since this takes a lot of time. Then you have the problem of deciding how to do it. You can sort the files by date, by event, by place, or by person. If you have a fancy new cell phone with GPS you can even store them by global coordinates. The problem is that a hierarchical filesystem will only let you sort in one way: you name your folders by date, by person, by event, or by place. You have to decide which method you are going to use before you start to sort, and if halfway through you decide another sorting method was better, then tough luck, you will have to start all over again.
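
To make the limitation concrete, here is a minimal sketch of the alternative: instead of one folder hierarchy, every picture carries several labels and can be looked up by any combination of them. The class name, the tag keys, and the file names are all hypothetical, and this is only an in-memory illustration, not a real photo manager.

```python
from collections import defaultdict


class TagIndex:
    """Toy index that lets one picture be found along several dimensions."""

    def __init__(self):
        # Maps a (key, value) pair such as ("person", "Ana") to a set of paths.
        self._index = defaultdict(set)

    def add(self, path, **tags):
        """Register a picture under every tag it carries."""
        for key, value in tags.items():
            self._index[(key, value)].add(path)

    def find(self, **tags):
        """Return the pictures that match ALL of the given tags."""
        sets = [self._index.get((key, value), set()) for key, value in tags.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


# Hypothetical pictures and tags, for illustration only.
index = TagIndex()
index.add("IMG_0001.jpg", date="2010-05-01", person="Ana", event="birthday")
index.add("IMG_0002.jpg", date="2010-05-02", person="Ana", event="beach")
index.add("IMG_0003.jpg", date="2010-05-02", person="Luis", event="beach")

print(sorted(index.find(person="Ana")))                 # by person
print(sorted(index.find(event="beach", person="Ana")))  # by event and person
```

The same picture shows up whether you ask by person, by event, or by both, which is exactly what a single folder tree cannot give you without copying files around.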

Then you have the different formats. You can have lossy compression in JPG, or the 256-color limit of GIF, which depending on the settings you choose can leave you with anything from good quality to mediocre. Most cameras only store in the JPG format, but when you get to higher end devices you have other options. For example, you have the RAW format, which is the purest form of your digital picture. This format stores the information exactly as it is recorded by the camera's sensors. It is really hard to call RAW a file format, since every camera manufacturer implements RAW in its own incompatible way. You have to develop it (figuratively speaking) and process it before you can use this file format. Then you have the digital negative, or DNG, format. This is the photographic industry's attempt to fix the RAW format. It is still in its infancy, but at least it is a standard. It has all the advantages of the RAW format, that is, storing the images in their purest form, plus it is supposedly compatible between manufacturers. Finally there is the TIFF format, which is also a very high quality picture format used by computer scanners, and usually required by graphic designers in order to produce print quality results. In my collection I have all 5 formats, and there isn't a single computer program that can manage them all. (Most of my problems are with incompatible RAW files.)

Then you have several versions or revisions of the same files. This happens because you did some photoshopping to fix the colors of the picture, or maybe just did a small crop to get a relative or two out of the way. So now you have two copies of the file, because when you do destructive editing you never know if you will want the original for another edit, so you keep the original too. Then you want the picture on your picture frame, or on your screen saver, so you generate size and format specific copies for those purposes. Now your picture file has multiplied again, and you have several different sizes and several different formats of the same picture. It seems your files keep multiplying like rabbits in spring. You need a method to keep track of all of this, or pretty soon you have an unsearchable digital picture collection.
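
One way to picture the bookkeeping this requires is a small record per picture that remembers the original and every copy derived from it. The sketch below is only an assumption about what such a record could look like; the class names, fields, and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Derivative:
    """One copy generated from an original picture."""
    path: str
    purpose: str   # e.g. "edited crop", "picture frame", "screen saver"
    width: int
    height: int


@dataclass
class Picture:
    """An original file plus every copy derived from it."""
    original: str
    derivatives: list = field(default_factory=list)

    def add_derivative(self, path, purpose, width, height):
        self.derivatives.append(Derivative(path, purpose, width, height))

    def for_purpose(self, purpose):
        """Return the copies that were made for a given use."""
        return [d for d in self.derivatives if d.purpose == purpose]


# Hypothetical file names and sizes, for illustration only.
picture = Picture("IMG_0001.dng")
picture.add_derivative("IMG_0001_crop.jpg", "edited crop", 3000, 2000)
picture.add_derivative("IMG_0001_frame.jpg", "picture frame", 800, 480)

for copy in picture.derivatives:
    print(picture.original, "->", copy.path, f"({copy.purpose})")
```

With a record like this, the crop and the picture frame copy are no longer loose files. They are known children of one original.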

Any photographer reading this will probably say that there are programs that store the metadata I am talking about. The tools embedded in Windows, Adobe Photoshop Elements, Adobe Lightroom, Apple Aperture, Apple iPhoto, and online services like Flickr, Picasa, and Shutterfly all have methods to store and keep track of some of the metadata. In the tests I have made, all of them have severe problems. Even though all of them solve, to some extent, the tagging and searching of the metadata in your pictures, there are problems associated with this. In one of my earlier experiments with Photoshop Elements, I manually started tagging places, persons, and events on the pictures, and it was okay until the program's database crashed. On my second attempt, the metadata was lost in a Photoshop Elements upgrade. Then there is the question of vendor lock-in. Manually tagging the pictures is an extremely time consuming task, which in a perfect world you should only have to do once. But this is not a perfect world, and there is currently no way to exchange the information if, for example, you would like to change from Adobe to Apple, so you are either locked in to one vendor or you have to start tagging all over again. This is made even worse because vendors like Canon prefer to sync with the Canon software, and devices like the iPhone prefer to sync with Apple iPhoto. That is not to say they won't sync with anything else, but you will lose some of the metadata if you do not use the preferred program.
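
To illustrate what vendor-neutral tagging could look like, here is a minimal sketch that keeps the tags in a plain JSON file sitting next to each picture, so that no single program's database owns them. The ".json" sidecar naming and the tag keys are assumptions made up for this example, not something any of the programs above reads or writes.

```python
import json
from pathlib import Path


def sidecar_path(picture):
    """Return the sidecar file for a picture, e.g. IMG_0001.jpg.json (hypothetical naming)."""
    picture = Path(picture)
    return picture.with_name(picture.name + ".json")


def write_tags(picture, tags):
    """Store the tags as plain JSON next to the picture."""
    text = json.dumps(tags, indent=2, sort_keys=True)
    sidecar_path(picture).write_text(text, encoding="utf-8")


def read_tags(picture):
    """Load the tags for a picture, or an empty dict if it was never tagged."""
    path = sidecar_path(picture)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, write_tags("IMG_0001.jpg", {"persons": ["Ana"], "event": "birthday"}) leaves a small text file beside the picture that any program, or any person with a text editor, can read back later. The tagging work survives a crashed database or a change of software because it never lived inside one.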

Now, getting back to the software: these programs make no provision for multiple users or networked computers. In my house there are three people who generate pictures from different cameras and camera phones, each with their own computer. There is no centralized way to manage this, since the programs in question only work in standalone mode, and not in a networked mode. (I tried several experiments to trick the programs into a networked mode, but all of them ended in corrupted databases.) Therefore, the three other people who could help me tag all the pictures cannot do it from the comfort of their own computers, and there is little incentive to do so, since only ONE computer would have the correct metadata and search capabilities. Also, as your collection grows past 50GB or so, most of these programs become increasingly slow. Online solutions fare no better, because first you have to trust the online cloud. My pictures are one of the few digital items that are not replaceable, and trusting them to a third party is not exactly what I would consider ideal. Then you have the privacy issue. I consider all my pictures private unless I decide to share them publicly. With a third party you will never know who has access to your pictures. Finally, most of the online services tend to compress the pictures to save space, which is not good if I need the pictures in archival quality. Overall, online services do not seem to be an appropriate solution either.

So the current status of my picture collection is over 50GB of messed up files scattered between several computers. On the semi-centralized server where I store most of my pictures I have some folders based on events and others based on persons, which is chaos in its own right, plus an increasingly unmanageable folder called Unsorted that keeps growing out of control. Since my picture collection is not replaceable, as I already mentioned, I need to back it up. This is a chore, since I can only readily do this for the centralized location. All the other users in my house know that if it is not on the server, it won't get into the backup. This will certainly cause a problem in the future when either I upgrade the machine or a hard drive crashes. There must be a better way to solve this.
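
As a rough sketch of how one could at least find out what is at risk, the snippet below compares a folder on one computer with the server folder by file content rather than by name, and lists the files that exist only locally. The function names and folder layout are hypothetical, and it assumes both folders are reachable from the machine running it.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 of a file's content, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_under(root):
    """Map content digest -> path for every file under root.

    Files with identical content collapse to a single entry.
    """
    return {file_digest(p): p for p in sorted(Path(root).rglob("*")) if p.is_file()}


def missing_from_server(local_root, server_root):
    """List local files whose content is not found anywhere on the server."""
    on_server = set(digests_under(server_root))
    local = digests_under(local_root)
    return sorted(path for digest, path in local.items() if digest not in on_server)
```

Because the comparison is by content, a picture that was renamed or moved into a different folder on the server still counts as backed up, while anything that only lives on one person's computer is reported.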

This is the first part of a series where I will detail all the problems of having too much information in the digital age. Please feel free to comment if you believe that I missed something.

