
Help requested: best way to batch process images?

ColdFusion

A help request to anyone who has experience with heavy batch processing of images.  Below are the requirements. Based on these, can anyone tell me if I'm doing the wrong thing by proposing that we use CF 8, with its new cfimage tag and related image functions?

  1. get a recordset of potentially hundreds of thousands of URLs to images (mostly JPEGs and PNGs, with potentially some TIFFs and BMPs)
  2. iterate over those URLs, resize each image to "web-optimal" sizes, and save those to disk in an organized directory structure
  3. batch processing should not stop just because of 404s or other issues (in CF, obviously, I could just use try/catch exception handling to deal with such issues gracefully)
  4. log, or otherwise report, details of successes and failures during the batch process
  5. log, or otherwise report, fatal errors
  6. Windows 2003 compatible
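
The loop described in items 2 to 5 can be sketched in a few lines. This is a language-neutral illustration in Python rather than CF 8 code; `run_batch`, `target_path`, and the `process_one` callback are hypothetical names, and the hash-sharded directory layout is only one assumed way to keep any single folder from growing too large.

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("image_batch")


def target_path(root, url, size_name):
    """Map a URL to a stable, sharded path: root/ab/cd/<sha1>_<size>.jpg."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / f"{digest}_{size_name}.jpg"


def run_batch(urls, process_one):
    """Call process_one(url) for every URL; record failures instead of stopping."""
    succeeded, failed = [], []
    for url in urls:
        try:
            process_one(url)
            succeeded.append(url)
            log.info("ok %s", url)
        except Exception as exc:  # 404s, timeouts, unreadable images, ...
            failed.append((url, repr(exc)))
            log.warning("failed %s: %s", url, exc)
    return succeeded, failed
```

Per-image problems are caught and logged inside the loop, so one bad URL never ends the run. Anything that escapes `run_batch` itself (for example, the URL list being unavailable) would be the fatal-error case in item 5.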

I welcome any and all suggestions.  I don't know if there is some type of image processing software that can do this type of batch processing more efficiently than CF 8, or if we would get significant gains from using Java, .NET, Perl or other languages directly.  (Note that we have very limited experience with non-CF programming, so we'd only consider other languages if we'd see significant gains.)

TIA for any help.  ;-} 

tags:
ColdFusion
 
Where are these images coming from? Are they User uploads or are you actually pulling them from other websites?

How many users do you expect? How beefy a server will you be using? Rather, how many servers will you have? Will you be clustering CF? Will certain servers be dedicated to only image processing?

How often will you be processing a high number of images? Will this be done "on the fly"?

I'm sure you could work something out with CF8, but you'd have to make sure your code is optimized and your server has enough RAM and CPU to handle the number of images you'll be processing, especially at high-traffic times. I'd imagine a process like this could make good use of CFTHREAD.

(Sorry if this seems like rambling.)
 
posted 237 days ago
 
Ben Nadel said:
 
Not sure if this is doable, but I have used FireWorks to batch process thousands of images. It only works on the local / network drive, not by URLs, but it's quite fast. You can't organize them as you go, but you can save over the original or to a new directory.

Not sure if this is possible in your scenario, but it has worked for me.
 
posted 237 days ago
 
 
@Adrian - excellent questions, not rambling :)

- I'm pulling from other websites; the full URLs will be stored in a SQL Server 2000 dB for easy access

- This would be a batch process, probably run nightly during low site traffic. The process will be somewhat tied to a database, as the images are pulled from a RS initially and I may do some logging into the dB.

- The server will not initially be clustered, but will be dedicated to performing batch processes - it will not be the same server that our main websites run off of.

- The server will probably be a Win 2003, quad-processor machine with 2-4 GB of RAM... we've yet to order it.

- If done programmatically, to reduce the chance of timeouts, I'd find a "sweet spot" for the number of images that CF (or other) could handle at a time, and then run the image processing process every so often during off-peak hours. For example, I might have it run 10,000 images every 30 minutes.
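
A rough sanity check on that kind of schedule, written as a Python sketch. The figure of 300,000 URLs is a hypothetical stand-in for "hundreds of thousands", and `batches`, `runs_needed`, and `hours_to_finish` are illustrative names, not part of any existing tool.

```python
import math
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def runs_needed(total_items, batch_size):
    """Number of scheduled runs required to cover every item once."""
    return math.ceil(total_items / batch_size)


def hours_to_finish(total_items, batch_size, minutes_between_runs):
    """Wall-clock hours, assuming each run completes within its interval."""
    return runs_needed(total_items, batch_size) * minutes_between_runs / 60


# Assumed numbers: 300,000 URLs, 10,000 per run, one run every 30 minutes.
print(runs_needed(300_000, 10_000))          # 30
print(hours_to_finish(300_000, 10_000, 30))  # 15.0
```

Under those assumed numbers a full pass takes about 15 hours, which is longer than a single off-peak window, so the initial backlog would likely have to be spread over several nights.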

Obviously there will be lots of testing involved, but please poke any additional holes you see in my approach. Thanks!
 
posted 237 days ago
 
 
@Ben - thanks for the tip. I'll look at what FW has to offer.
 
posted 237 days ago
 
 
If you're going to have a dedicated image server, then check out DeBabelizer. This was the program of choice for batch imaging when I worked at the UNT School of Visual Arts.

http://www.equilibrium.com
 
posted 237 days ago
 
 
@Adrian,

Thanks again! Any idea on the price for the DeBabelizer Server? They force you to contact sales to get that info, and I don't want to waste our time if it's price-prohibitive.

Also, that's so cool that you went to UNT, too. I was a Music Ed. major for 3 1/2 years, switched degrees in the middle, and ended up with a BA in Sociology (lot of good that's doing me now ;-)
 
posted 237 days ago
 
Nathan Strutz said:
 
Aaron,
I've made an app for this exact purpose. Story...

My previous job had a process like this in CF. They tried a couple of CF-based image processors and settled on cfx_ImageFlare, as it was as fast as, and had better quality than, other image processors (Alagad, etc.). There were a lot of images and the process took about 12 hours. Eventually, as the library grew, the process would overload the memory and crash the server nightly (it ran nightly because it pegged the CPU).

Somebody put a bug in my ear to redo this process in C#. I'm a CF guy but a true hacker at heart, so I spent an evening and came in the next day with a partial solution.

A few working days later gave me a finished product that ran in under 3 hours, took up to 25MB of memory and 7% CPU, written in .NET. With some threading, I'm sure I could have increased the CPU to 50% and cut the time to under an hour.

Also, the images looked great when they came out. The .NET image processing stuff (I think it's called GDI+) is top notch for basic manipulation (I was mostly cropping, scaling and adding borders).

My process was:
  query database for list of X records
  loop over X
    hit a web service to get image records for each X
    loop over X image records
      download images into memory (not disk)
      convert to a .NET native image
      determine scaling and cropping factors
      apply image actions
      set output to jpeg, 70% quality
      save file to disk
      update database with local image record
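
The "determine scaling and cropping factors" step is mostly arithmetic. A minimal sketch in Python follows; this is not the original C# code, and `fit_within` and `center_crop_box` are hypothetical helper names. It assumes the goal is to preserve aspect ratio, never upscale, and crop around the center of the image.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, keeping aspect ratio.

    Images already smaller than the box are left at their original size.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def center_crop_box(width, height, target_w, target_h):
    """Return (left, top, right, bottom) of the largest centered region
    that has the same aspect ratio as target_w:target_h."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Source is too wide: trim equal amounts from left and right.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return left, 0, left + new_w, height
    # Source is too tall (or already matches): trim top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return 0, top, width, top + new_h


print(fit_within(1600, 1200, 800, 800))    # (800, 600)
print(center_crop_box(1600, 1200, 1, 1))   # (200, 0, 1400, 1200)
```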

Basically exactly what you're doing, I think.

It handled 404s really well, and I think it logged errors to the console, but you could do whatever.

It was a fun project, and something out of the ordinary for me, so I loved it. I think I made almost all of it in C# Express, which is a free download.

I only wonder if CF8 would be better. I have a hard time believing it would be, as .NET seems a better fit for behind-the-scenes utilities like this.
 
posted 237 days ago
 
 
@Nathan,

Very interesting! That process *is* nearly identical to what I'm trying to accomplish. I like the idea of programming it (vs. a closed, proprietary piece of software) as I like the control.

Care to share the C# code, if only the non-proprietary parts of it? ;)

If so, I'd love to check it out (aqlong $%^_AT_$%^ gmail [[[dot]]] com)
 
posted 237 days ago
 
Nathan Strutz said:
 
I can see if I still have a copy and see if there is anything I can share. I'd bet there's none of it I can actually give you, though. However, I took some of my experiences and popped out another somewhat related tool I made to get my photos up on dopefly - http://www.dopefly.com/techblog/entry.cfm?entry=21... - I can totally share this stuff, no worries. I'll mail you, we can chat.
 
posted 237 days ago
 
Jim Priest said:
 
I did something like this once using the image tag from Efflare (http://efflare.com/) on CF6. I was batch processing catalog images for a site redesign. It worked well and was fairly quick. I'd imagine the limit for you would be bandwidth, depending on how big the images are... It seems like it would be fairly simple to hack something up in CF8 and give it a try. As you said, if you do it in CF you can customize it to do EXACTLY what you want vs. using something off the shelf.
 
posted 237 days ago
 
cfZen said:
 
As an update (I know you've been biting your nails hourly over this)... I was able to accomplish all of this using CF 8 Enterprise, and I'm quite pleased with the results!

At a very high level, we have a system that allows partners to post XML to our system; the system inserts various pieces of data (image URLs among them) into the database for processing later.

At regular intervals, a CF Scheduled Task comes through and iterates over all the image URLs (up to 2000 at a time), using CFThread to process multiple images simultaneously for improved efficiency. For each image, the system pulls it off the URL in binary, resizes it into a medium and a thumbnail size, updates the dB to mark it as processed, and logs all successes and failures to a dB table. The only real slowness comes from how slowly the remote host serves each image; the resizing itself takes only dozens of milliseconds per image. I use 6 CFThreads (more than that and you risk putting too much load on the servers that host the images). With that setup, memory usage becomes too high to be safe at about 2000 images, so that's where I draw the line at the moment. Since we use the multi-server version, we can easily scale in the future by adding more CF instances to process more images concurrently.
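
The concurrency pattern described above (a small fixed pool of workers plus a cap per run) might look roughly like this. It is a Python sketch using `concurrent.futures`, not the actual CFThread code; `process_concurrently` and the `process_one` callback are hypothetical names, and only the numbers 6 and 2000 come from the post.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice

MAX_WORKERS = 6     # thread count mentioned in the post
BATCH_LIMIT = 2000  # per-run cap mentioned in the post


def process_concurrently(urls, process_one,
                         max_workers=MAX_WORKERS, limit=BATCH_LIMIT):
    """Run process_one(url) on at most `limit` URLs using a fixed pool.

    Returns a dict mapping each URL to None on success or to a short
    error description on failure, so the caller can log both outcomes.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, url): url
                   for url in islice(urls, limit)}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                results[url] = None
            except Exception as exc:  # a failed image must not stop the run
                results[url] = repr(exc)
    return results
```

Keeping the pool small limits the load on the remote hosts, and the per-run cap bounds how much work is queued at once; both values would need tuning for a different server.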

If anyone wants more info, let me know.
 
posted 119 days ago
 


Aaron  Longnion

Austin, TX