Unzip Multiple Zip Files with Ruby on Rails

Sometimes you have to integrate data exported from third-party desktop software. One way to do this is to pack all the data into a zip file and push it via FTP onto your server. This tutorial explains how to process zip files with your Ruby on Rails™ application.

I will use an example from real estate management software. Software of this kind commonly packs its data this way and pushes it to online real estate listing services.

The zip-file contains:

  • An XML file with property data
  • Some picture files

The real estate management software transfers the zip files into a specific folder. There can be multiple zip files, which have to be processed in the correct chronological order.

This tutorial covers the following points:

  • Processing multiple zip-files
  • Error handling and cleaning up

Processing multiple zip-files

All zip files are uploaded into the following directory. Create this directory with read/write access for your Rails application. In addition, configure an FTP user with write-only access (a “postbox” setup), so you can use the same FTP account for different clients.

 RAILS_ROOT/external_uploads

The file

 RAILS_ROOT/external_uploads/zipdata.zip

has the following content:

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

The complete import is executed in the model layer, so we need an Import class. A few Ruby libraries are required, too:

require 'fileutils'
require 'zip/zip'
require 'zip/zipfilesystem'
require 'RMagick'
require 'find'

class Import < ActiveRecord::Base

  attr_accessor :current_source_file, :current_hash, :current_xml_file,
                :file_handler, :xml_document, :tmp_property, :expose,
                :tmp_attachment, :company, :current_is_topobjekt
  ....
end

Before we start, we need some supporting class methods. The first one returns a random string, which we use to name the temporary directory that the zip file is extracted into:

# returns a random alphanumeric string of the given length
def self.make_random_string(len=10)
  chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
  random_string = ""
  # rand(n) returns 0..n-1, so rand(chars.size) can pick every character
  1.upto(len) { random_string << chars[rand(chars.size)] }
  random_string
end
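As an alternative to the hand-rolled loop, Ruby's standard library offers SecureRandom. A minimal sketch, assuming Ruby 2.5 or later (where SecureRandom.alphanumeric exists); the method name random_tmp_name is hypothetical:

```ruby
require 'securerandom'

# hypothetical helper: random string made of A-Z, a-z and 0-9
def random_tmp_name(len = 10)
  SecureRandom.alphanumeric(len)
end

name = random_tmp_name
puts name
```

Inside the Import class, make_random_string could simply delegate to such a call; the character set is the same as in the hand-written version.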

The next method returns a list of all zip files in the upload directory, sorted by modification time. The order is important because the zip files depend on each other and must be processed from oldest to newest.

# returns the sorted file list from oldest to latest timestamp
def self.sorted_filelist
  unsorted_files = Dir.glob(File.join(RAILS_ROOT, 'external_uploads', '*.zip'))
  unsorted_files.sort_by { |file| File.mtime(file) }
end
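The ordering can be checked outside Rails with a few placeholder files whose modification times are set explicitly. A minimal sketch; the helper name sorted_zip_files and its dir parameter are hypothetical stand-ins for the RAILS_ROOT-based class method:

```ruby
require 'fileutils'
require 'tmpdir'

# hypothetical helper: returns all *.zip files in dir, oldest first
def sorted_zip_files(dir)
  Dir.glob(File.join(dir, '*.zip')).sort_by { |path| File.mtime(path) }
end

# usage: three empty placeholder files with explicit modification times
Dir.mktmpdir do |dir|
  now = Time.now
  { 'b.zip' => now - 60, 'a.zip' => now, 'c.zip' => now - 120 }.each do |name, mtime|
    path = File.join(dir, name)
    FileUtils.touch(path)
    File.utime(mtime, mtime, path) # sets access and modification time
  end
  puts sorted_zip_files(dir).map { |p| File.basename(p) }.inspect
  # prints ["c.zip", "b.zip", "a.zip"]
end
```

sort_by reads each file's timestamp once, whereas a sort block calls File.mtime twice per comparison.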

This method loops through the external_uploads directory and starts the import for each zip file:

# starts one import per zip file, oldest first
def self.go
  self.sorted_filelist.each do |single_zip_file|
    current_import = Import.new(
      :current_source_file => single_zip_file,
      :current_hash => make_random_string)
    current_import.start
  end
end
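The key point of this loop is that every zip file gets its own import object with its own random hash. A plain-Ruby sketch of that idea, assuming a hypothetical ImportJob struct in place of the ActiveRecord model and a hypothetical run_imports method in place of go:

```ruby
require 'securerandom'

# hypothetical stand-in for the Import model (no ActiveRecord needed)
ImportJob = Struct.new(:current_source_file, :current_hash) do
  def start
    # the real model would unzip, parse and clean up here
    puts "importing #{current_source_file} via tmp/#{current_hash}"
  end
end

# one fresh job (and one fresh random hash) per zip file, in the given order
def run_imports(files)
  files.map do |file|
    job = ImportJob.new(file, SecureRandom.alphanumeric(10))
    job.start
    job
  end
end

jobs = run_imports(%w[old.zip newer.zip])
```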

The instance method start runs all processing steps once for each zip file.

# processing the zipfile
def start
  make_tmp_path
  unzip
  open_xml_file
  parse_xml_file
  close_xml_file
  clean_up
end
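As written, clean_up is skipped if an earlier step raises. One possible variation, assuming you want the temporary files removed in every case, is an ensure clause. A sketch with a hypothetical PipelineSketch class that only records which steps ran:

```ruby
# hypothetical sketch: clean_up runs even when one of the steps raises
class PipelineSketch
  attr_reader :calls

  def initialize(failing_step = nil)
    @failing_step = failing_step # name of a step that should raise, for demonstration
    @calls = []
  end

  def start
    %i[make_tmp_path unzip open_xml_file parse_xml_file close_xml_file].each do |step|
      @calls << step
      raise "#{step} failed" if step == @failing_step
    end
  ensure
    clean_up # executed whether the steps succeeded or raised
  end

  def clean_up
    @calls << :clean_up
  end
end

pipeline = PipelineSketch.new(:parse_xml_file)
begin
  pipeline.start
rescue RuntimeError => e
  puts e.message # prints "parse_xml_file failed"
end
p pipeline.calls
# prints [:make_tmp_path, :unzip, :open_xml_file, :parse_xml_file, :clean_up]
```

The exception still propagates to the caller after clean_up, so go can decide whether to continue with the next zip file.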

The method make_tmp_path creates a temporary directory. Each zip file must have its own temporary directory; the random current_hash takes care of that.

# temporary directory of the current zip file, below RAILS_ROOT
def tmp_path
  File.join(RAILS_ROOT, 'external_uploads', 'tmp', current_hash)
end

def make_tmp_path
  FileUtils.mkdir_p tmp_path
end
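A random string makes name clashes unlikely but not impossible. Ruby's Dir.mktmpdir creates a directory that did not exist before and returns its path. A minimal sketch; the method make_unique_tmp_path and the demo base directory are hypothetical:

```ruby
require 'tmpdir'
require 'fileutils'

# hypothetical alternative: let Ruby create a fresh, unique directory below base_dir
def make_unique_tmp_path(base_dir)
  FileUtils.mkdir_p(base_dir)         # mktmpdir needs an existing parent directory
  Dir.mktmpdir('import-', base_dir)   # returns the path of the new directory
end

base = File.join(Dir.tmpdir, 'external_uploads_demo', 'tmp')
first = make_unique_tmp_path(base)
second = make_unique_tmp_path(base)
puts first != second # prints true
FileUtils.rm_rf([first, second]) # without a block, removal is left to the caller
```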

The unzip method extracts the current zip file into the temporary directory. If an error occurs, the rescue block removes the temporary directory again.

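# builds the full path of a file inside the temporary directory
def add_path(filename)
  File.join(tmp_path, filename)
end

# extracts every entry of the current zip file into the temporary directory
def unzip
  # the block form closes the zip file when done
  Zip::ZipFile.open(current_source_file) do |zip_file|
    zip_file.each do |single_file|
      target = add_path(single_file.name)
      # remember the XML file: literal ".xml" at the end, case-insensitive
      self.current_xml_file = target if single_file.name =~ /\.xml\z/i
      # entries may live in subfolders of the archive
      FileUtils.mkdir_p(File.dirname(target))
      single_file.extract(target)
    end
  end
rescue
  remove_tmp_path
end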

I use a regular expression to find the included XML file. Its full path is saved in the current_xml_file attribute.
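The exact pattern matters: in /.xml/ the dot matches any character and nothing anchors it to the end of the name. A small comparison, using made-up file names for illustration:

```ruby
# loose pattern: "." matches any character and there is no end anchor
LOOSE_XML = /.xml/
# stricter pattern: a literal ".xml" at the very end, case-insensitive
STRICT_XML = /\.xml\z/i

names = %w[data.xml DATA.XML picture1.jpg my_xml_notes.txt data.xml.bak]
names.each do |name|
  loose = !!(name.downcase =~ LOOSE_XML)
  strict = !!(name =~ STRICT_XML)
  puts "#{name}: loose=#{loose} strict=#{strict}"
end
```

Both patterns accept data.xml and DATA.XML, but only the loose one also accepts my_xml_notes.txt and data.xml.bak.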

Now we have the files

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

unzipped into the directory

 RAILS_ROOT/external_uploads/tmp/adFvdSDed/

The current_xml_file attribute contains

 RAILS_ROOT/external_uploads/tmp/adFvdSDed/data.xml

With the path stored in current_xml_file, we can process data.xml later.
