Unzip Multiple Zip Files with Ruby on Rails

Sometimes you have to integrate data exported from third-party desktop software. One way to do this is to pack all the data into a zip file and push it onto your server via FTP. This tutorial explains how to process such zip files with your Ruby on Rails™ application.

I will use an example from real estate management software. Packing the data and pushing it to online real estate listing services is a common approach for this kind of software.

The zip-file contains:

  • An XML file with property data
  • Some picture files

The real estate management software transfers the zip files into a specific folder. There can be multiple zip files, which have to be processed in chronological order.

The tutorial covers the following points:

  • Processing multiple zip-files
  • Error handling and cleaning up

Processing multiple zip-files

All zip files are uploaded into the following directory. You have to create this directory with read/write access for your Rails application. Furthermore, you have to configure an FTP user with write-only access (“postbox” setup), so that the same FTP access can be used for different clients.

 RAILS_ROOT/external_uploads

The file

 RAILS_ROOT/external_uploads/zipdata.zip

has the following content:

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

The complete import is executed in the model layer, so we need an Import class. Some Ruby libraries are required, too.

# Libraries used by the import (zip/zip is the rubyzip gem, RMagick the rmagick gem)
require 'fileutils'
require 'zip/zip'
require 'zip/zipfilesystem'
require 'RMagick'
require 'find'

class Import < ActiveRecord::Base

# Per-import state; these attributes are not database columns
attr_accessor :current_source_file, :current_hash, :current_xml_file, :file_handler, :xml_document, :tmp_property, :expose, :tmp_attachment, :company, :current_is_topobjekt
....
end

Before we start, we need some supporting class methods. The first one returns a random string. We need it to name the temporary directory into which the zip file is extracted:

# returns a random string of letters and digits
def self.make_random_string(len=10)
  chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
  random_string = ""
  # rand(chars.size) yields 0..chars.size-1, so every character can be picked
  len.times { random_string << chars[rand(chars.size)] }
  random_string
end
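
On Ruby 2.5 or newer, the standard library can produce the same kind of string. The following is a minimal sketch, assuming the name only has to be unlikely to repeat; random_dir_name is a hypothetical helper name and not part of the Import class above:

```ruby
require 'securerandom'

# Hypothetical helper: returns a random string of letters and digits.
# SecureRandom.alphanumeric is available from Ruby 2.5 onwards.
def random_dir_name(len = 10)
  SecureRandom.alphanumeric(len)
end

puts random_dir_name   # ten random characters, different on every call
```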

The next method returns a list of all zip files in the upload directory, sorted by modification time. The order is important because the zip files depend on each other.

# returns the file list sorted from oldest to latest timestamp
def self.sorted_filelist
  unsorted_files = Dir.glob(RAILS_ROOT+'/external_uploads/*.zip')
  unsorted_files.sort_by { |file| File.mtime(file) }
end
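
To see the ordering outside of Rails, here is a small standalone sketch. It assumes a throwaway directory instead of external_uploads, and the helper names zip_files_oldest_first and demo_sorted_names are made up for this illustration:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: all zip files in dir, oldest modification time first.
def zip_files_oldest_first(dir)
  Dir.glob(File.join(dir, '*.zip')).sort_by { |path| File.mtime(path) }
end

# Creates three empty files with explicit timestamps in a temporary
# directory and returns their names in processing order.
def demo_sorted_names
  Dir.mktmpdir do |dir|
    now = Time.now
    { 'b.zip' => now - 60, 'a.zip' => now, 'c.zip' => now - 120 }.each do |name, mtime|
      path = File.join(dir, name)
      FileUtils.touch(path)
      File.utime(mtime, mtime, path)   # set access and modification time
    end
    zip_files_oldest_first(dir).map { |path| File.basename(path) }
  end
end

p demo_sorted_names   # => ["c.zip", "b.zip", "a.zip"]
```

The alphabetical order of the names plays no role here; only the modification time decides which file is processed first.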

This code loops through our external_uploads directory and starts the process for each zip file:

def self.go
  self.sorted_filelist.each do |single_zip_file|
    current_import = Import.new(
      :current_source_file => single_zip_file,
      :current_hash => make_random_string)
    current_import.start
  end
end

The instance method start calls every sub-process once for each zip file.

# processing the zipfile
def start
  make_tmp_path
  unzip
  open_xml_file
  parse_xml_file
  close_xml_file
  clean_up
end
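
In start, clean_up is the last call, so it is only reached when every earlier step returns normally. One possible way to make the cleanup unconditional is an ensure clause. The following sketch uses a hypothetical stand-in class, PipelineSketch, that only records step names; it is not the Import class from this tutorial:

```ruby
# Hypothetical stand-in for the Import class; it only records which steps ran.
class PipelineSketch
  attr_reader :steps

  # fail_at: name of a step that should raise, or nil for a clean run
  def initialize(fail_at = nil)
    @fail_at = fail_at
    @steps = []
  end

  def start
    run(:make_tmp_path)
    begin
      [:unzip, :open_xml_file, :parse_xml_file, :close_xml_file].each { |step| run(step) }
    ensure
      run(:clean_up)   # runs even when an earlier step raised
    end
  end

  private

  def run(step)
    raise "#{step} failed" if step == @fail_at
    @steps << step
  end
end

import = PipelineSketch.new(:parse_xml_file)
begin
  import.start
rescue RuntimeError => e
  puts e.message   # parse_xml_file failed
end
p import.steps     # [:make_tmp_path, :unzip, :open_xml_file, :clean_up]
```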

The method make_tmp_path creates a temporary directory. Each zip file must have its own temporary directory. The random current_hash makes it very unlikely that two imports share one.

# temporary directory below RAILS_ROOT/external_uploads/tmp
def tmp_path
  File.join(RAILS_ROOT,'external_uploads','tmp',current_hash)
end

def make_tmp_path
  FileUtils.mkdir_p tmp_path
end
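
If a clash has to be ruled out completely, Ruby's Dir.mktmpdir can pick the directory name itself and retries when the name is already taken. This is an alternative technique, not what the tutorial's code does; make_unique_tmp_path is a hypothetical helper, and the demo runs in a throwaway sandbox directory:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: creates a fresh, uniquely named directory below
# base_dir and returns its path. The caller has to remove it afterwards.
def make_unique_tmp_path(base_dir)
  FileUtils.mkdir_p(base_dir)
  Dir.mktmpdir('import-', base_dir)
end

Dir.mktmpdir do |sandbox|
  base   = File.join(sandbox, 'external_uploads', 'tmp')
  first  = make_unique_tmp_path(base)
  second = make_unique_tmp_path(base)
  puts File.directory?(first)   # true
  puts first == second          # false
end
```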

The unzip method extracts the currently selected zip file into the temporary directory. The rescue clause catches errors and removes the temporary directory.

def add_path(filename)
  File.join(tmp_path,filename)
end

def unzip
  # the block form closes the archive again when the block exits
  Zip::ZipFile.open(current_source_file) do |zip_file|
    zip_file.each do |single_file|
      # remember the XML file: its name ends in .xml, in any letter case
      if single_file.name =~ /\.xml\z/i
        self.current_xml_file = add_path(single_file.name)
      end
      single_file.extract(add_path(single_file.name))
    end
  end
rescue
  # extraction failed: remove the partially filled temporary directory
  remove_tmp_path
end

I use a regular expression to find the included XML file. Its name, together with the corresponding path, is saved in the current_xml_file attribute.
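
Two details deserve care when the archive comes from a third party. An unanchored pattern such as /.xml/ also matches a name like myxmlphoto.jpg, and an entry name may contain ../ and point outside the temporary directory. The following sketch shows one way to handle both with plain Ruby; XML_NAME and safe_target_path are hypothetical names that do not appear in the Import class:

```ruby
# Matches names that end in ".xml", ignoring case.
XML_NAME = /\.xml\z/i

# Hypothetical helper: builds the target path for a zip entry and refuses
# names that would resolve outside base_dir (for example "../evil.rb").
def safe_target_path(base_dir, entry_name)
  base   = File.expand_path(base_dir)
  target = File.expand_path(entry_name, base)
  unless target.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "unsafe entry name: #{entry_name}"
  end
  target
end

puts(('data.XML' =~ XML_NAME) ? 'xml' : 'other')         # xml
puts(('myxmlphoto.jpg' =~ XML_NAME) ? 'xml' : 'other')   # other

begin
  safe_target_path('external_uploads/tmp/abc', '../../evil.rb')
rescue ArgumentError => e
  puts e.message   # unsafe entry name: ../../evil.rb
end
```

A check like this could run before each extract call, so that a malformed archive is rejected instead of writing files elsewhere on the server.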

Now we have the files

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

unzipped into the directory

 RAILS_ROOT/external_uploads/tmp/adFvdSDed/

The current_xml_file attribute contains

 RAILS_ROOT/external_uploads/tmp/adFvdSDed/data.xml

With the path stored in current_xml_file we can process data.xml later.
