Upload and Parse an XML File with Ruby on Rails, Paperclip and Nokogiri

In this tutorial I'd like to show you how to parse an uploaded XML file with Ruby on Rails. As an example we use a GPX file from the Nokia Sports Tracker. Nokia closed the online Sports Tracker service in 2010, but we can still use the GPX files.

The GPX file is an XML file containing GPS data. GPX files were created by various Nokia devices. With these files, users could share their outdoor sport activities with other users via the Nokia Sports Tracker service.

The organisation of the GPX file is quite simple. The data is structured in tracks, segments and points. One track has many segments and each segment has many points.

You can download the demofile here.
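To make the track/segment/point nesting concrete, here is a minimal sketch. The GPX sample is hand-written for illustration: its values are invented, it is not the demo file, and it omits the namespace declaration a real file would carry. It uses REXML from the standard Ruby distribution instead of Nokogiri, purely so it runs without installing any gems.

```ruby
require 'rexml/document'

# Hand-written sample; all coordinates and values are invented.
SAMPLE_GPX = <<~XML
  <gpx version="1.1" creator="hand-written example">
    <trk>
      <trkseg>
        <trkpt lat="59.7975" lon="5.25676">
          <ele>109.5</ele>
          <time>2009-04-06T21:07:27Z</time>
          <name>1</name>
        </trkpt>
        <trkpt lat="59.7976" lon="5.25680">
          <ele>110.0</ele>
          <time>2009-04-06T21:07:32Z</time>
          <name>2</name>
        </trkpt>
      </trkseg>
      <trkseg>
        <trkpt lat="59.7980" lon="5.25700">
          <ele>111.2</ele>
          <time>2009-04-06T21:09:02Z</time>
          <name>3</name>
        </trkpt>
      </trkseg>
    </trk>
  </gpx>
XML

doc = REXML::Document.new(SAMPLE_GPX)

# Count each level of the hierarchy with simple XPath queries.
track_count   = REXML::XPath.match(doc, '//trk').size
segment_count = REXML::XPath.match(doc, '//trkseg').size
point_count   = REXML::XPath.match(doc, '//trkpt').size

puts "#{track_count} track, #{segment_count} segments, #{point_count} points"
```

Latitude and longitude are attributes of the trkpt tag, while elevation, time and name are child elements. That difference matters later when we read the values.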

In the process we are going to implement, the user creates a new track and adds data by choosing a GPX file to upload. We parse the GPX file and add the data to the new track.

Let's start by creating a new Rails application:

  rails new tutorial

For the new application we need some additional gems:

  • Paperclip for upload
  • Nokogiri for parsing
  • Twitter Bootstrap for a nicer output

We add these to our Gemfile:

group :assets do
  gem 'sass-rails',   '~> 3.2.3'
  gem 'coffee-rails', '~> 3.2.1'
  gem 'less-rails-bootstrap'
  gem 'therubyracer'
  gem 'twitter-bootstrap-rails'
  gem 'uglifier', '>= 1.0.3'
end

gem 'jquery-rails'
gem 'paperclip', '~> 3.0'
gem 'nokogiri'

After a bundle install we are ready to start building our little application.

The development part is not a big deal. For the model structure we use a simplified 1:1 representation of the XML structure. We use the entities ‘track’, ‘tracksegment’ and ‘point’ with 1:n associations. One track has many tracksegments. One tracksegment has many points.

Of all the entities, only the track is needed as a full REST resource; the other two are needed only as models. First we scaffold the Track:

rails g scaffold Track name:string

Because we want to use Paperclip for the upload, we need to insert add_attachment :tracks, :gpx in the migration file:

class CreateTracks < ActiveRecord::Migration
  def change
    create_table :tracks do |t|
      t.string :name

      t.timestamps
    end
    add_attachment :tracks, :gpx
  end
end

add_attachment is a helper from the Paperclip gem. It adds all the columns Paperclip needs to our table. The column names get the prefix ‘gpx’.
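As a rough sketch of what that means for the tracks table, the snippet below lists the columns for an attachment called gpx. The suffix-to-type mapping is an assumption about Paperclip 3.x (the column names match the ones our views use later, but the types may differ in other versions), and attachment_columns is a hypothetical helper, not part of Paperclip.

```ruby
# Assumed mapping of column suffix to column type for Paperclip 3.x.
ATTACHMENT_COLUMNS = {
  file_name:    :string,
  content_type: :string,
  file_size:    :integer,
  updated_at:   :datetime
}.freeze

# Hypothetical helper: builds the prefixed column names for one attachment.
def attachment_columns(attachment_name)
  ATTACHMENT_COLUMNS.map { |suffix, type| ["#{attachment_name}_#{suffix}", type] }.to_h
end

attachment_columns(:gpx).each { |column, type| puts "#{column}: #{type}" }
```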

Next we create the model Tracksegment.

rails g model Tracksegment track:references

Besides the Track reference there are no further attributes. The XML file doesn’t contain any additional information so there is no need to add anything.

And the last model we create is the Point model. In this model we store the GPS information like latitude, longitude and elevation. The word elevation is used in the GPX XML file. In other GPS protocols the word altitude is used.

The name attribute holds an incrementing number. I don’t know why it’s called ‘name’ in the GPX file. For consistency I call it ‘name’ too. The description attribute is a short string, and point_created_at contains the time the GPS coordinates were created.

rails g model Point tracksegment:references name:string latitude:float \
  longitude:float elevation:float description:string point_created_at:datetime

Before we continue with the migration we add the associations to our model files:

class Track < ActiveRecord::Base
  attr_accessible :name, :gpx  
  has_many :tracksegments, :dependent => :destroy
  has_many :points, :through => :tracksegments
end

I added has_many :points, :through => :tracksegments to get simple access to all points of a track.
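What the :through association gives us can be sketched in plain Ruby. The Structs below are stand-ins for the ActiveRecord models, and the data is invented. ActiveRecord fetches the points with a SQL join rather than with flat_map, so this only illustrates the result, not how Rails produces it.

```ruby
# Plain-Ruby stand-ins for the three models (not ActiveRecord classes).
Point        = Struct.new(:name, :latitude, :longitude)
Tracksegment = Struct.new(:points)
Track        = Struct.new(:name, :tracksegments) do
  # Rough equivalent of `has_many :points, :through => :tracksegments`:
  # collect the points of every segment into one flat list.
  def points
    tracksegments.flat_map(&:points)
  end
end

track = Track.new('Evening run', [
  Tracksegment.new([Point.new('1', 59.7975, 5.25676), Point.new('2', 59.7976, 5.2568)]),
  Tracksegment.new([Point.new('3', 59.798, 5.257)])
])

puts track.points.map(&:name).inspect
```

Without the association we would have to loop over the segments ourselves every time we want all points of a track.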

class Tracksegment < ActiveRecord::Base
  belongs_to :track
  has_many :points, :dependent => :destroy
  # attr_accessible :title, :body
end

class Point < ActiveRecord::Base
  belongs_to :tracksegment
  attr_accessible :description, :elevation, :latitude, :longitude, :name, :point_created_at
end

As the last model change we add has_attached_file :gpx to our Track model. This line enables Paperclip uploads for the model.

class Track < ActiveRecord::Base
  ...
  has_attached_file :gpx
  ...
end

Once the models are ready we can run

  rake db:migrate

and continue with a few steps to make the output a little nicer. As in the last tutorial we use Twitter Bootstrap. With the gem installed we execute

  rails g bootstrap:install
  rails g bootstrap:themed tracks -f

to apply the Twitter Bootstrap changes to the tracks resource. Afterwards we add a div with the container class to our views/layouts/application.html.erb file like this:

<div class="container">
  <%= yield %>
</div>

Our tracks list should now look like this:

[Screenshot: tracks list]

If we press the New button, the form for creating a new track opens. There is no upload field yet, but there are fields for the other Paperclip attributes. Since Paperclip fills in these attributes automatically, we change the views/tracks/_form.html.erb file to this:

<%= form_for @track, :html => { :class => 'form-horizontal' } do |f| %>
  <div class="control-group">
    <%= f.label :name, :class => 'control-label' %>
    <div class="controls">
      <%= f.text_field :name, :class => 'text_field' %>
    </div>
  </div>
  <div class="control-group">
    <%= f.label :gpx, :class => 'control-label' %>
    <div class="controls">
      <%= f.file_field :gpx %>
    </div>
  </div>
  <div class="form-actions">
    <%= f.submit nil, :class => 'btn btn-primary' %>
    <%= link_to t('.cancel', :default => t("helpers.links.cancel")),
                tracks_path, :class => 'btn' %>
  </div>
<% end %>

Now your upload form should look like this:

[Screenshot: new track form with upload field]

You can test the upload with any file. As soon as we are sure that the upload works correctly we can continue with parsing the XML file.

Parse the XML File

For parsing we use a before_save callback in our Track model. The callback runs the parser method before every save of a track, which includes every save that carries a newly uploaded file.

class Track < ActiveRecord::Base
  ...
  before_save :parse_file
  ...
end

The parse_file function looks like this:

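  def parse_file
    tempfile = gpx.queued_for_write[:original]
    return if tempfile.nil? # this save carries no new upload
    doc = Nokogiri::XML(tempfile)
    parse_xml(doc)
  end

In line 2 we get the uploaded file from Paperclip. The queued_for_write method gives access to files that have been assigned but not yet written to storage. Line 3 skips parsing when the save carries no new upload, for example when only the name of a track is edited. In line 4 we use Nokogiri to create an XML document object, and in line 5 we hand the doc over to our parse_xml method.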

  def parse_xml(doc)
    doc.root.elements.each do |node|
      parse_tracks(node)
    end
  end

In the parse_xml method we iterate through all child elements of the root node and hand each one over to parse_tracks.

  def parse_tracks(node)
    if node.node_name.eql? 'trk'
      node.elements.each do |node|
        parse_track_segments(node)
      end
    end
  end

First we check which element we found by looking at its node_name. If the node_name is ‘trk’, we step in and iterate through all child elements of this ‘trk’ node, handing each one over to the parse_track_segments method.

  def parse_track_segments(node)
    if node.node_name.eql? 'trkseg'
      tmp_segment = Tracksegment.new
      node.elements.each do |node|
        parse_points(node,tmp_segment)
      end
      self.tracksegments << tmp_segment
    end
  end

Here we check the node_name in line 2. If the name is ‘trkseg’, we create a new Tracksegment object. Then we iterate through all child elements of the node in lines 4 to 6 and call the parse_points method for each one with two arguments: the node and our new Tracksegment object. After all elements are parsed, we add the Tracksegment to the Track. But first let’s look at the parse_points method.

  def parse_points(node,tmp_segment)
    if node.node_name.eql? 'trkpt'
      tmp_point = Point.new
      tmp_point.latitude = node.attr("lat")
      tmp_point.longitude = node.attr("lon")
      node.elements.each do |node|
        tmp_point.name = node.text.to_s if node.name.eql? 'name'
        tmp_point.elevation = node.text.to_s if node.name.eql? 'ele'
        tmp_point.description = node.text.to_s if node.name.eql? 'desc'
        tmp_point.point_created_at = node.text.to_s if node.name.eql? 'time'
      end
      tmp_segment.points << tmp_point
    end
  end

In this method we first check the node name. If the name is ‘trkpt’, we have entered a point structure of the XML, and we create a new Point object in line 3. The latitude and longitude are attributes of the trkpt tag, so we use Nokogiri’s .attr() method to read them. This is done in lines 4 and 5.

In line 6 we start to go through all child elements of the ‘trkpt’ node. For each element we check the node name and assign the node’s text to the matching Point attribute. Once every child element has been read, we add the Point object to the points of the Tracksegment.
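Note that every value read from the XML is a string. The parser assigns those strings directly to float and datetime attributes and relies on ActiveRecord to cast them. The sketch below shows roughly equivalent explicit conversions in plain Ruby. The helper names to_float and to_time are hypothetical, the sample values are invented, and the timestamp format is assumed to be ISO 8601, which may not match every GPX file. Unlike ActiveRecord's casting, these helpers return nil for unparsable input.

```ruby
require 'time'

# Hypothetical helper: strict string-to-float conversion, nil on bad input.
def to_float(text)
  Float(text.to_s)
rescue ArgumentError
  nil
end

# Hypothetical helper: parses an ISO 8601 timestamp, nil on bad input.
def to_time(text)
  Time.iso8601(text.to_s)
rescue ArgumentError
  nil
end

elevation  = to_float('109.5')
created_at = to_time('2009-04-06T21:07:27Z')

puts elevation
puts created_at.utc
puts to_float('n/a').inspect
```

Converting explicitly makes it possible to detect malformed values before they reach the database.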

After all Points, Tracksegments and Tracks are completed the parsing ends. Now the Track model is going to be saved.
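If you want to try the walk through the document without a Rails application, the following sketch mirrors the parse_* methods with plain hashes instead of models. It is not the tutorial's code: it uses REXML from the standard Ruby distribution rather than Nokogiri so that it runs without extra gems, the sample data is invented, and children_named, parse_point and parse_segments are hypothetical names.

```ruby
require 'rexml/document'

# Invented sample data, not taken from the demo file.
GPX = <<~XML
  <gpx>
    <metadata><name>ignored</name></metadata>
    <trk>
      <trkseg>
        <trkpt lat="59.7975" lon="5.25676">
          <ele>109.5</ele>
          <name>1</name>
          <desc>Speed 9.3 km/h</desc>
        </trkpt>
        <trkpt lat="59.7976" lon="5.25680">
          <ele>110.0</ele>
          <name>2</name>
        </trkpt>
      </trkseg>
    </trk>
  </gpx>
XML

# Child elements of +node+ whose tag name equals +name+.
def children_named(node, name)
  node.elements.select { |child| child.name == name }
end

# Mirrors parse_points: lat/lon come from attributes, the rest from child elements.
def parse_point(trkpt)
  point = { latitude: trkpt.attributes['lat'], longitude: trkpt.attributes['lon'] }
  trkpt.elements.each do |child|
    case child.name
    when 'name' then point[:name] = child.text
    when 'ele'  then point[:elevation] = child.text
    when 'desc' then point[:description] = child.text
    when 'time' then point[:point_created_at] = child.text
    end
  end
  point
end

# Mirrors parse_xml, parse_tracks and parse_track_segments:
# returns one array of point hashes per track segment.
def parse_segments(xml)
  root = REXML::Document.new(xml).root
  children_named(root, 'trk').flat_map do |trk|
    children_named(trk, 'trkseg').map do |seg|
      children_named(seg, 'trkpt').map { |pt| parse_point(pt) }
    end
  end
end

segments = parse_segments(GPX)
puts segments.first.map { |point| point[:name] }.inspect
```

The metadata element is skipped because only elements named ‘trk’ are followed, which is the same effect the node_name checks have in the model.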

To show our parsed points we change the views/tracks/show.html.erb by adding a Points table:

  <%- model_class = Track -%>
  <div class="page-header">
    <h1><%=t '.title', :default => model_class.model_name.human %></h1>
  </div>

  <dl class="dl-horizontal">
    <dt><strong><%= model_class.human_attribute_name(:name) %>:</strong></dt>
    <dd><%= @track.name %></dd>
    <dt><strong><%= model_class.human_attribute_name(:gpx_file_name) %>:</strong></dt>
    <dd><%= @track.gpx_file_name %></dd>
    <dt><strong><%= model_class.human_attribute_name(:gpx_content_type) %>:</strong></dt>
    <dd><%= @track.gpx_content_type %></dd>
    <dt><strong><%= model_class.human_attribute_name(:gpx_file_size) %>:</strong></dt>
    <dd><%= @track.gpx_file_size %></dd>
    <dt><strong><%= model_class.human_attribute_name(:gpx_updated_at) %>:</strong></dt>
    <dd><%= @track.gpx_updated_at %></dd>
  </dl>

  <div class="form-actions">
    <%= link_to t('.back', :default => t("helpers.links.back")),
                tracks_path, :class => 'btn'  %>
    <%= link_to t('.edit', :default => t("helpers.links.edit")),
                edit_track_path(@track), :class => 'btn' %>
    <%= link_to t('.destroy', :default => t("helpers.links.destroy")),
                track_path(@track),
                :method => 'delete',
                :data => { :confirm => t('.confirm', :default => t("helpers.links.confirm", :default => 'Are you sure?')) },
                :class => 'btn btn-danger' %>
  </div>

  <table class="table table-striped">
    <thead>
      <tr>
        <th>ID</th>
        <th>Point No.</th>
        <th>Latitude</th>
        <th>Longitude</th>
        <th>Elevation</th>
        <th>Description</th>
        <th>Time</th>
      </tr>
    </thead>
    <tbody>
      <% @track.points.each do |point| %>
        <tr>
          <td><%= point.id %></td>
          <td><%= point.name %></td>
          <td><%= point.latitude %></td>
          <td><%= point.longitude %></td>
          <td><%= point.elevation %></td>
          <td><%= point.description %></td>
          <td><%= point.point_created_at %></td>
        </tr>
      <% end %>
    </tbody>
  </table>

  <div class="form-actions">
    <%= link_to t('.back', :default => t("helpers.links.back")),
                tracks_path, :class => 'btn'  %>
    <%= link_to t('.edit', :default => t("helpers.links.edit")),
                edit_track_path(@track), :class => 'btn' %>
    <%= link_to t('.destroy', :default => t("helpers.links.destroy")),
                track_path(@track),
                :method => 'delete',
                :data => { :confirm => t('.confirm', :default => t("helpers.links.confirm", :default => 'Are you sure?')) },
                :class => 'btn btn-danger' %>
  </div>

In the Points-table we list all points of the Track. It should look like this:

[Screenshot: track page with the points table]

I know, it’s an ugly way to show the GPS data. In my next tutorial I’m going to show you how to display these GPS coordinates in a Google Map by using the Google Maps API.

Summary

In this tutorial we created a small Ruby on Rails application for uploading and parsing an XML file. As you have seen, it’s not too difficult to parse an XML data structure with Nokogiri. The upload-and-parse process isn’t limited to XML files. With a suitable parser you can handle other formats, like CSV or QFX, too.

If you like this tutorial please share it in your social network. If you have suggestions for improvement please leave a comment.

Comments

  1. Mahmoodi says

    Hi,
    I tested this tutorial, but I found that it has an error: tracksegment_id is not saved in the points table.
    Please help me to correct that.
    Thanks

  2. Lars says

    Hi Mahmoodi, I tested the code and the tracksegment_id is saved properly in the points table. From my console: # Point id: 20984, name: "2623", latitude: 59.7975, longitude: 5.25676, elevation: 109.5, description: "Speed 9.3 km/h Distance 16.62 km", point_created_at: "2009-04-06 21:07:27", tracksegment_id: 11, created_at: "2013-06-01 18:48:53", updated_at: "2013-06-01 18:48:53" If you upload your source code to a service like GitHub, I would take a look at it.

    • Lars says

      Hi Mahmoodi, the only difference I can find is that your “Tracksegment_id” in the migration starts with an uppercase “T”. In my source code it is “tracksegment_id”. The Rails matcher could have a problem with that, but I’m not sure.

  3. Hans Meyer says

    I really want to say thank you to Lars for this great tutorial! The tutorial is explained well and the steps are easy to follow! But I had the same problem as Mahmoodi! I had to change Track_id to track_id and Tracksegment_id to tracksegment_id. The problem is that in the tutorial the model references are created with uppercase letters! Maybe you can change this, Lars! Thank you again!
