2006-04-15

Ruby's hash implementation

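Ruby's Hash class is built on the same hash-table engine that the interpreter uses for its symbol tables.

It does not appear to be tuned for high-volume use. In the run below, reading one million lines and inserting each into a Hash took about 4.1 seconds, against about 2.3 seconds for appending the same lines to an Array.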
 
/tmp $ ruby testspeed.rb 
Rehearsal ----------------------------------------------------------------------------------------------
reading with while, name=1million                            1.240000   0.030000   1.270000 (  1.272282)
reading with readline, name=1million                         0.980000   0.210000   1.190000 (  1.215839)
reading with while and inserting into hash, name=1million    5.330000   0.200000   5.530000 (  5.715192)
reading with while and inserting into array, name=1million   5.640000   0.210000   5.850000 (  5.975710)
------------------------------------------------------------------------------------ total: 13.840000sec

                                                                 user     system      total        real
reading with while, name=1million                            1.750000   0.020000   1.770000 (  1.785138)
reading with readline, name=1million                         1.440000   0.010000   1.450000 (  1.454656)
reading with while and inserting into hash, name=1million    4.050000   0.020000   4.070000 (  4.102691)
reading with while and inserting into array, name=1million   2.290000   0.020000   2.310000 (  2.320377)
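
To put the second-pass numbers on a per-line footing, the read-only time can be subtracted from the two insert runs. This is only a rough sketch: it assumes the cost of reading is the same in all three runs, which ignores garbage-collection effects from keeping the strings alive. The constant and method names are made up for this illustration.

```ruby
# Wall-clock ("real") seconds copied from the second pass of the benchmark above.
REAL_SECONDS = {
  read_only: 1.785138,   # reading with while
  hash:      4.102691,   # reading with while and inserting into hash
  array:     2.320377    # reading with while and inserting into array
}
LINES = 1_000_000

# Cost per line attributable to the insert, after subtracting the
# read-only baseline (an approximation, see the note above).
def microseconds_per_insert(total, baseline, count)
  (total - baseline) / count * 1_000_000
end

hash_us  = microseconds_per_insert(REAL_SECONDS[:hash],  REAL_SECONDS[:read_only], LINES)
array_us = microseconds_per_insert(REAL_SECONDS[:array], REAL_SECONDS[:read_only], LINES)

puts format("hash insert:  %.2f us per line", hash_us)
puts format("array append: %.2f us per line", array_us)
puts format("ratio:        %.1fx", hash_us / array_us)
```

On these figures a hash insert costs roughly 2.3 microseconds per line and an array append roughly 0.5, so the hash is about four times slower per element in this run.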

testspeed.rb:

# Write `size` lines ("foo0", "foo1", ...) to /tmp/largefile_<name>.
def create_file(name, size)
  File.open("/tmp/largefile_#{name}", "w") { |f|
    size.times { |i| f.puts "foo#{i}" }
  }
end

# do these once
# create_file("1million", 1*1000*1000)
# create_file("5million", 5*1000*1000)

def read_with_while(name)
  File.open("/tmp/largefile_#{name}") {|fh|
    while line = fh.gets
      line.chomp!
    end
  }
end

def read_with_readlines(name)
  File.readlines("/tmp/largefile_#{name}")
end

# Read every line and store it as a hash key.
def read_into_hash(name)
  hash = {}
  File.open("/tmp/largefile_#{name}") { |fh|
    while line = fh.gets
      line.chomp!
      hash[line] = 1
    end
  }
end
# Read every line and append it to an array.
def read_into_array(name)
  array = []
  File.open("/tmp/largefile_#{name}") { |fh|
    while line = fh.gets
      line.chomp!
      array << line
    end
  }
end

require 'benchmark'
Benchmark.bmbm {|r|
  ["1million"].each{|name|
    GC.start
    r.report("reading with while, name=#{name}") {read_with_while(name)}
    GC.start
    r.report("reading with readline, name=#{name}") {read_with_readlines(name)}
    GC.start
    r.report("reading with while and inserting into hash, name=#{name}") { read_into_hash(name)}
    GC.start
    r.report("reading with while and inserting into array, name=#{name}") { read_into_array(name)}
  }
} 
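
The script above times file reading and insertion together. A variant that builds the keys in memory first would time only the insertion. The following is a minimal sketch of that idea, not part of the original testspeed.rb; make_keys, fill_hash and fill_array are hypothetical helper names. One thing to bear in mind when reading the results: Hash#[]= copies and freezes an unfrozen String key, so part of the hash cost is likely string allocation rather than hashing alone.

```ruby
require 'benchmark'

# Hypothetical helper: n distinct keys shaped like the lines in the
# test file ("foo0", "foo1", ...), built up front so that no file I/O
# is included in the timings.
def make_keys(n)
  Array.new(n) { |i| "foo#{i}" }
end

# Insert every key into a fresh hash, as read_into_hash does per line.
def fill_hash(keys)
  hash = {}
  keys.each { |k| hash[k] = 1 }
  hash
end

# Append every key to a fresh array, as read_into_array does per line.
def fill_array(keys)
  array = []
  keys.each { |k| array << k }
  array
end

if __FILE__ == $0
  keys = make_keys(1_000_000)
  Benchmark.bmbm { |r|
    GC.start
    r.report("hash insert")  { fill_hash(keys) }
    GC.start
    r.report("array append") { fill_array(keys) }
  }
end
```

If the gap between the two reports stays close to the one measured with the file-based script, the difference comes from the containers themselves and not from the reading loop.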
 
(originally from http://microjet.ath.cx/WebWiki/2006.04.15_Ruby%27sHash.html)