Think twice before you return a hash
Recently I got the chance to do an hour programming audition for the awesome people at hashrocket. I was excited, I brushed up on the basics the night before, readied myself with some herbal tea and logged in to the audition with enthusiasm.
What I found was 100 lines of difficult ruby code. I started with what I would normally do:
- Break big methods into small methods.
- See if any of those small methods give me a clue as to what responsibilities (classes) should be extracted.
- Look for switch statements or nested if/else statements and see if I can pull some kind of strategy out.
After about 20 minutes I had only refactored a couple of methods. I was on the spot and couldn’t figure out what was wrong. I kept gravitating towards the the big methods but didn’t know how to break it apart.
Here is the code.
module Domainatrix
class DomainParser
include Addressable
attr_reader :public_suffixes
def initialize(file_name)
@public_suffixes = {}
read_dat_file(file_name)
end
def read_dat_file(file_name)
# If we're in 1.9, make sure we're opening it in UTF-8
if RUBY_VERSION >= '1.9'
dat_file = File.open(file_name, "r:UTF-8")
else
dat_file = File.open(file_name)
end
dat_file.each_line do |line|
line = line.strip
unless (line =~ /\/\//) || line.empty?
parts = line.split(".").reverse
sub_hash = @public_suffixes
parts.each do |part|
sub_hash = (sub_hash[part] ||= {})
end
end
end
end
def parse(url)
return {} unless url && url.strip != ''
url = "http://#{url}" unless url[/:\/\//]
uri = URI.parse(url)
if uri.query
path = "#{uri.path}?#{uri.query}"
else
path = uri.path
end
if uri.host == 'localhost'
uri_hash = { :public_suffix => '', :domain => 'localhost', :subdomain => '' }
else
uri_hash = parse_domains_from_host(uri.host || uri.basename)
end
uri_hash.merge({
:scheme => uri.scheme,
:host => uri.host,
:path => path,
:url => url
})
end
def parse_domains_from_host(host)
return {} unless host
parts = host.split(".").reverse
public_suffix = []
domain = ""
subdomains = []
sub_hash = @public_suffixes
parts.each_with_index do |part, i|
sub_hash = sub_parts = sub_hash[part] || {}
if sub_parts.has_key? "*"
public_suffix << part
public_suffix << parts[i+1]
domain = parts[i+2]
subdomains = parts.slice(i+3, parts.size) || []
break
elsif sub_parts.empty? || !sub_parts.has_key?(parts[i+1])
public_suffix << part
domain = parts[i+1]
subdomains = parts.slice(i+2, parts.size) || []
break
else
public_suffix << part
end
end
{:public_suffix => public_suffix.reverse.join("."), :domain => domain, :subdomain => subdomains.reverse.join(".")}
end
end
end
So a lot is wrong with this code. The most fundamentally flawed logic in this code is not the fact that it has large methods, or a file parsing that obviously does not belong but the conscious decision to return hash’s from the methods.
When a hash is return from a method it is essentially an anonymous object. It has methods, and state, but no constructor and no class definition. Think about how much of this code would just vanish if an object was returned.
Since this file actually had other files that depended on this class returning a hash, we will have to imagine some code.
class ParsedDomain
attr_writer :public_suffix, :subdomain
attr_accessor :domain
def initilize uri
@uri = URI.parse uri
end
def public_suffix
@public_suffix || ""
end
def subdomain
@public_suffix || ""
end
def scheme
@uri.scheme
end
def host
@uri.host
end
def path
return"#{@uri.path}?#{@uri.query}" if @uri.query
@uri.path
end
def url
@uri.url
end
def localhost?
@uri.host == 'localhost'
end
end
(I could have done some cool meta programming here so that I didn’t have to repeat logic over and over again, but I wanted to be more explicit because this is an example.)
Now lets look at what our original parse method could look like.
def parse(url)
return {} unless url && url.strip != ''
url = "http://#{url}" unless url[/:\/\//]
domain = ParsedDomain.new url
return domain if domain.localhost?
parse_domain domain
end
def parse_domain domain_to_parse
#... imagine code here to calculate domain
domain_to_parse.domain = domain
domain_to_parse.subdomain = subdomain
domain_to_parse.public_suffix = public_suffix
domain_to_parse
end
Amazing how that would clean up. The moral of the story is, if you see a method, or write a method that returns a hash, you probably really wanted to return an object (with a class you defined). Take two steps back and re-evaluate what you are doing.
I really enjoyed getting out of my comfort zone and trying to fix some nasty ruby for a change. I forget that code like this exists sometimes, and it is nice to think about what makes code good for a change.
blog comments powered by Disqus