23 August, 2016

The Pure Function As An Object (PFAAO) Pattern

In Ruby, all functions are methods, so the terms “function” and “method” are interchangeable in this article.

In this article, I want to demonstrate a nice way to write functional-style code in Ruby. It is a way to write non-trivial pure functions, without a bunch of weird non-idiomatic code.

I’ll start by defining what pure functions are, then explain their benefits, and finally walk through an example implementation.

Pure Functions

Pure functions are the core concept of functional programming (FP). For a function to be “pure” it must have two properties:

No side effects
Referential transparency

Side effects are any observable changes caused by calling a function – that is, anything other than generating a return value. For more details, see the previous article: Isolate Side Effects – Functional Style in Ruby.

Referential transparency is a functional programming term to describe a function whose return value is completely determined by its arguments. Given the same arguments, a referentially transparent function will always have the same return value. Addition, for example, is referentially transparent because add(2, 5) is always 7. The rand function is not referentially transparent because the return value varies.
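To make the two properties concrete, here is a small sketch. The method names are made up for illustration; only rand comes from Ruby itself.

```ruby
# Pure: no side effects, and the result depends only on the arguments.
def add(a, b)
  a + b
end

# Not referentially transparent: the return value varies between calls.
def roll_die
  rand(1..6)
end

# Has a side effect: it mutates the caller's array, which is observable
# outside the method.
def append_total!(numbers)
  numbers << numbers.reduce(0, :+)
end

# A pure alternative: builds a new array and leaves the argument untouched.
def with_total(numbers)
  numbers + [numbers.reduce(0, :+)]
end
```

Calling add(2, 5) gives 7 every time, and with_total can be called repeatedly on the same array without changing it. The other two methods fail one property each.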

Benefits Of Purity

I don’t especially like the term “pure” because the word has a moralistic tone. In the context of programming, pure does not mean good, and impure does not mean bad. It’s just a way to categorise and think about functions.

Pure functions give you two guarantees that make your job, as a programmer, easier.

The first guarantee is that, when you call a pure function, it won’t break anything. If a function is free from side effects, it cannot possibly affect any other part of your app. This means you no longer have to ask yourself “what bad things could happen if I call this function?” All you have to think about is the return value.

The second guarantee is that the return value is completely predictable. If you provide the correct arguments, you will always get the correct return value. The result is 100% within your control, which makes the code easy to read and write, and easy to debug.

In addition, these two guarantees mean that return values from pure functions are cacheable. If you have a pure function that is relatively performance-intensive, it’s easy to slap a cache in front of it where the keys are just the function arguments. This caching of return values is called memoization.
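As a rough sketch of what “slapping a cache in front” could look like, here is a generic wrapper. The memoize helper and the slow_square lambda are hypothetical names invented for this illustration, not part of Ruby or of the example project.

```ruby
# Wraps any callable in a cache keyed by its argument list.
# This is only safe when `fn` is pure: same arguments, same result.
def memoize(fn)
  cache = {}
  ->(*args) { cache.fetch(args) { cache[args] = fn.call(*args) } }
end

# Hypothetical stand-in for a relatively expensive pure function.
slow_square = ->(n) { n * n }

fast_square = memoize(slow_square)
fast_square.call(12) # computed, then stored under the key [12]
fast_square.call(12) # returned straight from the cache
```

Using Hash#fetch with a block means that nil and false return values are cached too, which a plain `cache[args] ||= ...` would not do.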

The Example: A JSON To XML Converter

To demonstrate this pattern, I’m going to write a program that converts a JSON document to XML. The whole example project is available on GitHub, here: https://github.com/tomdalling/pure-function-as-an-object

The input looks like this:

{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }
{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }

And the output looks like this:

<?xml version="1.0"?>
<cities count="5">
  <city id="01001" name="AGAWAM" state="MA" location="-72.622739,42.070206" population="15338"/>
  <city id="01002" name="CUSHMAN" state="MA" location="-72.51565,42.377017" population="36963"/>
  <city id="01005" name="BARRE" state="MA" location="-72.108354,42.409698" population="4546"/>
  <city id="01007" name="BELCHERTOWN" state="MA" location="-72.410953,42.275103" population="10579"/>
  <city id="01008" name="BLANDFORD" state="MA" location="-72.936114,42.182949" population="1240"/>
</cities>

We will only be using the Ruby 2.3 standard library, and the Ox gem for writing the XML.

Let’s get started!

Designing The API

After gathering the requirements, the next step is to ask: how much of this can be implemented with pure functions?

As it turns out, pretty much the entire thing could be a pure function. Reading the input is non-pure, and writing the output is also non-pure, but the entire conversion from JSON to XML is pure. The same input JSON always produces the same output XML.

The API for this converter could be used like this:

input = File.read('input.json')
output = JSON2XML.convert(input)
File.write('output.xml', output)

This is an example of the “write the usage code first” approach to API design, which I believe is the “driven design” aspect of TDD. For another viewpoint on this approach to API design, I recommend the talk Designing and Evaluating Reusable Components by Casey Muratori.

This example code could easily be written as a test. Just use some dummy input, and assert that output is correct.

Because the File.read and File.write calls are non-pure, I have purposely kept them separate from the conversion code. That allows the conversion step to be implemented as a pure function – it just takes a string and returns a string.

This API looks pretty good to me, assuming that the input data isn’t too big. If the input data set was huge, we wouldn’t be able to load the whole thing into memory, so we would need to write some sort of streaming API. But, for the sake of demonstration, let’s assume that we’ve investigated this possibility and decided that it’s very unlikely that the input will ever be more than a few megabytes.

The Motivation

The public API consists of a single method, which will be the pure function. Let’s start with that:

require 'json'
require 'ox'

module JSON2XML
  def self.convert(input_json)
    # TODO: implementation goes here
  end
end

We could easily implement all the functionality by writing a bunch of pure methods on the JSON2XML module, without defining any new classes. It would work, but I personally don’t think that this is how Ruby is designed to be used. It’s not what most people would consider to be idiomatic Ruby. We would be fighting the language – going against the grain. That’s not fun.

To use Ruby the way it is designed to be used, we will need to define a class here. However, the class is unnecessary from the caller’s point of view. It’s an implementation detail. The public API is just a single pure function, so nobody should ever need to instantiate an object of the class we’re about to create.

This is the motivation behind the pattern: we want the public API to be a simple pure function, but we want to implement that function with a private class.

The Pattern

Without further ado, here is the pattern:

require 'json'
require 'ox'

class JSON2XML
  def self.convert(input_json)
    new(input_json).send(:xml)
  end

  private
    def initialize(input_json)
      @input_json = input_json
    end

    def xml
      # TODO: implementation goes here
    end
end

Instead of a module, JSON2XML is now a class. Everything below the convert method is marked as private, signalling to other developers that convert is the only thing they should be calling, and everything else is an implementation detail. The xml method can’t be called directly because it is private, so we use send(:xml) to get around that.

The convert class method:

creates an object of its own class,
passing its argument into the initializer,
and calls a single method on the object, to generate the return value.

That’s the overview of how the pattern works.
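To see the three steps in a complete, runnable form before we bring Ox into it, here is a tiny instance of the same shape that only needs core Ruby. The LineStats class and its method names are hypothetical, made up purely for this sketch.

```ruby
# A hypothetical pure function, implemented as a private object.
class LineStats
  # The only public entry point: takes a string, returns a hash.
  def self.summarize(text)
    new(text).send(:summary)
  end

  private

  def initialize(text)
    @text = text
  end

  # The single method that generates the return value.
  def summary
    { lines: lines.size, longest: lines.map(&:length).max || 0 }
  end

  def lines
    @text.each_line.map(&:chomp)
  end
end

LineStats.summarize("one\nthree\n") # => { lines: 2, longest: 5 }
```

From the outside, LineStats.summarize behaves like a plain function. Callers never see the object that was created to compute the result.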

Implementation Advantages

Now let’s finish off the implementation of the JSON to XML converter, to demonstrate some advantages of using this pattern.

The xml method is empty at the moment. It’s supposed to return the output XML as a string, as required by the convert class method. How do you generate XML? One approach is to take some sort of XML “document” object, and serialize it. Let’s do that:

def xml
  Ox.dump(document, with_xml: true)
end

“But wait!” I hear you say. “There is no document object. It doesn’t exist.” Let’s create it, then.

How do you make a document object? Since we’re using the Ox gem, we need to instantiate an Ox::Document and fill it with all the output. XML documents are only supposed to have a single root node in them, so all the content will have to be inside that.

def document
  Ox::Document.new(version: '1.0').tap do |doc|
    doc << root_node
  end
end

“But wait!” I hear you say, again. “Where did this root_node come from? That’s not defined anywhere.” Let’s create it, then.

How do you make the root node? In Ox, you instantiate an Ox::Element object. We have to fill the root node with all the output data, so each city will have its own node inside the root node. Also, the root node has a “count” attribute on it, for reasons that I will explain later.

def root_node
  Ox::Element.new('cities').tap do |root|
    root[:count] = city_nodes.size
    city_nodes.each { |city| root << city }
  end
end

“But wait!” I hear you say, for a third time. “city_nodes doesn’t exist either!” Let’s create it. I’m sure you’re getting the gist of what’s happening here.

Each city node is also an Ox::Element object. We can create a city node from each line of the JSON input.

def city_nodes
  input_json.each_line.map { |line| parse_city_node(line) }
end

The input_json method is missing. That is the string value that was passed into the initializer. We can implement that with a simple attr_reader :input_json.

The parse_city_node method needs to be implemented, too. Here it is:

def parse_city_node(line)
  Ox::Element.new('city').tap do |city|
    attrs = JSON.parse(line)

    city[:id] = attrs.fetch('_id')
    city[:name] = attrs.fetch('city')
    city[:state] = attrs.fetch('state')
    city[:location] = attrs.fetch('loc').join(',')
    city[:population] = attrs.fetch('pop').to_s
  end
end

“But wait!” I don’t hear you say, because the implementation is now complete. Here is the whole class:

class JSON2XML
  def self.convert(input_json)
    new(input_json).send(:xml)
  end

  private
    attr_reader :input_json

    def initialize(input_json)
      @input_json = input_json
    end

    def xml
      Ox.dump(document, with_xml: true)
    end

    def document
      Ox::Document.new(version: '1.0').tap do |doc|
        doc << root_node
      end
    end

    def root_node
      Ox::Element.new('cities').tap do |root|
        root[:count] = city_nodes.size
        city_nodes.each { |city| root << city }
      end
    end

    def city_nodes
      input_json.each_line.map { |line| parse_city_node(line) }
    end

    def parse_city_node(line)
      Ox::Element.new('city').tap do |city|
        attrs = JSON.parse(line)

        city[:id] = attrs.fetch('_id')
        city[:name] = attrs.fetch('city')
        city[:state] = attrs.fetch('state')
        city[:location] = attrs.fetch('loc').join(',')
        city[:population] = attrs.fetch('pop').to_s
      end
    end
end

What I’ve just demonstrated is a top-down decomposition of the entire implementation. You start with the desired output – in this case, an XML string – and work backwards. You write the code that you wish existed, and implement it later. This is a nice way to break down complicated algorithms into smaller and smaller pieces.

This is all made possible by Ruby’s syntax. Ruby blurs the line between local variables and method calls. This allows us to write identifiers representing values that we wish we had, as if they already exist as local variables, and then implement them later as methods. We’re working with the grain of the language, and this is one of the benefits.
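Here is a minimal illustration of that blurring. The method names are invented for the example. The line that uses greeting is identical in both versions, even though it is a local variable in one and a method call in the other.

```ruby
# Version 1: `greeting` is a local variable.
def welcome_v1(name)
  greeting = "Hello"
  "#{greeting}, #{name}!"
end

# Version 2: `greeting` has been extracted into a method.
def greeting
  "Hello"
end

def welcome_v2(name)
  "#{greeting}, #{name}!" # same line as before, now a method call
end
```

Because a method call with no arguments needs no parentheses, the calling code does not have to change when a value moves from a local variable into its own method.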

Notice how none of the methods have side effects. Each method in the class is itself a pure function. The @input_json instance variable is never changed – it’s essentially a constant. Because of that, we would expect that every method would always have the same return value. For example, if you call the xml method three times, you would get three identical strings.

Performance Optimisation

Astute readers may have noticed a performance hiccup in the root_node method.

def root_node
  Ox::Element.new('cities').tap do |root|
    root[:count] = city_nodes.size
    city_nodes.each { |city| root << city }
  end
end

I put the “count” attribute on the root node to demonstrate a common performance problem with this pattern.

The problem is that city_nodes is called two times. That means we’re parsing the entire data set twice. That is unnecessary, and roughly doubles the run time of the conversion.

Thankfully, due to all the methods being pure, we have a simple solution to this problem: memoization. We’ve already seen that the city_nodes method always returns the same value for a given object. That’s because it is referentially transparent, and it takes no arguments.

Cache invalidation can be a tricky problem, but not in this case. When do we need to invalidate the cache? Never! It’s a pure function. The return value literally never changes. We can just forget about cache invalidation completely.

Here is the solution:

def city_nodes
  @city_nodes ||=
    input_json.each_line.map { |line| parse_city_node(line) }
end

The function implementation is exactly the same, except for the @city_nodes ||= line that has been added. This trick isn’t specific to the PFAAO pattern; it’s just normal, idiomatic Ruby.

Now we can call city_nodes as many times as we like, from any other method, without having to worry about performance. This is a cleaner solution than storing the return value in a local variable before using it.
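If you want to convince yourself that the work really happens only once, here is a self-contained sketch with a counter bolted on. The Report class and parse_count are hypothetical, and the counter exists only for the demonstration. It is not something I would put in real code.

```ruby
class Report
  attr_reader :parse_count

  def initialize(text)
    @text = text
    @parse_count = 0
  end

  def size
    rows.size
  end

  def first_row
    rows.first
  end

  private

  # Memoized: the block only runs the first time `rows` is called.
  def rows
    @rows ||= begin
      @parse_count += 1 # bookkeeping for the demo only
      @text.each_line.map(&:chomp)
    end
  end
end

report = Report.new("a\nb\n")
report.size        # parses the input
report.first_row   # reuses the memoized array
report.parse_count # => 1
```

One caveat: ||= recomputes whenever the stored value is nil or false. That is not an issue for city_nodes, which always returns an array, but for methods that can legitimately return nil or false you would need an explicit defined?(@ivar) check instead.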

When To Use PFAAO

This pattern is good for implementing complicated pure functions. I’ve written a DOCX to HTML converter this way, quite happily.

If the implementation is simple, it’s not really worth defining a new class – just write a slightly larger function. If you’re writing a pure function and it’s growing out of control, that is the time to consider using this pattern.

Where the implementation is mostly pure, but not completely, you can still use this pattern. The implementation will be more complicated, but still have a small and simple public API. Ruby isn’t Haskell. We can use side effects, in a controlled fashion, wherever it makes sense.

If your implementation requires a lot of mutable state, PFAAO is probably a bad fit.

Optional Extras

Disallow instantiation of the class.
Unfortunately, making initialize private does not prevent instantiation of the class. The fact that everything is private except the class method should indicate that the class isn’t meant to be instantiated. If that’s not strict enough for you, you can make Ruby raise an error by adding this line of code to the class definition:
```
private_class_method :new
```
Make it quack like a Proc.
Objects that act like functions, such as Proc objects, are invoked using the call method in idiomatic Ruby. I named the class method convert in the example above, but if you changed that to call you could use the class as if it were a Proc. In that case, I would change the class name to JSON2XMLConverter to keep the word “convert” in there.
Make it convertible to a Proc.
Let’s say you’re using this PFAAO as a block fairly often, like this:
```
documents.map { |d| JSON2XML.convert(d) }
```
It would be nicer to just pass the block argument using ampersand syntax like this:
```
documents.map(&JSON2XML)
```
To make this a reality, implement the to_proc method on the class, like this:
```
def self.to_proc
  method(:convert).to_proc
end
```
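Here are all three extras applied at once to a tiny class, so you can run them without the Ox gem. The Shout class is a hypothetical stand-in for the converter.

```ruby
class Shout
  # Extra 1: calling Shout.new from outside now raises NoMethodError.
  private_class_method :new

  # Extra 2: the entry point is named `call`, like a Proc.
  def self.call(text)
    new(text).send(:result) # `new` still works from inside the class
  end

  # Extra 3: lets callers pass the class with ampersand syntax.
  def self.to_proc
    method(:call).to_proc
  end

  private

  def initialize(text)
    @text = text
  end

  def result
    "#{@text.upcase}!"
  end
end

Shout.call("hi")    # => "HI!"
Shout.("hi")        # => "HI!" (shorthand for .call)
%w[a b].map(&Shout) # => ["A!", "B!"]
```

Note that private_class_method :new only blocks calls with an explicit receiver. The class method can still call new internally, which is exactly what the pattern needs.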

Conclusion

It is possible, and even preferable, to write Ruby in a functional style – no monads or category theory required. Complicated pure functions can be written in idiomatic Ruby.

Again, you can get the code for the article from GitHub: https://github.com/tomdalling/pure-function-as-an-object

Got questions? Comments? Milk?

Shoot me an email or hit me up on Twitter (@tom_dalling).