Object#tap And How To Use It
The #tap method was introduced in Ruby 1.9, back in 2007. To this day, it raises questions like:
- What does it do?
- How is that even remotely useful, and when would I ever use it?
- Why is it named so terribly?
After reading this article, you will be able to answer these questions.
What Does It Do?
The #tap method does two things:
- It calls the given block, passing self as the only argument
- It always returns self
return_value =
  42.tap do |argument|
    puts "argument: #{argument}"
  end
puts "return value: #{return_value}"
# Outputs:
#
# argument: 42
# return value: 42
The implementation is only two lines.
class Object
  def tap
    yield(self)
    self
  end
end
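One consequence of this implementation is easy to miss: whatever the block returns is thrown away. A small sketch of that, where the array and variable names are made up for illustration:

```ruby
numbers = [3, 1, 2]

# The block's return value (a new, sorted array) is discarded;
# tap hands back the receiver itself.
result = numbers.tap { |n| n.sort }

puts result.inspect          # [3, 1, 2]
puts result.equal?(numbers)  # true (same object, not a copy)

# Mutating the receiver inside the block is a different story,
# because the side effect happens to the very object tap returns.
mutated = numbers.tap { |n| n.sort! }
puts mutated.inspect         # [1, 2, 3]
```

So a block passed to #tap only matters for its side effects, which is the theme of the rest of this article.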
How Is That Even Remotely Useful?
Let’s start by reading the official documentation.
.tap
(from ruby core)
Implementation from Object
------------------------------------------------------------------------
obj.tap {|x| block } -> obj
------------------------------------------------------------------------
Yields self to the block, and then returns self. The primary purpose of
this method is to "tap into" a method chain, in order to perform
operations on intermediate results within the chain.
(1..10)                  .tap {|x| puts "original: #{x}" }
  .to_a                  .tap {|x| puts "array: #{x}" }
  .select {|x| x.even? } .tap {|x| puts "evens: #{x}" }
  .map {|x| x*x }        .tap {|x| puts "squares: #{x}" }
So the original intention of this method was to perform “operations” (a euphemism for side effects) on “intermediate results” (return values from methods in the middle of the chain).
Use Case: Debugging Method Chains
As a concrete example of the intended purpose, let’s say we’re debugging a big method chain, and our first thought is WTF is this thing even doing?
def most_frequent_words(text)
  text
    .split(/(\s|[\[\]()])+/)
    .map(&:downcase)
    .select { _1.match?(/[a-z]/) }
    .reject { _1.match?(/[a-z0-9]{3}\.md/) }
    .map { _1.tr('’“”', "'\"\"") }
    .map { strip_regex(_1, /[.,?:"_*~()\[\]]+/) }
    .reject { COMMON_WORDS.include?(_1) }
    .select { _1.length >= 2 }
    .tally
    .sort_by(&:last)
    .last(30)
    .reverse
    .to_h
end
Puts debuggerers might try to understand it by printing out some of the return values in the middle of this chain. If we were unaware that #tap existed, we might do that like so:
def most_frequent_words(text)
  split_parts = text.split(/(\s|[\[\]()])+/)
  puts "split_parts: #{split_parts.inspect}"
  before_tally =
    split_parts
      .map(&:downcase)
      .select { _1.match?(/[a-z]/) }
      .reject { _1.match?(/[a-z0-9]{3}\.md/) }
      .map { _1.tr('’“”', "'\"\"") }
      .map { strip_regex(_1, /[.,?:"_*~()\[\]]+/) }
      .reject { COMMON_WORDS.include?(_1) }
      .select { _1.length >= 2 }
  puts "before_tally: #{before_tally.inspect}"
  before_tally
    .tally
    .sort_by(&:last)
    .last(30)
    .reverse
    .to_h
end
There are two new variables with meaningless names, we have to reformat a bunch of stuff, and then we have to remove it all again once we’re done debugging. That’s too much effort.
Using #tap, the same thing can be achieved by adding just two lines of code:
def most_frequent_words(text)
  text
    .split(/(\s|[\[\]()])+/)
    .tap { puts "parts: #{_1.inspect}" } # <-----------------
    .map(&:downcase)
    .select { _1.match?(/[a-z]/) }
    .reject { _1.match?(/[a-z0-9]{3}\.md/) }
    .map { _1.tr('’“”', "'\"\"") }
    .map { strip_regex(_1, /[.,?:"_*~()\[\]]+/) }
    .reject { COMMON_WORDS.include?(_1) }
    .select { _1.length >= 2 }
    .tap { puts "before tally: #{_1.inspect}" } # <-----------------
    .tally
    .sort_by(&:last)
    .last(30)
    .reverse
    .to_h
end
The .tap { ... } lines are easier to write, easier to move around, and easier to delete.
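The side effect doesn't have to be puts. As a rough sketch, the same trick can record the size of each intermediate array instead of printing it. The sizes hash and the sample sentence are invented for this example:

```ruby
sizes = {} # hypothetical place to stash debugging info

counts =
  "the quick brown fox jumps over the lazy dog the end"
    .split
    .tap { |words| sizes[:after_split] = words.size }
    .reject { |word| word == "the" }
    .tap { |words| sizes[:after_reject] = words.size }
    .tally

puts "after split:  #{sizes[:after_split]}"   # 11 words
puts "after reject: #{sizes[:after_reject]}"  # 8 words
puts "dog count:    #{counts['dog']}"         # 1
```

Deleting the two .tap lines leaves the chain exactly as it was, which is the point.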
Use Case: Building And Returning An Object
Before #tap existed, ActiveSupport already had a similar method which they called #returning. This may seem like a strange name, but that is because it was designed for a use case that is unrelated to method chains: modifying and then returning an object.
The #returning method was removed from ActiveSupport some time ago.
The #returning method was designed for the common situation where we fetch or create an object, which we want to eventually return, but that needs to be modified or configured first.
As a real-world example, let’s look at a method that creates an OpenSSL::Cipher::AES object for encrypting some data. A simple procedural approach might look like this:
def cipher
  aes = OpenSSL::Cipher::AES.new(128, :CBC)
  aes.encrypt # put the object into encryption mode
  aes.key = @key
  aes.iv = @iv
  aes
end
First the object is created, then it is mutated, and then returned from the method. It’s not immediately obvious that this method returns the aes object, until we read the final line.
Using ActiveSupport’s #returning, it would look like this:
def cipher
  returning OpenSSL::Cipher::AES.new(128, :CBC) do |aes|
    aes.encrypt # put the object into encryption mode
    aes.key = @key
    aes.iv = @iv
  end
end
This makes the intent of the method clearer. The first line indicates that the return value will be this new OpenSSL::Cipher::AES object, and the lines inside the block are to set up or configure the object.
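Since #returning no longer ships with ActiveSupport, the #returning version of cipher won't run on a current Rails. As a rough sketch, assuming the method behaved the way this article describes (yield the object, then return it), a stand-in could look like the following. This is not ActiveSupport's actual source, and the Hash is only a placeholder for an object that needs configuring:

```ruby
# Hypothetical stand-in for the old ActiveSupport helper.
def returning(value)
  yield(value) # let the block configure the object
  value        # hand back the object, not the block's result
end

config = returning({}) do |settings|
  settings[:mode] = :encrypt
  settings[:bits] = 128
end

puts config[:mode] # encrypt
puts config[:bits] # 128
```

Apart from taking the object as an argument instead of being called on it, this has the same shape as the two-line #tap implementation shown earlier.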
The same thing can be written using #tap, although I don’t think it reads quite as nicely.
def cipher
  OpenSSL::Cipher::AES.new(128, :CBC).tap do |aes|
    aes.encrypt # put the object into encryption mode
    aes.key = @key
    aes.iv = @iv
  end
end
So, this second use case for #tap is when we are writing a build and return kind of method, and we want to communicate that a little more clearly. Seeing Something.new.tap do |something| as the first line acts as a shortcut to understanding the purpose of the method.
Bonus Use Case: Placating Robotic Police Officers
Early in this article I mentioned the purpose of #tap is to cause side effects. The method chaining use case shows the side effect of printing output using puts. The other use case shows mutating an object, which is also a kind of side effect. Let’s look at a combination of the two: mutating an object within a method chain.
If we were using Ruby 2.7 or later, Enumerable#filter_map would be a better choice here, but let’s say we’re using 2.6 and it’s not available.
Let’s say we want to get a list of blog post publish dates, formatted as ISO8601 strings, excluding unpublished posts where published_at is nil. We might write something like this:
blog_posts
  .map(&:published_at)
  .compact
  .map(&:iso8601)
This works, but our favourite robotic police officer might complain about it.
Offenses:
code.rb:6:3: C: Performance/ChainArrayAllocation: Use unchained map and compact! (followed by return array if required) instead of chaining map...compact.
.compact
^^^^^^^^
code.rb:7:3: C: Performance/ChainArrayAllocation: Use unchained compact and map! (followed by return array if required) instead of chaining compact...map.
.map(&:iso8601)
^^^^^^^^^^^^^^^
After thanking whoever set up our CI pipeline, we might use #map! and #compact! like we’re being told to. These methods mutate the existing array, instead of creating and returning a new one.
blog_posts
  .map(&:published_at)
  .compact!
  .map!(&:iso8601)
But now the tests fail.
undefined method `map!' for nil:NilClass (NoMethodError)
Unlike #compact, which always returns an Array, the #compact! method will return nil sometimes. Not always, just sometimes, to keep us on our toes.
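The "sometimes" is whenever the array had no nil in it to begin with: with nothing to remove, #compact! reports that by returning nil. A quick demonstration with made-up arrays:

```ruby
with_nils    = [1, nil, 2]
without_nils = [1, 2]

first  = with_nils.compact!     # nils removed, so the array comes back
second = without_nils.compact!  # nothing removed, so nil comes back

p(first)   # [1, 2]
p(second)  # nil

# The non-bang version returns an array either way.
p([1, 2].compact)  # [1, 2]
```

In the blog post example, that means the chain only breaks when every post happens to be published.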
To make it work reliably, we are forced to write procedural-style code like this:
times = blog_posts.map(&:published_at)
times.compact!
times.map!(&:iso8601)
Astute readers will notice that this code looks a lot like the original OpenSSL::Cipher::AES example from earlier. We have an object assigned to a variable, which we make some modifications to, and then return it.
So, instead of converting our functional-style method chain into something that a Java programmer from the 1990s would write, we can use #tap.
blog_posts
  .map(&:published_at)
  .tap(&:compact!) # <-- this line changed
  .map!(&:iso8601)
Here, #tap is being used to always return the array object, ignoring the return value from .compact!.
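To see the whole thing run, here is a self-contained sketch. The Post struct and the sample dates are invented for illustration, and Date stands in for whatever class the real published_at would return:

```ruby
require "date"

# Hypothetical stand-in for a blog post model.
Post = Struct.new(:title, :published_at)

blog_posts = [
  Post.new("First",  Date.new(2020, 1, 15)),
  Post.new("Draft",  nil),
  Post.new("Second", Date.new(2020, 3, 2)),
]

# The #tap version from above.
with_tap =
  blog_posts
    .map(&:published_at)
    .tap(&:compact!)
    .map!(&:iso8601)

# Same chain when every post is published: compact! returns nil here,
# but tap ignores that and passes the array along.
all_published =
  blog_posts
    .reject { |post| post.published_at.nil? }
    .map(&:published_at)
    .tap(&:compact!)
    .map!(&:iso8601)

# The Ruby 2.7+ alternative mentioned earlier.
with_filter_map = blog_posts.filter_map { |post| post.published_at&.iso8601 }

p(with_tap)         # ["2020-01-15", "2020-03-02"]
p(all_published)    # ["2020-01-15", "2020-03-02"]
p(with_filter_map)  # ["2020-01-15", "2020-03-02"]
```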
Is it more readable than the original implementation? No. Is it at least better than the procedural-style implementation? Maybe. Will the performance improvement be noticeable? Unlikely. But we can rest easy in the knowledge that it allocates slightly less memory, and the only trade-offs were developer time and code readability.
Why Is It Named So Terribly?
Think of a phone call. The audio data is transmitted through various wires, exchanges, and radio waves, between the phones. Anywhere between the phones can be wiretapped to divert the audio to another listening device, without affecting the call. Sound familiar?
The word “wiretap” originates from a time when eavesdropping was done by placing an electrical tap on a literal piece of wire. What’s an electrical tap? It’s when you have an electrical circuit and you add new wiring to divert electricity. Sound familiar?
Electrical taps come from plumbing. Say you have a water pipe running through the kitchen wall to the bathroom — and while you want the pipe to continue carrying water to the bathroom, it might be convenient to divert some of that water to the kitchen too. You could hire a plumber to tap into the pipe and install a tap.
So if we have a chain of method calls and we want to divert the intermediate return values somewhere else, without affecting the chain, we might use a method called tap. See, the name isn’t that bad, after all.
I don’t think there exists an English word that would be a really good fit for this functionality. The old ActiveSupport method #returning does read better for the build and return use case, but it would read worse for the method chaining use case. Other names that were considered include with, k, then, and_then, apply, and tee (after the CLI command, which also gets its name from plumbing). But are any of these major improvements over tap? Not in my opinion.
Got questions? Comments? Milk?
Hit me up on Twitter (@tom_dalling).