Learning Clojure: Using the REPL

There are three things that can greatly accelerate learning a new programming language:

  1. Good documentation
  2. Fast iteration
  3. A balanced task that is not too trivial and not too complex

While learning Clojure, the REPL can provide the first and second items. A well-documented function can be looked up by name right in the REPL. Nearly all of the built-in functions have good documentation, and when I need examples or run into ambiguities I tend to go to ClojureDocs.

Using the REPL also lets us try out ideas and see the outcomes very quickly.

Finally, there is a plethora of easy-to-use packages that allow us to do meaningful work without writing everything from scratch.
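
As a small illustration of the documentation point, here is a sketch of how a docstring on one of our own functions becomes available to doc. The function monthly-total and its docstring are made up for this example. doc lives in clojure.repl, which some REPLs load for you and some don't, so the require is included to be safe.

```clojure
(require '[clojure.repl :refer [doc]])

;; Hypothetical function, used only to show where a docstring goes.
(defn monthly-total
  "Sums a collection of numeric amounts for one month."
  [amounts]
  (reduce + 0 amounts))

;; Prints the signature and docstring, just like for built-in functions.
(doc monthly-total)

;; The docstring is ordinary metadata on the var.
(:doc (meta #'monthly-total))
;; => "Sums a collection of numeric amounts for one month."

(monthly-total [10 20 12.5])
;; => 42.5
```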

For our example task, let's say we've gotten a bunch of data about our monthly spending in the form of a comma-separated values (CSV) file. We'll need to read it in, perform some transformations on it, and insert it into a database in preparation for further analysis and processing. In this example, we'll be working with clojure-csv to parse the CSV data, and monger to insert data into a MongoDB database. There are a ton of useful built-in Clojure functions we'll use along the way.

First, let's read in some data from a file. There are other ways we could do this, including setting up a Reader, but for now we'll just use slurp, which reads the whole file into a string:

spending.core> (def input (slurp "jun-sep.csv"))
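
For comparison, here is a minimal sketch of the Reader approach mentioned above, using clojure.java.io/reader and line-seq. It writes a tiny made-up temporary file (not the real jun-sep.csv) first so the snippet runs on its own. slurp is simpler for small files; a reader is worth considering when a file is too big to hold comfortably in memory as one string.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical sample file so the example is self-contained.
(def sample-file (java.io.File/createTempFile "sample" ".csv"))
(spit sample-file "Date,Amount\n2016-09-26,1341.88\n")

;; slurp reads the whole file into a single string.
(def whole-file (slurp sample-file))

;; with-open closes the reader when the body finishes. line-seq is lazy,
;; so we realize the lines with vec before the reader is closed.
(def lines
  (with-open [rdr (io/reader sample-file)]
    (vec (line-seq rdr))))

lines
;; => ["Date,Amount" "2016-09-26,1341.88"]
```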

Now, let's get the CSV package ready:

spending.core> (require '[clojure-csv.core :as csv])

OK, we've read in the input file and we have a package for parsing it. But how do we use the CSV package? Let's use doc to get some information on it:

spending.core> (doc csv/parse-csv)
([csv & {:as opts}])
  Takes a CSV as a string or Reader and returns a seq of the parsed CSV rows,
   in the form of a lazy sequence of vectors: a vector per row, a string for
   each cell.

   Accepts a number of keyword arguments to change the parsing behavior:
        :delimiter - A character that contains the cell separator for
                     each column in a row.  Default value: \,
        :end-of-line - A string containing the end-of-line character
                       for reading CSV files. If this setting is nil then
                       \n and \r\n are both accepted.  Default value: nil
        :quote-char - A character that is used to begin and end a quoted cell.
                      Default value: \"
        :strict - If this variable is true, the parser will throw an
                  exception on parse errors that are recoverable but
                  not to spec or otherwise nonsensical.  Default value: false

Nice, a full description of the parse-csv function. So we'll use that to parse the input CSV data into a seq and go from there.

spending.core> (def spending (csv/parse-csv input :strict true))
spending.core> (first spending)
["Date" "Account" "Description" "Category" "Tags" "Amount"]
spending.core> (second spending)
["2016-09-26" "123 Main St." "Payment" "Mortgages" "" "1341.88"]

We've parsed the input, and it appears the first item is a vector of column names. The rest of the items are our data. The column names will come in handy!

Let's turn those column names into keywords, so it feels a bit more like we're working with Clojure data structures. We use map and keyword to convert each one.

spending.core> (def spending-columns (map keyword (first spending)))
spending.core> spending-columns
(:Date :Account :Description :Category :Tags :Amount)

Now how will we go about applying those column names as keywords to every row of data we have? There's a handy function called zipmap that pairs up the items of one collection (as keys) with the items of another (as values). But how does it work?

spending.core> (doc zipmap)
([keys vals])
  Returns a map with the keys mapped to the corresponding vals.

OK, sounds perfect. For every row of data we have, let's pair the column names with the values. But since we're learning, let's test the idea on just one row, using second. Why second? Because the first item is still the column names themselves.

spending.core> (zipmap spending-columns (second spending))
{:Date "2016-09-26", :Account "123 Main St.", :Description "Payment", :Category "Mortgages", :Tags "", :Amount "1341.88"}

Yup, that looks like what we want. Later we'll use zipmap on the rest of the data.
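
Before running zipmap over every row, it may help to see how it behaves when the two collections don't line up. The sketch below uses made-up keys and values, and the row->map helper is hypothetical. zipmap stops at the shorter collection, so a row with a missing cell silently loses a key rather than throwing.

```clojure
(def columns [:Date :Category :Amount])

;; Same number of keys and values: every key gets a value.
(zipmap columns ["2016-09-26" "Mortgages" "1341.88"])
;; => {:Date "2016-09-26", :Category "Mortgages", :Amount "1341.88"}

;; Fewer values than keys: the leftover key is dropped.
(zipmap columns ["2016-09-26" "Mortgages"])
;; => {:Date "2016-09-26", :Category "Mortgages"}

;; A simple guard (hypothetical helper), if short rows should be errors.
(defn row->map [columns row]
  (assert (= (count columns) (count row)) "row length does not match header")
  (zipmap columns row))
```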

What if we have a bunch of different things we want to do with this data? What if we want to work with it without re-parsing it from CSV? One answer is to store it in a database. I happen to like MongoDB because there are lots of good, readily available packages for working with it.

Let's use a package called Monger to store the data we've been preparing:

spending.core> (require '[monger.core :as monger])
spending.core> (require '[monger.collection :as mc])

To store a bunch of items at once, we'll use the insert-batch function. Let's use doc to see how that works.

spending.core> (doc mc/insert-batch)
([db coll documents] [db coll documents concern])
  Saves documents do collection. You can optionally specify WriteConcern as a third argument.

We need a connection, and we need to say which database to work with.

spending.core> (def connection (monger/connect))
spending.core> (def DB (monger/get-db connection "spending"))

Let's use that handy zipmap function to prepare the data and then insert-batch to insert the rows into our database.

spending.core> (def spending-rows (map #(zipmap spending-columns %) (rest spending)))
spending.core> (mc/insert-batch DB "transactions" spending-rows)
#object[com.mongodb.WriteResult 0x65bd103e "WriteResult{, n=0, updateOfExisting=false, upsertedId=null}"]
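
The data-preparation half of that step doesn't need a database, so it can be sketched on its own. The helper name rows->maps and the sample rows below are made up for illustration; the rows have the same shape as what parse-csv gave us (a header vector followed by data vectors).

```clojure
(defn rows->maps
  "Turns parsed CSV rows (header row first) into a vector of maps
  keyed by the header names as keywords."
  [[header & data]]
  (let [columns (map keyword header)]
    (mapv #(zipmap columns %) data)))

;; Made-up rows in the same shape as the parsed CSV.
(def sample-rows
  [["Date" "Category" "Amount"]
   ["2016-09-26" "Mortgages" "1341.88"]
   ["2016-09-27" "Groceries" "54.10"]])

(rows->maps sample-rows)
;; => [{:Date "2016-09-26", :Category "Mortgages", :Amount "1341.88"}
;;     {:Date "2016-09-27", :Category "Groceries", :Amount "54.10"}]
```

mapv builds the whole vector up front rather than lazily, which can make it easier to inspect exactly what is about to be sent to the database.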

We can confirm by reading back out of the database:

spending.core> (mc/find-maps DB "transactions" {:Category "Mortgages"})
({:Amount "1341.88", :Tags "", :Category "Mortgages", :Description "Payment", :Account "123 Main St.", :Date "2016-09-26", :_id #object[org.bson.types.ObjectId 0x22a002c0 "57ec22a1c6064175c47e38f0"]} ...
spending.core> (mc/count DB "transactions" {:Category "Mortgages"})

We're getting pretty familiar with how to ingest some data and get it into a database. We've used the REPL to quickly try things out and build that familiarity. In subsequent posts we'll improve on this by formatting some of the columns (e.g. the date column is not so useful as a plain String), and figure out how to query subsets of the data. At some point we'll take the things we've learned and gather them all into some reusable code. Stay tuned!