Decoding nested JSON the right way in Elixir
Introduction
While I was working on a new HTTP client for the Holodex API, I chose the typical stack consisting of:
- HTTPoison, the popular Elixir HTTP client powered by Hackney
- Jason, the fast JSON encoder/decoder library
While Jason still beats its main competitor, Poison, in benchmarks, it lacks certain features that I will demonstrate below.
UPDATE 2023: It seems you can now also decode keys to atoms with Jason (the keys: :atoms! option reuses only atoms that already exist), and then simply use struct/2 to cast the result to a struct. See the official Jason docs.
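A minimal sketch of that last step, assuming the decoder has already produced atom keys (for example Jason with keys: :atoms!, which is not called here so the example runs with the standard library alone). The Person struct, its address field and the sample data are illustrative assumptions:

```elixir
defmodule Person do
  # Illustrative struct; the field names are assumptions for this example
  defstruct [:name, :age, :address]
end

# Assumed output of a decoder configured for atom keys, for the JSON
# {"name":"Dan","age":42,"nationality":"Latvian","address":{"city":"Riga"}}
decoded = %{name: "Dan", age: 42, nationality: "Latvian", address: %{city: "Riga"}}

# struct/2 copies only the fields Person defines, so :nationality is dropped
person = struct(Person, decoded)
IO.inspect(person)

# The nested address is still a plain map: struct/2 does not recurse
IO.inspect(person.address)
```

Note that struct/2 only handles the top level, which is exactly the nesting problem the rest of this post is about.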
The problem
When you consume an HTTP JSON API resource, you receive a JSON string, which you then decode with your JSON decoder of choice:
Jason.decode!(~s({"name":"Dan","age":42,"nationality":"Latvian"}))
#=> %{"name" => "Dan", "age" => 42, "nationality" => "Latvian"}
However, you end up with a plain Elixir map, which has many disadvantages:
- Keys are binaries by default, instead of atoms
- You can’t use the map.key_name syntax, so access is less idiomatic and less assertive: map["key_name"] silently returns nil for a missing key, whereas map.key_name raises a KeyError
- It is harder to reason about the shape of the data within your system’s domain
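A small standard-library illustration of that difference; the maps and the misspelled key are made up for the example:

```elixir
string_keyed = %{"name" => "Dan", "age" => 42}
atom_keyed = %{name: "Dan", age: 42}

# Bracket access with a misspelled key quietly returns nil
typo_result = string_keyed["nmae"]
IO.inspect(typo_result)

# Dot access on a missing key raises KeyError, so the typo surfaces immediately
dot_result =
  try do
    atom_keyed.nmae
  rescue
    KeyError -> :key_error
  end

IO.inspect(dot_result)
```

Structs behave like the atom-keyed map here, with the added benefit that the set of valid keys is fixed at compile time.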
Solutions
Solution A: Pre-process response bodies in the HTTP client
A frequent pattern, shown in the documentation of HTTP client libraries like HTTPoison, is to define the set of fields you expect to receive, then iterate over the map keys and convert them to atoms manually.
defmodule Holobot.Holofans.Client do
  @moduledoc """
  Holofans API HTTP client implementation, based on HTTPoison.
  """
  use HTTPoison.Base

  # Only these top-level keys survive; everything else is dropped
  @expected_fields ~w(count total channels videos query comments)
  @api_version "v1"

  @impl true
  def process_request_url(url) do
    Application.fetch_env!(:holobot, :holofans_api) <> @api_version <> url
  end

  @impl true
  def process_response_body(body) do
    body
    |> Poison.decode!()
    |> Map.take(@expected_fields)
    # Build a map with atom keys; String.to_atom/1 creates an atom for any key it receives
    |> Map.new(fn {k, v} -> {String.to_atom(k), v} end)
  end
end
However, there are many issues with this approach:
- We now have an overly specific assumption about the shape of the data we will receive (what if the root entity is an array of objects? Map.take/2 would then raise a BadMapError)
- Only the top-level keys are converted to atoms, while nested objects stay as maps with binary keys (so the accessor syntax now depends on the depth!)
- String.to_atom/1 puts you at risk of exhausting the global atom table: atoms are never garbage-collected, so in a system with very high uptime that makes many requests, this could become a big issue
- The client callbacks, which should be small and apply only minimal transformations to your requests and responses, become bloated
- Since the result is not a struct, you cannot take advantage of convenient libraries like TypedStruct
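For comparison, here is a minimal sketch of a helper that tackles the nesting and atom-table points using only the standard library: it recurses into nested maps and lists, and uses String.to_existing_atom/1, so an unknown key raises ArgumentError instead of growing the atom table. KeyAtomizer is a hypothetical name, and this is only an illustration, not the approach used later:

```elixir
defmodule KeyAtomizer do
  @moduledoc "Hypothetical helper: recursively converts string keys to existing atoms."

  # Maps: convert each key and recurse into each value
  def atomize(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {to_key(key), atomize(value)} end)
  end

  # Lists: recurse into each element (covers arrays of objects at any depth)
  def atomize(list) when is_list(list), do: Enum.map(list, &atomize/1)

  # Scalars (strings, numbers, booleans, nil) are returned unchanged
  def atomize(other), do: other

  # Raises ArgumentError if no atom with this name exists yet
  defp to_key(key) when is_binary(key), do: String.to_existing_atom(key)
  defp to_key(key), do: key
end

# Referencing these atoms guarantees they exist (struct fields would do the same)
_known_keys = [:videos, :channel, :name]

decoded = %{"videos" => [%{"channel" => %{"name" => "Planya"}}]}
IO.inspect(KeyAtomizer.atomize(decoded))
```

The trade-off is that every key you expect must already exist as an atom, and the result is still a plain map rather than a struct.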
Also, if you do want to create structs from the deserialised data, you might end up implementing a builder like this:
# Hand-written builder: every field has to be copied explicitly
@spec build_record(map) :: t()
def build_record(video) do
  %__MODULE__{
    yt_video_key: video["yt_video_key"],
    title: video["title"],
    status: video["status"],
    live_schedule: video["live_schedule"],
    live_start: video["live_start"],
    live_end: video["live_end"],
    live_viewers: video["live_viewers"],
    channel: video["channel"]["yt_channel_id"],
    is_uploaded: video["is_uploaded"],
    duration_secs: video["duration_secs"],
    is_captioned: video["is_captioned"]
  }
end
Of course, this is rarely the way to go, since it creates a lot of duplication. It is possible to work around this by using a library like ExConstructor, but it still does not solve the issue of nested structures.
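One way to cut that duplication without a library is to let the struct definition drive the copying: iterate over the struct’s own fields and look up the matching string key. Because the atoms come from the struct, no new atoms are created. This is a minimal sketch with a hypothetical StructBuilder module and a cut-down, illustrative Video struct; as with ExConstructor, nested objects remain plain maps:

```elixir
defmodule StructBuilder do
  @moduledoc "Hypothetical helper: builds a struct from a string-keyed map."

  def build(module, map) when is_atom(module) and is_map(map) do
    # Start from the struct's defaults, then overwrite fields present in the map
    defaults = struct(module)

    defaults
    |> Map.from_struct()
    |> Map.keys()
    |> Enum.reduce(defaults, fn field, acc ->
      case Map.fetch(map, Atom.to_string(field)) do
        {:ok, value} -> Map.put(acc, field, value)
        :error -> acc
      end
    end)
  end
end

defmodule Video do
  # Cut-down, illustrative version of the Video struct
  defstruct [:yt_video_key, :title, :status, :channel]
end

video =
  StructBuilder.build(Video, %{
    "yt_video_key" => "Sc5MRAvMm18",
    "channel" => %{"yt_channel_id" => "UCQaGj_l3dqmGWJLEbEmwgFQ"},
    "unknown_field" => true
  })

IO.inspect(video)
```

Unlike the hand-written builder, this copies the channel object as-is instead of extracting yt_channel_id, and keys the struct does not define are ignored.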
Solution B: Use Poison’s built-in object decoding functionality
A colleague shared a recent lightning talk from ElixirConf about a feature of Poison I didn’t know about: decoding a JSON string directly into a struct of your choice:
defmodule Person, do: defstruct([:name, :age])
Poison.decode!(~s({"name": "Dan", "age": 42}), as: %Person{})
#=> %Person{name: "Dan", age: 42}
Great! The HTTP client is clean, we can dynamically specify the shape of the data, and the structure definition modules are also free of helper functions.
It seems that Poison also solves the problem of dynamically generated atoms under the hood by using the String.to_existing_atom/1 function.
UPDATE: It seems that the most reliable way to ensure this happens is to also pass the option keys: :atoms! to Poison.decode/2, for example:
Poison.decode!(~s({"name": "Dan", "age": 42}), %{as: %Person{}, keys: :atoms!})
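The safety comes from String.to_existing_atom/1 itself: it only returns atoms that are already in the atom table and raises ArgumentError otherwise. A small standard-library demonstration; the SafeKeys module and its safe_key/1 function are hypothetical, added for illustration:

```elixir
defmodule Person do
  # Illustrative struct: its field names exist as atoms once the module is compiled
  defstruct [:name, :age]
end

defmodule SafeKeys do
  # Hypothetical helper: returns {:ok, atom} only when the atom already exists
  def safe_key(string) when is_binary(string) do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end

# "age" is a field of Person, so the atom exists and the lookup succeeds
IO.inspect(SafeKeys.safe_key("age"))

# A key generated at runtime has never been an atom, so it is rejected
fresh_key = "unseen_key_" <> Integer.to_string(System.unique_integer([:positive]))
IO.inspect(SafeKeys.safe_key(fresh_key))
```

This is why decoding into a known struct is safe: the field names of the target struct already exist as atoms, while arbitrary keys from an untrusted payload are never turned into new ones.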
The only problem left is nested data. This is where the feature has practically no documentation to speak of, but it works the following way:
defmodule Holodex.Api.Videos do
  alias Holodex.Api.Client
  alias Holodex.Model.{Channel, Video}

  # Decoding pattern: a list of Videos, where some fields hold a nested
  # Channel or lists of Videos/Channels
  @list_of_videos_p [
    %Video{
      channel: %Channel{},
      clips: [%Video{}],
      sources: [%Video{}],
      refers: [%Video{}],
      simulcasts: [%Video{}],
      mentions: [%Channel{}]
    }
  ]

  def list_videos!(opts \\ %{}) do
    # build_videos_url/1 is a URL helper defined elsewhere in this module
    body = Client.get!(build_videos_url(opts)).body
    Poison.decode!(body, %{as: @list_of_videos_p})
  end
end
# Returns:
[
  %Holodex.Model.Video{
    available_at: "2024-08-11T11:05:00.000Z",
    channel: %Holodex.Model.Channel{
      banner: nil,
      clip_count: nil,
      description: nil,
      english_name: "Planya",
      id: "UCQaGj_l3dqmGWJLEbEmwgFQ",
      ...
    },
    id: "Sc5MRAvMm18",
    live_viewers: nil,
    mentions: nil,
    ...
  },
  ...
]
We expect an array of Video objects, each of which might contain a nested Channel, as well as arrays of Videos and Channels nested under different fields. We then define a pattern for the decoder to follow (in this case @list_of_videos_p, kept as a module attribute for reuse and to keep functions clean). The pattern simply defines how the data maps to structs of your choice. All that is left is to define type specs for your structs, and you can then use them in the specs of your API functions as well:
@spec list_videos(opts()) ::
        {:ok, [Video.t()]} | {:error, HTTPoison.Error.t()} | {:error, Exception.t()}
def list_videos(opts \\ %{}) do
  with url <- build_videos_url(opts),
       {:ok, response} <- Client.get(url),
       {:ok, decoded} <- Poison.decode(response.body, %{as: @list_of_videos_p}) do
    {:ok, decoded}
  end
end
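To make the idea of a nested pattern more concrete, here is a small standard-library sketch of one way such a pattern can be interpreted: a one-element list means "a list of this shape", a struct means "fill each field from the string key of the same name, using the field’s value in the pattern as the sub-pattern", and anything else leaves the value untouched. This is a hypothetical illustration (the Shape module and the trimmed Channel and Video structs are assumptions), not Poison’s actual implementation:

```elixir
defmodule Shape do
  @moduledoc "Hypothetical sketch of pattern-driven decoding, not Poison's code."

  # nil in the data stays nil regardless of the pattern
  def cast(nil, _pattern), do: nil

  # [pattern] means "a list whose elements all follow pattern"
  def cast(list, [pattern]) when is_list(list), do: Enum.map(list, &cast(&1, pattern))

  # A struct pattern: each field is read from the string key of the same name
  # and cast with that field's value in the pattern as its sub-pattern
  def cast(map, %module{} = pattern) when is_map(map) do
    fields =
      for {field, sub_pattern} <- Map.from_struct(pattern) do
        {field, cast(Map.get(map, Atom.to_string(field)), sub_pattern)}
      end

    struct(module, fields)
  end

  # Any other pattern (e.g. a nil default): keep the decoded value as-is
  def cast(value, _pattern), do: value
end

defmodule Channel do
  # Trimmed, illustrative fields
  defstruct [:id, :english_name]
end

defmodule Video do
  defstruct [:id, :channel, :mentions]
end

pattern = [%Video{channel: %Channel{}, mentions: [%Channel{}]}]

decoded = [
  %{
    "id" => "Sc5MRAvMm18",
    "channel" => %{"id" => "UCQaGj_l3dqmGWJLEbEmwgFQ", "english_name" => "Planya"},
    "mentions" => nil
  }
]

IO.inspect(Shape.cast(decoded, pattern))
```

In this sketch a missing key simply becomes nil; check Poison’s own behaviour before relying on the same assumption.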
Conclusion
Of course, this solution is not a free lunch: Poison is still a bit slower than its main competitor, Jason. However, in most cases of decoding external data, this feature of Poison is worth more than the extra CPU time or memory usage.
There is also an issue with passing keyword list options to Poison.decode/2 (Dialyzer will complain), which I have raised here.
I have also raised the issue of poor documentation for the as option here, which I hope to address as well.
If you have more suggestions on how this can be done better, feel free to reach out on:
- Twitter: @danirukun
- Elixir Slack: @danpetrov
- Matrix: @dpetrovs:matrix.org