JSON

Thinking Machines

Author

Prof. Calvin

Why JSON?

What is JSON?

  • JSON stands for JavaScript Object Notation.
  • It’s a lightweight, human-readable format for storing and transmitting data.
  • It has become the de facto standard for data interchange on the web, especially for APIs (Application Programming Interfaces).
  • Despite its name, JSON is language-agnostic and supported by virtually every mainstream programming language.

JSON’s Core Purpose

  • The main goal is to represent structured data that can be easily serialized (converted into a string) and deserialized (converted back into native data types).
  • It maps directly onto common data structures like dictionaries and lists.
  • This makes complex, nested data easy to handle.
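  • A minimal sketch of the serialize/deserialize round trip, using Python's standard json module. The record below is made up for illustration only.

```python
import json

# Hypothetical record, used only for illustration
record = {"symbol": "He", "group": 18, "isotopes": [3, 4]}

# Serialize: Python dictionary -> JSON string
text = json.dumps(record)

# Deserialize: JSON string -> Python dictionary
restored = json.loads(text)

print(type(text).__name__)  # str
print(restored == record)   # True
```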

JSON vs. CSV and Other Formats

  • Scientific data is rarely flat. It often involves hierarchical relationships.
  • One example: There are groups (or periods) that contain elements.
  • JSON can capture this nesting naturally, unlike flat CSV files.
    • CSV: comma-separated values, often used with pandas.

Two Core JSON Structures

  • JSON data is built using only two universal structures:
    1. Dictionaries
      • Collection of Key/Value Pairs
      • Like state-to-capital or zip-to-state.
    2. Lists
      • Ordered Collection of Values
      • Have numerical indices rather than keys.
  • Nested to any depth: a list of dictionaries of lists

Dictionaries

Python Aside: Dictionaries

  • We review the most-beloved Python dictionaries.
  • Dictionaries are Python’s core implementation of a common archetype.
    • Often called a hash map or associative array.
    • I call them hash tables because I’m set in my ways.
  • They store data as collections of key-value pairs, looked up by key rather than by position (insertion order is preserved since Python 3.7).
    • In a “normal” dictionary, keys are words and values are definitions.

Dictionary Syntax and Keys

  • Dictionaries are defined using curly braces:
d = {}
  • Keys and values are separated by a colon:
{"key": "value"}
  • Keys are usually strings, but can be any hashable type, such as numbers or tuples.
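  • A short sketch of which types work as keys. The dictionary names here are made up for illustration; the point is that hashable values are accepted and a list is not.

```python
# Strings, numbers, and tuples are hashable, so they work as keys
atomic_numbers = {"He": 2, "Ne": 10}
symbols = {2: "He", 10: "Ne"}
by_position = {(18, 1): "He"}  # (group, period) tuple as key

# Lists are mutable and unhashable, so they are rejected as keys
try:
    bad = {[18, 1]: "He"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(symbols[2], by_position[(18, 1)], list_key_allowed)
```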

Example

helium.json
{
    "symbol": "He",
    "phase_stp": "gas",
    "group": 18,
    "period": 1,
    "boiling_point": {
        "K": 4.222,
        "C": -268.928,
        "F": -452.070
     }
}

A note

  • We can ensure these files are well-formed relatively easily.
  • At command line
python3 -m json.tool helium.json
  • Will show an error if bad.
  • Will show the file otherwise.
  • Within Python
import json

FILE_NAME = "helium.json"

# Open the file in a context manager so it is closed after reading
with open(FILE_NAME) as f:
    he = json.load(f)
  • Will show an error if bad.
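  • If you would rather get a message than a crash, one option is to catch json.JSONDecodeError. This is a sketch; check_json_file is a hypothetical helper name, not part of the json module.

```python
import json

def check_json_file(path):
    """Return (True, data) if the file parses, else (False, message)."""
    try:
        with open(path, encoding="utf-8") as f:
            return True, json.load(f)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno locate the problem in the file
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
```

  • Usage would look like ok, result = check_json_file("helium.json"), where result is the parsed data or the error message.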

Accessing Values by Key

  • Values are retrieved using the key inside square brackets ([]).
  • This is the standard way to access data fields parsed from a JSON object.
# We previously read in `he`
he["symbol"]
'He'

Adding or Updating Pairs

  • Once in Python, we can use the dictionary read from JSON like any other dictionary.
  • We can add something, like the atomic number.
    • I would see (scroll to end to see number):
he["number"] = 2
he
{'symbol': 'He',
 'phase_stp': 'gas',
 'group': 18,
 'period': 1,
 'boiling_point': {'K': 4.222, 'C': -268.928, 'F': -452.07},
 'number': 2}

The get() Method

  • Accessing a missing key with [] raises a KeyError, which crashes the program if uncaught.
  • Use .get() instead; it returns None when the key is missing.
try:
    print(he["density"])
except KeyError:
    print("Use `.get()!`")
print(he.get("density")) # Will be `None` so we print to see it
Use `.get()!`
None
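  • .get() also accepts a second argument, returned in place of None when the key is missing. A small sketch, with a stand-in for the he dictionary so it runs on its own:

```python
# Stand-in for the `he` dictionary read earlier
he = {"symbol": "He", "group": 18}

# The second argument is returned when the key is missing
density = he.get("density", "unknown")
symbol = he.get("symbol", "unknown")

print(density)  # unknown
print(symbol)   # He
```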

Iterating over Keys

  • By default, iterating over a dictionary iterates over its keys.
  • This is useful for inspection or processing.
for key in he:
    print(key)
symbol
phase_stp
group
period
boiling_point
number

Iterating over Values

  • Use the .values() method to iterate only through the values in the dictionary.
for value in he.values():
    print(value)
He
gas
18
1
{'K': 4.222, 'C': -268.928, 'F': -452.07}
2

Iterating over Key-Value Pairs

  • The .items() method yields a (key, value) tuple for each pair, which you can unpack in the loop.
for key, value in he.items():
    print(key, ":", value) # Take note of that : I use as a nicety
symbol : He
phase_stp : gas
group : 18
period : 1
boiling_point : {'K': 4.222, 'C': -268.928, 'F': -452.07}
number : 2

Nested Dictionaries

  • Since a value can be any type, a value can be another dictionary (or a list of dictionaries).
he["boiling_point"]["K"]
4.222
  • Could also be a list, in which case you’d use a numerical index.
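  • A sketch of the list case. The "isotopes" key is hypothetical and is not in helium.json; it is added here only to show a key, then an index, then a key.

```python
# Stand-in for `he`, plus a hypothetical "isotopes" key that is
# not in helium.json
he = {
    "symbol": "He",
    "isotopes": [
        {"mass_number": 3},
        {"mass_number": 4},
    ],
}

# Key, then numerical index, then key again
print(he["isotopes"][1]["mass_number"])  # 4
```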

JSON

JSON Object Syntax

  • A dictionary, called an “object” in JSON, represents an unordered collection of key/value pairs.
  • Starts with { and ends with }.
  • Keys must be strings and enclosed in double quotes.
    • This differs from Python, which allows any hashable type as a key.
  • Keys and values are separated by a colon (:). Pairs are separated by commas (,).
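  • One consequence of string-only keys, sketched with a made-up dictionary: json.dumps() converts integer keys to strings, and they stay strings after reading the JSON back.

```python
import json

symbols = {2: "He", 10: "Ne"}  # integer keys are fine in Python

text = json.dumps(symbols)
print(text)  # {"2": "He", "10": "Ne"}

# After a round trip the keys are strings, not integers
restored = json.loads(text)
print(list(restored.keys()))  # ['2', '10']
```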

URLs

  • It is extremely common to use JSON directly from a URL.
    • This is possible with e.g. CSV files but less common.
    • JSON is a de facto internet format.
    • (You can open them in browsers!)
    • helium.json

Requests

  • We can easily read JSON from URLs with requests.
import requests
URL = "https://cd-public.github.io/scicom/helium.json"

response = requests.get(URL)

# he is now a Python dictionary
he = response.json()

he # Note that "number" is gone - we have the original file again.
{'symbol': 'He',
 'phase_stp': 'gas',
 'group': 18,
 'period': 1,
 'boiling_point': {'K': 4.222, 'C': -268.928, 'F': -452.07}}

Our data

URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.json"
# Can do it in one line by the way.
vax = requests.get(URL).json()

It’s big!

  • Don’t try to print that whole thing; it’s huge.
    • 2031952 lines.
  • We will traverse it.
    • Check the type.
      • If dictionary, look at an arbitrary key.
      • If list, look at index zero.
    • We’ll repeat this until we get something that isn’t a dictionary or list.

Traverse

loc = vax # for location

while isinstance(loc, (dict, list)):
    if isinstance(loc, dict):
        key = next(iter(loc))  # an arbitrary (here, the first) key
        print(key)
        loc = loc[key]
    else:  # loc is a list
        print(0)
        loc = loc[0]
print(loc)
0
country
Afghanistan

What does this mean?

  • We have a list of somethings.
  • Those somethings are dictionaries, one of whose keys is “country”.
  • The values in that dictionary include strings, one of which is “Afghanistan”.
  • Are we correct?

Okay, so

  • List of
    • Dictionaries, representing countries, of
      • "country": name and "iso_code": a country code (mostly ISO 3166-1 alpha-3, like "AFG")
      • "data", a list of
        • dictionaries, representing dates, of
          • "date": a date as a string, like "2021-02-22"
          • some other stuff on vaccination.
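  • The outline above suggests a nested loop. A sketch on a made-up sample with the same shape (the country, code, and numbers are fake):

```python
# Made-up sample with the same shape as the real data
sample = [
    {
        "country": "Narnia",
        "iso_code": "NAR",
        "data": [
            {"date": "2021-02-22", "total_vaccinations": 0},
            {"date": "2021-02-23", "daily_vaccinations": 10},
        ],
    },
]

# Outer loop: countries; inner loop: dated records
pairs = []
for country in sample:
    for record in country["data"]:
        pairs.append((country["country"], record["date"]))

print(pairs)
```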

Object Example: Daily Data

  • An object of a single date in a single country:
vax[0]["data"][0]
{'date': '2021-02-22',
 'total_vaccinations': 0,
 'people_vaccinated': 0,
 'total_vaccinations_per_hundred': 0.0,
 'people_vaccinated_per_hundred': 0.0}

JSON Array Syntax

  • An array represents an ordered sequence of values.
  • Starts with [ and ends with ].
  • Values are separated by commas (,).
  • Values can be of any valid JSON data type.

Array Example: Time Series

  • This array holds the time-series data for one country:
vax[0]["data"][0:3]
[{'date': '2021-02-22',
  'total_vaccinations': 0,
  'people_vaccinated': 0,
  'total_vaccinations_per_hundred': 0.0,
  'people_vaccinated_per_hundred': 0.0},
 {'date': '2021-02-23',
  'daily_vaccinations': 1367,
  'daily_vaccinations_per_million': 33,
  'daily_people_vaccinated': 1367,
  'daily_people_vaccinated_per_hundred': 0.003},
 {'date': '2021-02-24',
  'daily_vaccinations': 1367,
  'daily_vaccinations_per_million': 33,
  'daily_people_vaccinated': 1367,
  'daily_people_vaccinated_per_hundred': 0.003}]
  • The array contains JSON objects.

Valid JSON Data Types

  1. String: text, double-quoted. E.g., "Olive Drab".
    • Cannot use single quotes, unlike Python.
  2. Number: integer or float. E.g., 7 or 3.14.
  3. Boolean: true or false.
    • Lowercase, unlike Python True and False
  4. null: missing value
    • Like Python None
  5. Object: a dictionary {}
  6. Array: a list []
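  • A quick sketch of how Python's json module maps each of the six types when parsing. The JSON text is made up for illustration.

```python
import json

text = '["Olive Drab", 7, 3.14, true, null, {"k": "v"}, [1, 2]]'

values = json.loads(text)
type_names = [type(value).__name__ for value in values]

for value, name in zip(values, type_names):
    print(repr(value), name)
```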

Data Structure: Top Level

  • The entire vax file is one single JSON array.
  • Each element in this top-level array is a JSON object, representing a single country/region.
    • This is fake, to fit on screen.
[
  { "country": "Narnia", "data": [...] }, 
  { "country": "Wakanda", "data": [...] },
  ...
]

Data Structure: Nested Data

  • Inside each country object, there is a key called "data".
  • The value of "data" is an array of daily vaccination records.
    • This is fake, to fit on screen.
{
  "country": "Guilder",
  "iso_code": "GG",
  "data": [
    { "date": "...", "total_vaccinations": ... },
    ... 
  ]
}

Syntax Rule: Double Quotes

  • JSON is very strict about syntax. Keys and string values must use double quotes (").
  • Invalid JSON: { 'country': 'Elysium' } (Uses single quotes).
  • Valid JSON: { "country": "Cascadia" }
  • Numbers, booleans, and null are NOT quoted.

Syntax Rule: No Trailing Commas

  • The last key/value pair in an object, or the last item in an array, must not be followed by a comma.
    • This is fake, to fit on screen.
  • Invalid JSON:
[
  "item1",
  "item2", 
]
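  • Both strict syntax rules (double quotes, no trailing commas) can be checked by handing the text to json.loads(). A sketch using the examples from these slides:

```python
import json

candidates = [
    '{"country": "Cascadia"}',  # valid
    "{'country': 'Elysium'}",   # invalid: single quotes
    '["item1", "item2", ]',     # invalid: trailing comma
]

results = []
for text in candidates:
    try:
        json.loads(text)
        results.append(True)
    except json.JSONDecodeError:
        results.append(False)

print(results)  # [True, False, False]
```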

Handling Missing (null) Values

  • In JSON, a missing value is explicitly represented by null.
  • When parsed in Python, null becomes None.
  • Always check for None before trying to perform arithmetic or string operations.
# Example: "daily_vaccinations" might be null

i = 0

while vax[0]["data"][i].get("daily_vaccinations") is None:
    i += 1
    
print(i, vax[0]["data"][i].get("daily_vaccinations"))
1 1367
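  • The same check works inside a loop, for example when summing. A sketch on made-up records; note that .get() returns None both for a missing key and for an explicit null.

```python
# Made-up records with the same keys as the real data
records = [
    {"date": "2021-02-22", "total_vaccinations": 0},
    {"date": "2021-02-23", "daily_vaccinations": 1367},
    {"date": "2021-02-24", "daily_vaccinations": None},
]

total = 0
for record in records:
    daily = record.get("daily_vaccinations")
    if daily is not None:  # skips both missing keys and explicit nulls
        total += daily

print(total)  # 1367
```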

Writing JSON in Python

  • Use json.dump() to write a Python object (dict or list) to a file.
  • Use json.dumps() to convert it to a JSON formatted string.
# A dictionary containing processed data
summary = {
    "report_country": vax[0]["country"],
    "latest_total_vax": vax[0]["data"][-1]["total_vaccinations"]
}

json.dumps(summary)
'{"report_country": "Afghanistan", "latest_total_vax": 22964750}'
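  • json.dump() takes an open file as its second argument. A sketch that writes the summary to a temporary location (the path and file name are assumptions) and reads it back:

```python
import json
import os
import tempfile

summary = {"report_country": "Afghanistan", "latest_total_vax": 22964750}

# Write to a temporary directory so nothing in the working folder changes
path = os.path.join(tempfile.mkdtemp(), "summary.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(summary, f)

# Read the file back to confirm the round trip
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == summary)  # True
```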

Pretty Printing JSON

  • The indent argument makes the JSON human-readable for debugging or configuration files.
print(json.dumps(summary, indent=4))
{
    "report_country": "Afghanistan",
    "latest_total_vax": 22964750
}

Handling Numbers

  • JSON itself has a single number type; it does not formally distinguish integers from floating-point numbers.
  • Counts are written without a decimal point, so they parse as integers: "population": 331002651.
  • Rates and percentages are written with a decimal point, so they parse as floats: "daily_vaccinations_per_hundred": 0.52.
  • Python’s json module makes this distinction automatically upon parsing: int when there is no fraction or exponent, float otherwise.
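  • A small sketch of how Python's json module decides between int and float, based only on how the number is written:

```python
import json

parsed = {}
for text in ["7", "7.0", "7e0", "331002651"]:
    value = json.loads(text)
    parsed[text] = type(value).__name__
    print(text, "->", repr(value), type(value).__name__)
```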

Summary: Key Takeaways

  • JSON is the foundation of modern data exchange, built on objects ({}) and arrays ([]).
  • It excels at handling nested, hierarchical data like the Country/Time-series structure of the OWID data.
  • Python’s json library and the .json() method of a requests response are your primary tools.
  • Always be vigilant for strict syntax rules (double quotes, no trailing commas) and handle null values.

Final Review: Structure

[
  {
    "planet": "Acheron",
    "designation": "LV-426",
    "data": [
      {
        "film": "Alien",
        "population": 0
      },
      {
        "film": "Aliens",
        "population": 158
      }
    ]
  },
  ...
]

Final Review: Python vs. JSON

x = [
  {
    'quotes': 'double', 
    "boolean_capitalization": False,
    "terminating_commas": None,
  },
]
print(json.dumps(x, indent=4))
[
    {
        "quotes": "double",
        "boolean_capitalization": false,
        "terminating_commas": null
    }
]