import json
FILE_NAME = "helium.json"
# Can expand to multiple lines
he = json.loads(open(FILE_NAME).read())
JSON
Thinking Machines
Why JSON?
What is JSON?
- JSON stands for JavaScript Object Notation.
- It’s a lightweight, human-readable format for storing and transmitting data.
- It has become the de facto standard for data interchange on the web, especially for APIs (Application Programming Interfaces).
- Despite its name, JSON is language-agnostic and supported by every relevant programming language used today.
JSON’s Core Purpose
- The main goal is to represent structured data that can be easily serialized (converted into a string) and deserialized (converted back into native data types).
- It maps perfectly to common data structures like dictionaries and lists.
- This makes complex, nested data easy to handle.
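A minimal sketch of the serialize/deserialize round trip described above. The record is a small hypothetical example (the `"isotopes"` field is made up for illustration), not data from any file used in these slides.

```python
import json

# Hypothetical record; "isotopes" is an illustrative field only.
record = {"symbol": "He", "group": 18, "isotopes": [3, 4]}

text = json.dumps(record)    # serialize: Python dict -> JSON string
restored = json.loads(text)  # deserialize: JSON string -> Python dict

print(type(text).__name__)   # the serialized form is a plain string
print(restored == record)    # the round trip preserves the data
```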
JSON vs. CSV and Other Formats
- Scientific data is rarely flat. It often involves hierarchical relationships.
- One example: There are groups (or periods) that contain elements.
- JSON can capture this nesting naturally, unlike flat CSV files.
  - CSV stands for comma-separated values, the format commonly read with pandas.
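To make the contrast concrete, here is a rough sketch using made-up, abbreviated data (periods that contain elements). The nested form keeps each period's elements together, while a flat, CSV-style form has to repeat the period on every row.

```python
import json

# Hypothetical, abbreviated data: periods that contain elements.
periods = [
    {"period": 1, "elements": ["H", "He"]},
    {"period": 2, "elements": ["Li", "Be", "B"]},
]

# Nested form: each period holds its own list of elements.
print(json.dumps(periods[0]))

# Flat, CSV-style form: one row per element, period repeated each time.
rows = [(p["period"], symbol) for p in periods for symbol in p["elements"]]
for row in rows:
    print(row)
```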
Two Core JSON Structures
- JSON data is built using only two universal structures:
- Dictionaries
  - Collection of key/value pairs
  - Like state-to-capital or zip-to-state.
- Lists
  - Ordered collection of values
  - Have numerical indices rather than keys.
- Nested to any depth: a list of dictionaries of lists
Dictionaries
Python Aside: Dictionaries
- We review the most-beloved Python dictionaries.
- Dictionaries are Python’s core implementation of a common archetype.
- Often called a hash map or associative array.
- I call them hash tables because I’m set in my ways.
- They store data as collections of key-value pairs, accessed by key rather than by position (Python 3.7+ also preserves insertion order).
- In a “normal” dictionary, keys are words and values are definitions.
Dictionary Syntax and Keys
- Dictionaries are defined using curly braces: `d = {}`.
- Keys and values are separated by a colon: `{"key": "value"}`
- Keys are usually strings, but can be other hashable types, such as numbers or tuples.
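A short sketch of what "some other types" can look like in practice. The keys and values below are illustrative only; the point is that Python accepts any hashable key and rejects unhashable ones such as lists.

```python
# Keys may be any hashable type, not only strings.
d = {}
d["He"] = "helium"                 # string key
d[2] = "atomic number two"         # integer key
d[(18, 1)] = "group 18, period 1"  # tuple key

print(d[2])
print(d[(18, 1)])

# A list is mutable and therefore unhashable, so it cannot be a key.
try:
    d[[18, 1]] = "oops"
except TypeError as err:
    print("TypeError:", err)
```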
Example
helium.json
{
  "symbol": "He",
  "phase_stp": "gas",
  "group": 18,
  "period": 1,
  "boiling_point": {
    "K": 4.222,
    "C": -268.928,
    "F": -452.070
  }
}
A note
- We can ensure these files are well-formed relatively easily.
- At the command line: `python3 -m json.tool helium.json`
  - Will show an error if bad.
  - Will show the file otherwise.
- Within Python, by parsing the file with `json.loads`
  - Will show an error if bad.
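One possible way to do the "Within Python" check is to catch the error that `json.loads` raises on malformed input. The helper name `is_well_formed` and the two sample strings are hypothetical, chosen only for this sketch.

```python
import json

def is_well_formed(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print("Bad JSON:", err)
        return False
    return True

good = '{"symbol": "He", "group": 18}'
bad = '{"symbol": "He", "group": 18'  # missing the closing brace

print(is_well_formed(good))
print(is_well_formed(bad))
```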
Accessing Values by Key
- Values are retrieved using the key inside square brackets (`[]`).
- This is the standard way to access data fields parsed from a JSON object.
# We previously read in `he`
he["symbol"]
'He'
Adding or Updating Pairs
- Once in Python, we can use the dictionary read from JSON like any other dictionary.
- We can add something, like the atomic number.
  - I would see (scroll to end to see `number`):
he["number"] = 2
he
{'symbol': 'He',
'phase_stp': 'gas',
'group': 18,
'period': 1,
'boiling_point': {'K': 4.222, 'C': -268.928, 'F': -452.07},
'number': 2}
The get() Method
- Accessing a missing key with `[]` causes a `KeyError` and crashes.
- Simply use `.get`.
try:
    print(he["density"])
except KeyError:
    print("Use `.get()!`")
print(he.get("density")) # Will be `None` so we print to see it
Use `.get()!`
None
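`.get()` also accepts a second argument that is returned instead of `None` when the key is missing. A small sketch, using an abbreviated stand-in dictionary rather than the full parsed file:

```python
# Abbreviated stand-in for the dictionary parsed from helium.json.
element = {"symbol": "He", "group": 18}

print(element.get("symbol"))              # key present: its value
print(element.get("density"))             # key missing: None
print(element.get("density", "unknown"))  # key missing: explicit fallback
```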
Iterating over Keys
- By default, iterating over a dictionary iterates over its keys.
- This is useful for inspection or processing.
for key in he:
    print(key)
symbol
phase_stp
group
period
boiling_point
number
Iterating over Values
- Use the `.values()` method to iterate only through the values in the dictionary.
for value in he.values():
    print(value)
He
gas
18
1
{'K': 4.222, 'C': -268.928, 'F': -452.07}
2
Iterating over Key-Value Pairs
- The `.items()` method yields a `(key, value)` tuple for each pair, which you can unpack in the loop.
for key, value in he.items():
    print(key, ":", value) # Take note of that : I use as a nicety
symbol : He
phase_stp : gas
group : 18
period : 1
boiling_point : {'K': 4.222, 'C': -268.928, 'F': -452.07}
number : 2
Nested Dictionaries
- Since a value can be any type, a value can be another dictionary (or a list of dictionaries).
he["boiling_point"]["K"]
4.222
- Could also be a list, in which case you’d use a numerical index.
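A sketch of mixing keys and numerical indices in one lookup. The `"isotopes"` list is hypothetical and does not appear in `helium.json`; it is only here to show a list nested inside a dictionary.

```python
# Hypothetical record: "isotopes" is not part of helium.json.
element = {
    "symbol": "He",
    "boiling_point": {"K": 4.222},
    "isotopes": [{"mass_number": 3}, {"mass_number": 4}],
}

print(element["boiling_point"]["K"])          # key, then key
print(element["isotopes"][1]["mass_number"])  # key, then index, then key

# Chaining .get() with an empty-dict fallback avoids a KeyError
# when an intermediate key is missing.
print(element.get("melting_point", {}).get("K"))
```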
JSON
JSON Object Syntax
- A dictionary, called an “object” in JSON, represents an unordered collection of key/value pairs.
- Starts with `{` and ends with `}`.
- Keys must be strings and enclosed in double quotes.
  - This differs from Python, which allows keys of other hashable types, such as numbers and tuples.
- Keys and values are separated by a colon (`:`). Pairs are separated by commas (`,`).
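One consequence of the string-keys rule, sketched with a made-up dictionary: when Python's `json` module serializes an integer key it writes it as a string, so the key does not come back as an integer after a round trip.

```python
import json

# Illustrative dictionary with one integer key and one string key.
py = {1: "period", "group": 18}

text = json.dumps(py)
print(text)  # {"1": "period", "group": 18}

back = json.loads(text)
print(list(back.keys()))  # ['1', 'group']
```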
URLs
- It is extremely common to use JSON directly from a URL.
- This is possible with e.g. CSV files but less common.
- JSON is a de facto internet format.
- (You can open them in browsers!)
- helium.json
Requests
- We can easily read JSON from URLs with `requests`
import requests
URL = "https://cd-public.github.io/scicom/helium.json"
response = requests.get(URL)
# he is now a Python dictionary
he = response.json()
he # Note that "number" is gone - we have the original file again.
{'symbol': 'He',
'phase_stp': 'gas',
'group': 18,
'period': 1,
'boiling_point': {'K': 4.222, 'C': -268.928, 'F': -452.07}}
Our data
- Loosely adapted from “The JSON format”
- It uses this data:
URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.json"
# Can do it in one line by the way.
vax = requests.get(URL).json()
It’s big!
- Don’t try to print that whole thing; it’s huge.
- 2031952 lines.
- We will traverse it.
  - Check the type.
    - If dictionary, look at an arbitrary key.
    - If list, look at index zero.
  - We’ll repeat this until we get something that isn’t a dictionary or list.
Traverse
loc = vax # for location
while isinstance(loc, (dict, list)):
    if isinstance(loc, dict):
        keys = list(loc.keys())
        print(keys[0])
        loc = loc[keys[0]]
    if isinstance(loc, list):
        print(0)
        loc = loc[0]
print(loc)
0
country
Afghanistan
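The same traversal can be wrapped in a reusable function. This is only a sketch: the name `first_path` is made up, and the sample is fake data shaped like the vaccination file, so it runs without a download. Unlike the loop above, it also stops on an empty list or dictionary instead of raising an error.

```python
def first_path(obj):
    """Follow the first key or index until reaching a non-container value."""
    path = []
    # Stop on anything that is not a dict/list, or on an empty container.
    while isinstance(obj, (dict, list)) and obj:
        step = next(iter(obj)) if isinstance(obj, dict) else 0
        path.append(step)
        obj = obj[step]
    return path, obj

# Fake data shaped like the vaccination file.
sample = [{"country": "Narnia", "data": [{"date": "2021-02-22"}]}]
print(first_path(sample))  # ([0, 'country'], 'Narnia')
```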
What does this mean?
- We have a list of somethings.
- Those somethings are dictionaries, one of whose keys is “country”.
- At least one value in that dictionary is a string: “Afghanistan”.
- Are we correct?
- Check the url: Link!
Okay, so
- List of
  - Dictionaries, representing countries, of
    - `"country"`: name and `"iso_code"`: whatever that is
    - `"data"`, a list of
      - dictionaries, representing dates, of
        - `"date"`: a date as a string, like `"2021-02-22"`
        - some other stuff on vaccination.
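A sketch of walking this shape with two nested levels of iteration. The countries, codes, and numbers below are fake; only the key names (`"country"`, `"iso_code"`, `"data"`, `"date"`, `"total_vaccinations"`) follow the structure outlined above.

```python
# Fake data in the same shape: a list of country dictionaries,
# each with a "data" list of per-date dictionaries.
sample = [
    {"country": "Narnia", "iso_code": "NAR", "data": [
        {"date": "2021-02-22", "total_vaccinations": 0},
        {"date": "2021-02-23", "total_vaccinations": 10},
    ]},
    {"country": "Wakanda", "iso_code": "WAK", "data": [
        {"date": "2021-02-22", "total_vaccinations": 5},
    ]},
]

counts = {}
for entry in sample:                                  # outer list: countries
    dates = [day["date"] for day in entry["data"]]    # inner list: dates
    counts[entry["country"]] = len(dates)
    print(entry["country"], len(dates), dates[0], dates[-1])
```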
Object Example: Daily Data
- An object of a single date in a single country:
vax[0]["data"][0]
{'date': '2021-02-22',
'total_vaccinations': 0,
'people_vaccinated': 0,
'total_vaccinations_per_hundred': 0.0,
'people_vaccinated_per_hundred': 0.0}
JSON Array Syntax
- An array represents an ordered sequence of values.
- Starts with `[` and ends with `]`.
- Values are separated by commas (`,`).
- Values can be of any valid JSON data type.
Array Example: Time Series
- This slice of the array holds the first three time-series records for one country:
vax[0]["data"][0:3]
[{'date': '2021-02-22',
'total_vaccinations': 0,
'people_vaccinated': 0,
'total_vaccinations_per_hundred': 0.0,
'people_vaccinated_per_hundred': 0.0},
{'date': '2021-02-23',
'daily_vaccinations': 1367,
'daily_vaccinations_per_million': 33,
'daily_people_vaccinated': 1367,
'daily_people_vaccinated_per_hundred': 0.003},
{'date': '2021-02-24',
'daily_vaccinations': 1367,
'daily_vaccinations_per_million': 33,
'daily_people_vaccinated': 1367,
'daily_people_vaccinated_per_hundred': 0.003}]
- The array contains JSON objects.
Valid JSON Data Types
- String: text, double-quoted. E.g., `"Olive Drab"`.
  - Can’t be single-quoted as in Python
- Number: integer or float. E.g., `7` or `3.14`.
- Boolean: `true` or `false`.
  - Lowercase, unlike Python `True` and `False`
- null: missing value
  - Like Python `None`
- Object: a dictionary `{}`
- Array: a list `[]`
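To see how each JSON type arrives in Python, one can parse a small string that contains one of each. The single-letter keys are arbitrary labels for this sketch.

```python
import json

# One value of each JSON type; the keys are arbitrary labels.
text = '{"s": "Olive Drab", "i": 7, "f": 3.14, "b": true, "n": null, "o": {}, "a": []}'
parsed = json.loads(text)

# Print each label, the Python type it became, and its value.
for key, value in parsed.items():
    print(key, type(value).__name__, repr(value))
```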
Data Structure: Top Level
- The entire `vax` file is one single JSON array.
- Each element in this top-level array is a JSON object, representing a single country/region.
  - This is fake, to fit on screen.
[
  { "country": "Narnia", "data": [...] },
  { "country": "Wakanda", "data": [...] },
  ...
]
Data Structure: Nested Data
- Inside each country object, there is a key called `"data"`.
- The value of `"data"` is an array of daily vaccination records.
  - This is fake, to fit on screen.
{
  "country": "Guilder",
  "iso_code": "GG",
  "data": [
    { "date": "...", "total_vaccinations": ... },
    ...
  ]
}
Syntax Rule: Double Quotes
- JSON is very strict about syntax. Keys and string values must use double quotes (`"`).
- Invalid JSON: `{ 'country': 'Elysium' }` (uses single quotes).
- Valid JSON: `{ "country": "Cascadia" }`
- Numbers, booleans, and `null` are NOT quoted.
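A quick check of the two examples above with Python's `json` module: the double-quoted version parses, and the single-quoted one is rejected with a `json.JSONDecodeError`. The variable names are just for this sketch.

```python
import json

valid = '{ "country": "Cascadia" }'
invalid = "{ 'country': 'Elysium' }"  # single quotes are not valid JSON

print(json.loads(valid))

try:
    json.loads(invalid)
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    print("Rejected:", err)
```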
Syntax Rule: No Trailing Commas
- The last key/value pair in an object, or the last item in an array, must not be followed by a comma.
- This is fake, to fit on screen.
- Invalid JSON:
[
  "item1",
  "item2",
]
Handling Missing (null) Values
- In JSON, a missing value is explicitly represented by `null`.
- When parsed in Python, `null` becomes `None`.
- Always check for `None` before trying to perform arithmetic or string operations.
# Example: "daily_vaccinations" might be null
i = 0
while vax[0]["data"][i].get("daily_vaccinations") is None:
    i += 1
print(i, vax[0]["data"][i].get("daily_vaccinations"))
1 1367
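A sketch of the "check for `None` first" advice applied to a sum. The records are fake; one has an explicit `null` and one lacks the key entirely, and `.get()` treats both cases the same way.

```python
import json

# Fake records: an explicit null, two real values, and a missing key.
text = ('[{"daily_vaccinations": null}, {"daily_vaccinations": 1367}, '
        '{"daily_vaccinations": 1367}, {}]')
records = json.loads(text)

# .get() returns None for both the null value and the missing key.
values = [r.get("daily_vaccinations") for r in records]

# Skip None before doing arithmetic; sum(values) would raise a TypeError.
total = sum(v for v in values if v is not None)

print(values)
print(total)
```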
Writing JSON in Python
- Use `json.dump()` to write a Python object (dict or list) to a file.
- Use `json.dumps()` to convert it to a JSON-formatted string.
# A dictionary containing processed data
summary = {
    "report_country": vax[0]["country"],
    "latest_total_vax": vax[0]["data"][-1]["total_vaccinations"]
}
json.dumps(summary)
'{"report_country": "Afghanistan", "latest_total_vax": 22964750}'
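The example above uses `json.dumps()`; a file-based counterpart with `json.dump()` might look like the sketch below. The summary values are copied from the output shown above, and the file name `summary.json` and the temporary directory are assumptions made so the sketch can run anywhere.

```python
import json
import os
import tempfile

# Values copied from the json.dumps() output shown above.
summary = {"report_country": "Afghanistan", "latest_total_vax": 22964750}

# Hypothetical output location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "summary.json")

with open(path, "w") as f:
    json.dump(summary, f)      # write the dictionary as JSON text

with open(path) as f:
    reloaded = json.load(f)    # read it back into a dictionary

print(reloaded == summary)
```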
Pretty Printing JSON
- The `indent` argument makes the JSON human-readable for debugging or configuration files.
print(json.dumps(summary, indent=4))
{
    "report_country": "Afghanistan",
    "latest_total_vax": 22964750
}
Handling Numbers
- JSON has a single number type; parsers tell integers from floating-point numbers by how the value is written.
- Counts are whole numbers, so they are represented as integers, e.g. `"population": 331002651`.
- Rates and percentages would be floats: `"daily_vaccinations_per_hundred": 0.52`.
- Your programming language (Python) will handle this distinction automatically upon parsing.
Summary: Key Takeaways
- JSON is the foundation of modern data exchange, built on objects (`{}`) and arrays (`[]`).
- It excels at handling nested, hierarchical data like the Country/Time-series structure of the OWID data.
- Python’s `json` library and the `response.json()` method from `requests` are your primary tools.
- Always be vigilant for strict syntax rules (double quotes, no trailing commas) and handle `null` values.
Final Review: Structure
[
  {
    "planet": "Acheron",
    "designation": "LV-426",
    "data": [
      {
        "film": "Alien",
        "population": 0
      },
      {
        "film": "Aliens",
        "population": 158
      }
    ]
  },
  ...
]
begin array
  begin object
    key: value,
    key: value,
    key: begin array
      begin object
        key: value,
        key: value
      end object,
      begin object
        key: value,
        key: value
      end object
    end array
  end object,
  ...
end array
Final Review: Py vs. Js
x = [
    {
        'quotes': 'double',
        "boolean_capitalization": False,
        "terminating_commas": None,
    },
]
print(json.dumps(x, indent=4))
[
    {
        "quotes": "double",
        "boolean_capitalization": false,
        "terminating_commas": null
    }
]
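The reverse direction is worth a quick check too. In this sketch, a trailing comma that Python happily accepts in a list literal is rejected when the same text is parsed as JSON; the variable names are illustrative.

```python
import json

python_list = ["item1", "item2",]  # trailing comma is fine in Python
json_text = '["item1", "item2",]'  # trailing comma is not valid JSON

print(json.dumps(python_list))     # ["item1", "item2"]

try:
    json.loads(json_text)
    accepted = True
except json.JSONDecodeError as err:
    accepted = False
    print("Rejected:", err)
```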