Building an Affiliation Network for Blog Posts & Tags in Hugo
Introduction
Hugo websites typically consist of blog posts, which are organised into taxonomies with the use of tags. These tags are usually listed on each blog post, and a tags page can be used to navigate the blog posts belonging to each tag. I suspect that tagging blog posts is one of those features that developers implement because they feel that they should, despite the end user probably never using it; clearly, I am one of those developers, having implemented tagging on this website without being sure that anyone actually uses it, just on the off-chance that someone might one day want to take a look at everything I’ve written on a given topic.
However, I wondered whether I could display the relationships between blog posts and tags in a way that was easier to navigate, or simply more visually interesting, than listing tags & blog posts. The knowledge graph in the note-taking application Obsidian immediately came to mind: a visualisation of the interconnections between individual notes in the user’s “vault”, with each node representing a single note and each edge representing a link between that note and another. While I suspect that this graph visualisation is not actually useful for navigation compared to traditional search methods, it certainly serves its other purpose of looking cool.

Such a graph is not particularly suitable for my application, as this website doesn’t have a great deal of linking between posts; in fact, at the time of writing, I don’t believe there are any cross-links between posts on this website. Instead, posts are tagged, and so the nature of the graph constructed on the data is very different: the Obsidian graph view is a simple¹ directed graph, with edges indicating the direction of the link from one note to another. In contrast, to construct a meaningful graph of connections between posts on this website, an affiliation network would be the best choice: a simple¹, undirected, bipartite graph wherein there are two distinct types of nodes representing two distinct kinds of entities, typically individuals & groups, and in this case, blog posts & tags. A bipartite graph is a graph $G=(X,E)$ consisting of a node set $X$ and an edge set $E$, where $X$ can be divided into two disjoint subsets $X_1$ & $X_2$ such that:
- $X_1 \cap X_2 = \emptyset$: that is, the two subsets are completely disjoint and share no nodes between them.
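- $X_1 \cup X_2 = X$: that is, each node in $X$ is in either $X_1$ or $X_2$; there are no missing nodes.
- Every edge in $E$ joins a node in $X_1$ to a node in $X_2$: that is, no edge connects two nodes within the same subset.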
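As a rough illustration of this definition, the conditions can be checked directly with Python sets. This is only a sketch: the function name `is_bipartition` and the toy data are hypothetical, and it also checks the usual bipartite requirement that every edge joins a node in $X_1$ to a node in $X_2$.

```python
def is_bipartition(nodes, edges, part_1, part_2):
    """Check whether (part_1, part_2) is a valid bipartition of the graph (nodes, edges)."""
    nodes, part_1, part_2 = set(nodes), set(part_1), set(part_2)

    # X_1 and X_2 share no nodes.
    disjoint = part_1.isdisjoint(part_2)
    # Together, X_1 and X_2 contain every node in X.
    covering = (part_1 | part_2) == nodes
    # Every edge joins a node in one subset to a node in the other.
    crossing = all(
        (u in part_1 and v in part_2) or (u in part_2 and v in part_1)
        for u, v in edges
    )
    return disjoint and covering and crossing


# Hypothetical toy data: one blog post and three tags.
posts = {"what-does-unlink-actually-do"}
tags = {"article", "linux", "shell"}
edges = [
    ("what-does-unlink-actually-do", "article"),
    ("what-does-unlink-actually-do", "linux"),
    ("what-does-unlink-actually-do", "shell"),
]

print(is_bipartition(posts | tags, edges, posts, tags))
# An edge between two tags breaks the bipartite property.
print(is_bipartition(posts | tags, edges + [("linux", "shell")], posts, tags))
```

In an affiliation network of blog posts & tags, the second case should never arise, since edges are only ever created between a post and one of its tags.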
Building the Graph
Since this affiliation network will be part of the website and displayed on the Tags page going forward, it will almost certainly be updated repeatedly as time goes on, so the version of the code referred to here will be the first working version of the code published, and will not refer to subsequent versions. Furthermore, I imagine the code may need to be tweaked or refactored as the size of the graph grows, so the graph data used for this post will be static from the time of writing.
To display an affiliation graph of the relationships between tags & blog posts on this website, it was first necessary to obtain a representation of this graph in a machine-readable format. When a website is built in Hugo, HTML files are produced: the HTML file for each blog post typically contains the tags associated with that blog post, an HTML file is generated for each tag which links to all the blog posts belonging to that tag, and a tags page is generated which lists all the tags used on the website. I chose to write a Python script which uses the Beautiful Soup HTML parsing library to extract the title and tags of each blog post from the generated HTML:
# Imports used throughout the script.
import json
import os

import networkx as nx
from bs4 import BeautifulSoup


def list_blog_posts(blog_directory):
    """
    Returns a list of all the blog post slugs generated by Hugo in the build process, i.e.,
    not the titles of blog posts but the string used to identify the blog post in its URL.

    Args:
        blog_directory (str): the path to the /public/blog sub-directory of the Hugo project.

    Returns:
        list: a list of the blog post slugs in the blog_directory, in str form.
    """

    # Each blog post is built into its own sub-directory, named after its slug.
    return [ directory for directory in os.listdir(blog_directory) if os.path.isdir(os.path.join(blog_directory, directory)) ]


def extract_title(blog_post, blog_directory):
    """
    Extracts the title of the given blog post from its HTML.

    Args:
        blog_post (str): the blog post slug in the blog_directory.
        blog_directory (str): the path to the /public/blog sub-directory of the Hugo project.

    Returns:
        str: the title of the blog post.
    """

    with open(os.path.join(blog_directory, blog_post, "index.html"), "r", encoding="utf-8") as file:
        soup = BeautifulSoup(file, "html.parser")
        title_element = soup.find(id="blogpost-title")
        return title_element.get_text()


def extract_tags(blog_post, blog_directory):
    """
    Extracts the tags of the given blog post from its HTML.

    Args:
        blog_post (str): the blog post slug in the blog_directory.
        blog_directory (str): the path to the /public/blog sub-directory of the Hugo project.

    Returns:
        list: a list of str tags.
    """

    with open(os.path.join(blog_directory, blog_post, "index.html"), "r", encoding="utf-8") as file:
        soup = BeautifulSoup(file, "html.parser")
        tags_div = soup.find("div", id="tags")
        tag_links = tags_div.find_all("a")
        # Each tag is the lower-cased content of a link inside the tags <div>.
        return ["".join(str(child).lower() for child in a.contents) for a in tag_links]
I then used the `networkx` package to build the graph in Python, which can then be written to a JSON file using `json.dump()`:
def build_affiliation_network(blog_posts, blog_directory):
    """
    Builds an affiliation network consisting of a bipartite node set of blog post
    nodes & tag nodes, with an edge between two nodes if the blog post has that tag.

    Args:
        blog_posts (list): a list of the blog post slugs in the blog_directory, in str form.
        blog_directory (str): the path to the /public/blog sub-directory of the Hugo project.

    Returns:
        networkx.classes.graph.Graph: a networkx graph object representing the affiliation network.
    """

    graph = nx.Graph()

    for blog_post in blog_posts:
        # Blog post nodes carry a title attribute; tag nodes do not.
        title = extract_title(blog_post, blog_directory)
        graph.add_node(blog_post, title=title)

        # Adding an edge implicitly creates the tag node if it doesn't exist yet.
        tags = extract_tags(blog_post, blog_directory)
        graph.add_edges_from([ (blog_post, tag) for tag in tags ])

    return graph


def main():
    blog_directory = "./public/blog/"
    blog_posts = list_blog_posts(blog_directory)

    affiliation_network = build_affiliation_network(blog_posts, blog_directory)
    graph_data = nx.node_link_data(affiliation_network)

    with open("./public/blog/graph.json", "w") as f:
        json.dump(graph_data, f)


# Run the script when it is invoked directly, e.g. with python3 scripts/graph.py.
if __name__ == "__main__":
    main()
A (prettified) sample of the output JSON is as follows:
{
  "directed": false,
  "multigraph": false,
  "graph": {},
  "nodes": [
    {
      "id": "article"
    },
    {
      "title": "What Does unlink Actually Do?",
      "id": "what-does-unlink-actually-do"
    },
    {
      "id": "linux"
    },
    {
      "id": "shell"
    }
  ],
  "links": [
    {
      "source": "article",
      "target": "what-does-unlink-actually-do"
    },
    {
      "source": "what-does-unlink-actually-do",
      "target": "linux"
    },
    {
      "source": "what-does-unlink-actually-do",
      "target": "shell"
    }
  ]
}
The blog post nodes can be implicitly distinguished from the tag nodes on the basis that the blog post nodes will have an associated `title` attribute whereas the tag nodes will not;
a more robust & generalisable solution could be to label each node with the value `"blogpost"` or `"tag"` as appropriate, but this is not necessary for our purposes.
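For illustration only, the more explicit labelling could be sketched as a small post-processing step over the node-link data before it is written to JSON. The function name `annotate_node_kinds` and the `kind` key are hypothetical and not part of the script used on this website; the sketch simply applies the same "has a title" rule once, up front.

```python
def annotate_node_kinds(graph_data, kind_key="kind"):
    """Return a copy of node-link data in which every node carries an explicit kind."""
    annotated = dict(graph_data)
    annotated["nodes"] = [
        # Same rule as the implicit one: only blog post nodes have a title.
        {**node, kind_key: "blogpost" if "title" in node else "tag"}
        for node in graph_data["nodes"]
    ]
    return annotated


# A cut-down version of the sample output shown above.
sample = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"id": "article"},
        {"title": "What Does unlink Actually Do?", "id": "what-does-unlink-actually-do"},
        {"id": "linux"},
        {"id": "shell"},
    ],
    "links": [
        {"source": "article", "target": "what-does-unlink-actually-do"},
        {"source": "what-does-unlink-actually-do", "target": "linux"},
        {"source": "what-does-unlink-actually-do", "target": "shell"},
    ],
}

annotated = annotate_node_kinds(sample)
for node in annotated["nodes"]:
    print(node["id"], "->", node["kind"])
```

With a label like this in place, the visualisation could test the label instead of the presence of a title, which would keep working even if tag nodes gained attributes of their own later on.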
The Python script to build this affiliation network and create the associated JSON file is run automatically as part of the GitHub Action that I use to build & deploy this website, using the following additional steps in the configuration of the `build` job:
- name: Install Python dependencies
  run: pip install -r scripts/requirements.txt
- name: Generate affiliation network
  run: python3 scripts/graph.py
Displaying the Graph
I used D3.js to create the graph, chosen for its flexibility & interactive nature, using the `d3-force` layout and largely based upon the disjoint force-directed graph example available in the documentation.
The JavaScript for the graph itself is included directly in the `content/tags/_index.md` Hugo file for this website (and also here within this page).
The nodes of the graph are colour-coded, with blue (`#B0CFFF`) representing blog post nodes and pink (`#FFA7CE`) representing tag nodes.
Each node is labelled, with blog post nodes displaying the title of the blog post which they represent in a serif font, and tag nodes displaying the name of the tag prepended with a `#` symbol in a monospace font;
the labels for both types of node are clickable links either to the blog post itself or the page listing all the posts associated with the tag.
This project has been my first foray into data visualisation with D3.js, and while I am pleased with the result, it’s nonetheless quite likely that my code contains a number of rookie mistakes, so if you find any, please do let me know! The HTML & JavaScript for the visualisation itself can be seen below:
<style>
  svg text {
    fill: currentColor;
  }
</style>
<svg></svg>

<script type="module">
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";

async function drawChart() {
  // Load the node-link JSON generated by the Python script at build time.
  const response = await fetch("/blog/graph.json");
  const data = await response.json();

  const width = 1000;
  const height = 600;

  // Copy the data, since the force simulation mutates its nodes & links.
  const links = data.links.map(d => ({ ...d }));
  const nodes = data.nodes.map(d => ({ ...d }));

  const simulation = d3.forceSimulation(nodes)
    .force("link", d3.forceLink(links).id(d => d.id).distance(100))
    .force("charge", d3.forceManyBody().strength(-500))
    .force("x", d3.forceX())
    .force("y", d3.forceY());

  const svg = d3.create("svg")
    .attr("width", width)
    .attr("height", height)
    .attr("viewBox", [-width / 2, -height / 2, width, height])
    .attr("style", "max-width: 100%; height: auto;");

  const link = svg.append("g")
    .attr("stroke", "#999")
    .attr("stroke-opacity", 0.6)
    .selectAll("line")
    .data(links)
    .join("line")
    .attr("stroke-width", d => Math.sqrt(d.value || 1));

  // Blog post nodes (those with a title) are larger & blue; tag nodes are smaller & pink.
  const node = svg.append("g")
    .attr("stroke", "#fff")
    .attr("stroke-width", 1.5)
    .selectAll("circle")
    .data(nodes)
    .join("circle")
    .attr("r", d => d.title ? 12 : 8)
    .attr("fill", d => d.title ? "#B0CFFF" : "#FFA7CE")
    .call(d3.drag()
      .on("start", dragstarted)
      .on("drag", dragged)
      .on("end", dragended));

  node.append("title").text(d => d.id);

  // Each label is a <text> element wrapped in a link to the post or tag page.
  const label = svg.append("g")
    .attr("text-anchor", "middle")
    .selectAll("a")
    .data(nodes)
    .join("a")
    .attr("xlink:href", d => d.title ? `/blog/${d.id}` : `/tags/${d.id}`)
    .attr("target", "_blank") // Open in new tab
    .append("text")
    .attr("font-family", d => d.title ? "inherit" : "monospace")
    .attr("font-size", d => d.title ? 16 : 12)
    .text(d => d.title ? d.title : "#" + d.id);

  simulation
    .force("labelCollision", d3.forceCollide()
      .radius(20)
      .strength(0.5));

  // Update the positions of links, nodes & labels on every simulation tick.
  simulation.on("tick", () => {
    link
      .attr("x1", d => d.source.x)
      .attr("y1", d => d.source.y)
      .attr("x2", d => d.target.x)
      .attr("y2", d => d.target.y);

    node
      .attr("cx", d => d.x)
      .attr("cy", d => d.y);

    label
      .attr("x", d => d.x)
      .attr("y", d => d.y - 12);
  });

  function dragstarted(event) {
    if (!event.active) simulation.alphaTarget(0.3).restart();
    event.subject.fx = event.subject.x;
    event.subject.fy = event.subject.y;
  }

  function dragged(event) {
    event.subject.fx = event.x;
    event.subject.fy = event.y;
  }

  function dragended(event) {
    if (!event.active) simulation.alphaTarget(0);
    event.subject.fx = null;
    event.subject.fy = null;
  }

  // Swap the placeholder <svg> in the page for the generated chart.
  document.querySelector("svg").replaceWith(svg.node());
}

drawChart();
</script>
¹ Here, the word “simple” doesn’t refer to the complexity of the graph or the difficulty of constructing such a graph, but is a technical term that means that the graph cannot contain edges from a node to itself, and there cannot be multiple edges between the same pair of nodes; of course, in Obsidian it is possible to have several links between a pair of notes, but these are represented in the graph with a single edge, and similarly self-referential links are not displayed.