How to visualize (dendrogram) a dictionary of hierarchical items?

2024/10/8 0:35:27

This is my first time of doing visualization from hierarchical data in dictionary format with Python. Last part of the data looks like this:

d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']}

So I have indices on lists referring back to the keys (index) of the dictionary. I suppose this could be used as a base structure for the visualization, please correct me if I'm wrong. Characters on the data are "end nodes/leaves" which doesn't refer back to any index.

I have found NetworkX which possibly could be used for visualization, but I have no idea where to start with it and my data. I was hoping it would be something as simple as:

import networkx as nx
import matplotlib.pyplot as pltd = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']}G = nx.Graph(d)
nx.draw(G)
plt.show()

I'm looking for hierarchical dendrogram or some sort of clustering as an output. Sorry at this point I'm not totally sure what would be the best visualization, maybe similar to this:

Dendrogram

UPDATE

Using NetworkX actually was very simple. I'm providing other simple sample data and looking for an answer if it can be visualized by dendrogram also instead of wired network graph?

# original sequence: a,b,c,d,b,c,a,b,c,d,b,c
d = {^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]}
G = nx.Graph(d)
nx.draw_spring(G, node_size=300, with_labels=True)

NetworkX draw_spring nx.Graph

As we can see, graph show plain relations, but not hierarchy and order of the data what I'm willing to do. DiGraph gives more details, but it is still not possible to construct original sequence from it:

NetworkX draw_spring nx.DiGraph

For dendrogram apparently weight and end nodes needs to be calculated as pointed out on the first answer. For that approach data structure could be something like this:

d = {'a': [], 'b': [], 'c': [], 'd': [], ^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]}
Answer

One idea is to use SciPy's dendrogram function to draw your dendrogram. To do so, you just need to create the linkage matrix Z, which is described in the documentation of the SciPy linkage function. Each row [x, y, w, z] of the linkage matrix Z describes the weight w at which x and y merge to form a rooted subtree with z leaves.

To demonstrate, I've created a simple example using a small dendrogram with three leaves, shown below: example dendrogram

I created this visualization with the following code:

# Load required modules
import networkx as nx
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram# Construct the graph/hierarchy
d           = { 0: [1, 'd'], 1: ['a', 'b', 'c'], 'a': [], 'b': [], 'c': [], 'd': []}
G           = nx.DiGraph(d)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:children = set()node_list = list(d[u])while len(node_list) > 0:v = node_list.pop(0)children.add( v )node_list += d[v]subtree[u] = sorted(children & leaves)inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for i, n in enumerate(inner_nodes):children = d[n]x = children[0]for y in children[1:]:z = tuple(subtree[x] + subtree[y])i, j = index[tuple(subtree[x])], index[tuple(subtree[y])]Z.append([i, j, float(len(subtree[n])), len(z)]) # <-- float is required by the dendrogram functionindex[z] = ksubtree[z] = list(z)x = zk += 1# Visualize
dendrogram(Z, labels=leaves)
plt.show()

There are a few key items to note:

  1. Give the d data structure, I use a NetworkX directed graph (DiGraph) . The directionality is important so we can determine which nodes are leaves (no children -> out degree of zero) and inner_nodes (two or more children -> non-zero out degree).
  2. Usually there is some weight associated with each edge in your dendrogram, but there weren't any weights in your example. Instead, I used the number of leaves in the subtree rooted at each internal node n as the weight for n.
  3. If an inner node has more than two children, you have to add "dummy" internal nodes, since each row of the linkage matrix merges two nodes together. This is why I write for y in children[1:]:, etc.

I'm guessing you may be able to simplify this code based on what your data looks like before creating the d in your example, so this may be more of a proof of concept.

https://en.xdnf.cn/q/70176.html

Related Q&A

How to unfocus (blur) Python-gi GTK+3 window on Linux

What I want to do and whyI want my window to unfocus, so the previous focused window is selected.Why? I want to interact with the previously selected window (from other programs). My current plan is: …

SyntaxError: multiple exception types must be parenthesized

I am a beginner and have a problem after installing pycaw for the audio control using python, on putting the basic initialization code for pycaw, i get the following error:- Traceback (most recent call…

Python find difference between file paths

I have a bunch of file paths, such as:path1 = "./base/folder1/subfolder" path2 = "./base/folder2/"I am trying to write a function that can give me the relative difference between th…

Update range of colorbar in matplotlib

I want to update a contourf plot within a function, which works fine. However, the range of the data changes and I therefore also have to update the colorbar. That is where I fail to do so.Please see f…

Anaconda - arrow keys not work well in python shell

I installed Anaconda3 on manjaro (with i3wm and Urxvt). When I go into python interpreter, it is OK to type python script and execute. But when key arrows are pressed to call up history, everything mes…

couldnt remove origin point in matplotlib polycollection

I have tried an example with PolyCollection from matplotlib tutorials and noticed one strange thing. I couldnt remove this points from axes origin see fig. How do I manage this?from mpl_toolkits.mplot…

How to read .odt using Python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 3 years ago.The com…

Modifying Django settings in tests

From Django docs:You shouldn’t alter settings in your applications at runtime. Forexample, don’t do this in a view:from django.conf import settingssettings.DEBUG = True # Dont do this!The only plac…

How To Use Django Cycle Tag

This hopefully is a pretty easy question. My endgoal is to be able to view my database entries in a table which I can sort via the column headers. I have read the documentation on cycle tags, but dont …

Multiple linear regression with python

I would like to calculate multiple linear regression with python. I found this code for simple linear regressionimport numpy as npfrom matplotlib.pyplot import *x = np.array([1, 2, 3, 4, 5])y = np.arra…