Generate trees from poems or any other text.
Repository for this Google Colab notebook
This project aims to capture poetic language and use it as a source for generating poetic visuals, while giving the audience an insight into the shape and form of poetry through a graphical representation. I wish to create an intuitive connection between words and visuals, so that the audience can compare and contrast different poems by comparing their graphical forms.
A single tree graph is generated from each sentence in the input poem, with each vertical branch representing one semantic element (most of the time a word or a punctuation mark, but sometimes part of a word, like "not" in "cannot" or "'s" in "Monday's"). The tree's structure is based on the dependency tree of the sentence. The color of each branch is based on the part-of-speech analysis of the semantic element, such as maroon for verbs and deep green for nouns. The horizontal position of each branch matches the position of the word in the sentence when typed out and centered, as shown below.
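To make the tree structure concrete, here is a minimal sketch of how the depth of each branch could be derived from a dependency parse. It does not call spaCy; the parse is written out by hand as hypothetical `(text, head_index, pos)` tuples, which in spaCy would roughly correspond to `token.text`, `token.head.i`, and `token.pos_`. The sentence, the `SENTENCE` name, and the `token_depths` helper are assumptions for illustration, not code from the notebook.

```python
# Hypothetical hand-written parse of "The cat sat on the mat."
# Each entry is (text, head_index, pos). The root token points to itself,
# which mirrors spaCy's convention for the root of a sentence.
SENTENCE = [
    ("The", 1, "DET"),
    ("cat", 2, "NOUN"),
    ("sat", 2, "VERB"),   # root
    ("on", 2, "ADP"),
    ("the", 5, "DET"),
    ("mat", 3, "NOUN"),
    (".", 2, "PUNCT"),
]


def token_depths(tokens):
    """Return the depth of each token in the dependency tree (root = 0)."""
    depths = []
    for index in range(len(tokens)):
        depth = 0
        current = index
        # Walk up towards the root, counting one edge per step.
        while tokens[current][1] != current:
            current = tokens[current][1]
            depth += 1
        depths.append(depth)
    return depths


depths = token_depths(SENTENCE)
for (text, _, pos), depth in zip(SENTENCE, depths):
    print(f"{text!r:8} {pos:6} depth={depth}")
```

In a drawing, the depth would decide how high on the tree a branch starts, while the part-of-speech tag would decide its color.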
These graphs are then overlaid on top of each other, forming the final tree graph of the poem.
The poem is then typed out with the same coloration rule, next to the tree graph.
This project is inspired by the Coral Cities project by Craig Taylor, in which he created organic forms based on geographic and traffic data, a blend between nature and human activity. This made me realize that human-generated data can be a great source for generative visuals that aim to look organic.
As I was learning the spaCy library and its functionality for another project, I accidentally learned that the semantic structure of sentences can be parsed into a tree structure. We are used to seeing text in a linear fashion, reading it from left to right and top to bottom, so I thought a non-linear representation of text would be an interesting concept to explore.
Reading and writing poetry has been an interest of mine for a while now, so naturally I chose to apply this idea to poems. Another reason I chose poetry as data is that lines are the basic unit of poetry. The structure, rhythm, and length are crucial to the "poeticness" of a line, and poets are very deliberate when they organize words to craft them, so I think poetry is an interesting dataset to look at with my approach.
Initially I was hoping to use lines as the basic unit of input data, i.e. generate one tree per line. However, despite being technically possible, the result did not make semantic sense, since words in broken lines (incomplete sentences) cannot be fully analyzed and parsed semantically. So I decided to use the sentence as the basic unit.
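The switch from lines to sentences can be illustrated with a small sketch. The project itself relies on spaCy for sentence boundaries; the regular expression below is only a naive stand-in (it will mis-split abbreviations, for example), and the `lines_to_sentences` helper and the sample lines are made up for this example.

```python
import re


def lines_to_sentences(poem):
    """Join broken lines, then split on sentence-ending punctuation (naive)."""
    # Drop empty lines and rejoin the rest into one running text.
    joined = " ".join(line.strip() for line in poem.splitlines() if line.strip())
    # Split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", joined)
    return [part for part in parts if part]


# Made-up lines where each sentence is broken across two lines.
poem = "The river bends\naround the hill.\nWho follows it\nhome?"

for sentence in lines_to_sentences(poem):
    print(sentence)
```

Here four lines become two complete sentences, each of which can be parsed as a whole.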
Since the semantic tree is separated by sentence, multiple trees are generated from a single poem, so I decided early on that the final visual form would be an overlay of all trees generated from one poem. This means I need to introduce variation in the position of the branches so that they do not overlap too much or too perfectly. Drawing inspiration from Coral Cities, I chose to use the original location in the raw data (the word's location when a sentence is typed out) as the source for that organic variation.
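One way to turn word locations into horizontal positions is sketched below. It assumes a monospaced layout where every character has the same width and words are separated by single spaces; the `centered_positions` helper and its `char_width` parameter are hypothetical. With spaCy, the character offset of a token (`token.idx`) could be used instead of the running cursor.

```python
def centered_positions(words, char_width=1.0):
    """Return the x coordinate of each word's midpoint, sentence centered on 0."""
    text = " ".join(words)
    total_width = len(text) * char_width
    positions = []
    cursor = 0  # character offset of the current word within the sentence
    for word in words:
        midpoint = (cursor + len(word) / 2) * char_width
        # Shift so that the middle of the sentence sits at x = 0.
        positions.append(midpoint - total_width / 2)
        cursor += len(word) + 1  # +1 for the separating space
    return positions


print(centered_positions(["a", "bb", "a"]))
```

Because sentences differ in length and word order, each tree gets its own spread of branch positions, which keeps the overlaid trees from lining up exactly.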
To provide more insight into the text data, I decided to use another feature of the spaCy library, the part-of-speech analysis of each word, to color the tree branches (which correspond to words).
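The coloring rule can be sketched as a simple lookup table. The tag names follow the Universal POS tags that spaCy exposes as `token.pos_`. Only the maroon and deep green entries are taken from the description above; their exact RGB values, the other entries, and the fallback color are assumptions for illustration.

```python
# RGB components in the 0..1 range, as commonly used by Cairo-style APIs.
POS_COLORS = {
    "VERB": (0.50, 0.00, 0.00),   # maroon (from the description; RGB assumed)
    "NOUN": (0.00, 0.39, 0.00),   # deep green (from the description; RGB assumed)
    "ADJ": (0.80, 0.50, 0.10),    # assumed
    "PUNCT": (0.60, 0.60, 0.60),  # assumed
}
DEFAULT_COLOR = (0.30, 0.30, 0.30)  # assumed fallback for unlisted tags


def color_for(pos):
    """Return the branch color for a part-of-speech tag."""
    return POS_COLORS.get(pos, DEFAULT_COLOR)


for tag in ("VERB", "NOUN", "ADV"):
    print(tag, color_for(tag))
```

The same lookup can be reused when typing out the poem, so that words and branches share one color code.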
For early prototyping, I made the following sketch to explore which visual forms would be both clear and aesthetically pleasing.
Most of the early design decisions produced the desired results. However, the visual form of each branch had not yet been decided. After testing the visual clarity and aesthetics of the overlaid trees, using the whole text of an example poem, I decided on two segments per branch, as well as the line weight and alpha value for each layer.
To give the final graph a more tree-like look, I added a function to decrease the length of the branch towards the upper end.
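A minimal sketch of such a function is shown below. It assumes that branches higher up the tree correspond to deeper levels of the parse, and that the length shrinks by a fixed factor per level; the `branch_length` name, the base length, and the decay factor are all hypothetical and not the values used in the notebook.

```python
def branch_length(depth, base_length=100.0, decay=0.8):
    """Return a branch length that shrinks geometrically with tree depth."""
    return base_length * decay ** depth


for depth in range(4):
    print(depth, round(branch_length(depth), 1))
```

A geometric decay keeps the lower branches long and lets the upper ones taper off, which is one simple way to get a tree-like silhouette.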
The correspondence between the text and the drawing was lost once I moved the text out from under the drawing for visual clarity. I tried adding the text back in on the side, but since the branches and words no longer share a location, the relationship between the poem and the tree was unclear. Therefore I decided to color the text with the same part-of-speech color coding to introduce more connection.
A. Expand the scale of the project to poetry collections or a corpus of a poet's work through life
B. Create a tighter connection between the text and the visuals, perhaps by replacing the branch line with texts?
C. Explore more visual forms for branches, such as curves, shapes, or complex units.
D. Finetune the color palette and add multiple color palettes as themes.
This project uses the following libraries:
spaCy: an NLP library with language models for Python
qahirah: a Python binding for the Cairo graphics library
Some of the poems listed in the examples were accessed from the Poetry Foundation website; the rest were written by me.