The invention of the “chromosound”: challenges for the development of a possible language

Abstract: This paper describes the Chromossonium, the project of an interactive environment conceived to host performances of Chromossonia, a synesthetic language for performance and composition. Chromossonia is based on a theoretical particle named the Chromossound, which unites sound and color in a non-hierarchical way, and with which we expect creators to be able to compose scored pieces to be staged and performed in the Chromossonium. Following a description of how these elements are conceived and what their theoretical demands are, the paper discusses the many potential directions of research the project offers and the promise it holds. We then discuss the technical challenges found in the process of implementation, in which we chose to generate the Chromossound as the output of a neural network, according to a set of technical choices and a defined architecture.

technology - foundations of Chromossonia" (Sinestesia, arte e tecnologia - Fundamentos da cromossonia), published in Brazil in 2002 (BASBAUM, 2002). Later, in successive collaborations with different colleagues, the idea of an instrument evolved into that of an immersive dome, a collective instrument: an interactive environment. In 2020, at the invitation of the curators of an art and cognition event, a new round started, with the challenge of finally making it happen. The next paragraphs offer a view of how these elements are conceived, show the research possibilities opened by these concepts, and describe the technical challenges involved, which remain stubbornly in the way of their implementation.
The Chromossonium, as envisioned, is conceived as a "playable synesthetic environment". More than two decades ago, in the early stages of this research, it was described as a possible synesthetic instrument (see BASBAUM, 2002: 129-41, for a basic description of its structure, gestures, and playability) that could be understood as part of a long line of color-music instruments (PEACOCK, 1988), with certain conceptual specificities. Over successive attempts to develop it, and given that the increasing facilities created by digital technology had already led different artists and researchers around the world to develop many instruments with these characteristics (Tim Thompson's Space Palette 1 is likely the most fascinating of these), the idea evolved into that of a Dome: an interactive environment, a collective instrument, opening a whole umbrella of aesthetic, technical and cognitive research possibilities.
In such an environment, it would be possible to perform freely, for pure entertainment, or to play Chromossonic compositions. As it is now conceived, the Chromossonium would also be an environment for experimental cognitive research. Taking advantage of the technical possibilities of a system hosted on an accessible network, with local or remote access, it will be possible to play the Chromossonium through a variety of interfaces. Its compositional framework is based on the concept of the Chromossound:

a) Chromossound
An event can be considered Chromossonic if and only if it brings together, in absolute synchrony, one sound and one colour, the frequency of the latter being equal to the frequency of the former times $2^{n}$ (we are aware of the physical differences between sound and light) (BASBAUM, 2002: 125-6).
The Chromossound is thus a monad constituted by the melting of a color and a sound, according to criteria that secure:
- non-hierarchical and complementary relations among color and sound parameters (it is not about colors accompanying sounds, nor vice versa: it is a synesthetic, hybrid element);
- the possibility of scored composition.
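The frequency relation in this definition can be illustrated numerically. The sketch below is only an illustration, not the project's implementation: the function name `colour_frequency` and the lower edge of the visible band (about 400 THz) are assumptions introduced here. It looks for the smallest exponent n that lifts a sound frequency into the octave starting at that edge.

```python
import math

# Assumed lower edge of the visible band, in Hz (about 400 THz, deep red).
# This bound is an illustrative assumption, not a value from the project.
VISIBLE_LOW_HZ = 4.0e14


def colour_frequency(sound_hz: float, low_hz: float = VISIBLE_LOW_HZ) -> tuple[int, float]:
    """Return (n, f_colour) with f_colour = sound_hz * 2**n.

    n is the smallest integer that lifts the sound frequency to or above
    low_hz, so f_colour falls in the octave [low_hz, 2 * low_hz).
    """
    if sound_hz <= 0:
        raise ValueError("sound frequency must be positive")
    n = math.ceil(math.log2(low_hz / sound_hz))
    return n, sound_hz * 2.0 ** n


n, f_colour = colour_frequency(440.0)  # concert A
print(n, f"{f_colour / 1e12:.1f} THz")  # prints: 40 483.8 THz
```

Under these assumptions, a 440 Hz tone raised by forty octaves lands near 484 THz, inside the visible range. Since the visible band spans slightly less than one octave, some pitches may fall just outside it, depending on where its edges are placed.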

b) Chromossonia
Chromossonia is the name given to a synesthetic compositional and performing language developed through the disposition of Chromossounds in time and space (fig. 1 shows an example of some simple experimental scores, produced in 1994-95). The experiments made in the first stages of research 2 used simple full-screen Chromossounds, but Chromossonia is supposed to bring together inspiration and formal references from several traditions of music, Visual Music (ABBADO, 2020) and experimental film; from montage theories (EISENSTEIN, 1949); and from the innovative forms and notation techniques of contemporary music (BOUSSEUR, 1993). We also foresee the possibility of non-linear scores, for example the spherical 3D scores conceived by Hans-Joachim Koellreutter 3 for his last compositions, or many post-Stockhausen and post-Cage score solutions, which are part of the research topics opened by Chromossonia; contemporary live-cinema practices (BASTOS and MORAN, 2021) may also be creative references.

c) Chromossonium
An instrument appropriately designed for performing Chromossonia could be named a Chromossonium. In the 1990s, long before the contemporary hype around its possibilities, the Theremin was taken as the inspiration for a controller in which the player's gestures would not be overdetermined by the skills of traditional musical instruments. Since then, many remarkable technological innovations have appeared, such as 3D printers, touch screens, sensors, and other resources for digital lutherie (TOZZO DE DEUS, NESPOLI, 2020), opening huge possibilities for the creation of new controllers and hardware. In 2008, however, while working on a submission with Prof. Johannes Birringer, the Chromossonium concept took the form of a Dome (figures 2, 3, 4), in which several people would be able to perform together, using this variety of interfaces. Taken together, these elements offer enormous sonic, visual, kinesthetic, tactile, and cognitive possibilities. They allow the exploration of recent advances in technology, both in gesture recognition and in applications that make use of mobile devices and broadband global networks, to expand performative collaboration in experiential synesthetic domains, allowing the use of: (1) wearable interactive devices and garments; (2) mobile interfaces; (3) augmented-reality glasses; (4) 3D images; (5) mobile environment architectures; (6) wireless EEG monitoring devices.
However, its most interesting cognitive consequence is the possibility of bringing together people of any age or culture in a playable, pre-linguistic and thus transcultural synesthetic environment, theoretically able to be played through any interface (fig. 4). In an age of networks, and of both globalization and re-tribalization, this may be a remarkable instrument of transcultural integration.
Other relevant aspects foreseen in the project are:
(1) opportunities for technological innovation, through the formalization of technical challenges that are fully achievable with today's available resources, in aspects not yet explored;
(2) as a result, opportunities for patent registration, be it in the form of publicly available Creative Commons knowledge or in the form of commercial products: controllers, interfaces, wearables, home entertainment for living rooms, etc.;
(3) by creating a circumscribed performative environment, in which participants engage in well-defined tasks, the Chromossonium also offers ideal conditions for cognitive research, especially in terms of embodied cognition and enactive perception approaches, given that participants can be observed and have their nervous system activity measured, allowing auspicious conditions for well-designed experiments (figures 5, 6);
(4) finally, the project brings fruitful possibilities for aesthetic innovation, by relying on several traditional and contemporary experimental practices in audiovisual art, visual music and sound art, on 2D and 3D images, and on immersive and participatory spectatorship, all taken to a compositional level.

II. HOW TO MAKE IT HAPPEN?
A seductive, ambitious vision, opening a set of amazing research subjects.
However, making a vision like this really happen in the material world is not that easy. After years of frustrating attempts to implement the Chromossonium, while its many potential possibilities became increasingly attractive, it also became clear that a detailed view of the project's development was necessary. Two axes circumscribe its needs: a) on the one hand, hardware and software development; b) on the other hand, its experiential possibilities depend on wise aesthetic research and practice. Thus, reduction was needed.
If the basis of scientific thinking, and of the projective practices that derive from it, lies in reducing an object to its most essential, treatable aspects, then it was necessary to find the most basic element in the building of the Chromossonium, from which the whole process could progress step by step. In terms of hardware, Dome structures with great resources for visual projection and sound diffusion are not a technological problem: there are excellent solutions on the market, and it is just a matter of an adequate budget. On the other hand, a new, hybrid language, as Chromossonia is expected to be, cannot exist without the possibility of being exercised: purely visual models, or purely sonic exercises, do not account for what must be a bi-modal language. The conclusion is that the biggest challenge lies in software development. It became clear that we had to find the most basic element, necessary to all later developments, and this came to be understood as the Chromossound. The development of a digitally tangible Chromossound opens the doors to the aesthetic aspects of Chromossonia and to its implementation in an immersive dome structure.

III. TECHNICAL HYPOTHESIS BEING EXPLORED
When we talk about a new synesthetic language, we are not looking for a simple, direct translation from one sense modality into another. If, as maintained by authors such as Merleau-Ponty (2012), Lawrence Marks (1975) and Richard Cytowic (1997), we are all synesthetes up to a certain level, then all our bodily senses affect and modulate each other all the time in our conscious and unconscious experiences. There are recent works in psychology that explore this interaction between the senses (for example, Fulkerson, 2014; Lundborg, 2014; Ferreira et al., 2022). While discussing the relations between senses and language, Ruthrof (1997) names the relations between the senses "intersemiosis". For our present purposes, it suffices to notice how, in movies, the musical soundtrack influences the way we perceive images, while images certainly influence the way we listen to music in many kinds of entertainment. Also, if language has evolved somehow from the direct experience of the perceived world, and there are, at some level, analogical relations between sounds and phenomena, as in Köhler's well-known kiki-bouba experiment (RAMACHANDRAN and HUBBARD, 2001), language also feeds back on perception, so that our perceptual enaction is also modulated by language. If this is correct, such a bi-directional relation can be extended to all human senses, no matter how the sensorium is modeled, whether as five senses, as in traditional folk psychology, or otherwise. Cytowic (1997), for example, includes proprioception as a sixth sense, and other models suggest still different arrangements. Is this radical holism? Yes, it is. Is that synesthesia? It depends on how you approach and conceptualize it, but I assume it is.
To achieve a proper development of the Chromossound, one that retains its expected complexity and respects its conceptual specifications as much as possible, two hypotheses have been considered:
1) to create a neural network in which a MIDI input triggers a complex process of which the Chromossound is the output. To guarantee an aesthetically interesting output, this network will be submitted to a machine-learning process, through which the output possibilities will be refined and circumscribed within a desired domain.
The problem with this hypothesis is that neural networks are usually trained towards known desired outputs: for example, learning to recognize a dog from images of many dogs, or to display a recommendation in a streaming application from one's previous choices. And we are still searching for what the Chromossound must really be. Nevertheless, the fact that each sound and color property may affect the whole process in complex and non-linear ways suggests the kind of behavior one finds in neural networks.
2) the Chromossound could also be the emergent state of a complex fuzzy network. However, this leaves the results completely aleatory and unlikely to allow proper compositional conditions.
Both possibilities demand highly skilled developers. At present, we are examining the possibilities of the first hypothesis, the neural-network model. The challenges, however, seem again and again to be bigger than expected: by their technical nature, computers must process visual and sound information separately, and bring them together in a final output that would be the Chromossound. Parallel processing is not a problem: many languages support it today. C#, our current choice, does, and so does Python.
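As a rough illustration of this "process separately, then join" pattern, the following Python sketch runs a sound branch and a colour branch concurrently and only assembles the output once both have finished. The function names and the event fields are hypothetical placeholders, not part of the project's code.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sound(event: dict) -> dict:
    # Placeholder for the sound branch: keeps only the sound parameters.
    return {"pitch": event["pitch"], "amplitude": event["amplitude"]}


def process_colour(event: dict) -> dict:
    # Placeholder for the graphic branch: keeps only the colour parameters.
    return {"hue": event["hue"], "saturation": event["saturation"]}


def render_event(event: dict) -> dict:
    """Run both branches concurrently and join them into a single output."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sound_job = pool.submit(process_sound, event)
        colour_job = pool.submit(process_colour, event)
        # result() blocks until each branch is done, so the merged output
        # is only assembled once both halves exist.
        return {"sound": sound_job.result(), "colour": colour_job.result()}


event = {"pitch": 69, "amplitude": 0.8, "hue": 30.0, "saturation": 0.9}
print(render_event(event))
```

Running two branches side by side is the easy part; the sketch says nothing about how the two halves should influence each other, which is the harder question taken up next.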
However, the Chromossound, the basic unit of Chromossonia, must be a complex synesthetic unity. It cannot, for example, be based on a unidirectional translation between any pair of sensory modalities. There are many great Visual Music artworks, digital or analog, which are based on direct translation from one sense to another. A nice example is Stephen Malinowski's Music Animation Machine 8. John WHITNEY (1994) theorized about "complementarity" between sound and visuals, but kept music and color as independent variables in his compositional system (WHITNEY, 1980). Many other Visual Music artists, like Maura MCDONNELL (undated), conceived their own processes in very personal terms, related or not to their personal synesthesia; their processes and systems are either not available to other artists or amateurs, or are based on free, intuitive associations that cannot be taken as a system. An analogy could be made with the passage from the "free atonalism" found in musical compositions of the late 19th century, in composers like Mussorgsky, to Schoenberg's creation of his dodecaphonic serial system in the early 20th century, in the search for a more structured atonalism. In relation to these many referential and inspiring works, the Chromossound is conceived not as "better art", but as a new form of art.
8 https://www.youtube.com/watch?v=EAWSonBN3Pk
By virtue of this search for a complex Chromossound, decisions had to be taken. At this initial level, any decision has a strong impact on all the developments that follow. Given the demand for complexity, it became clear that the Chromossound should result from a bidirectional interaction, a co-modulation, between sound and graphic parameters. Thus, the first decision was the choice of the neural-network model, which roughly emulates some properties of our neuronal interactions, allowing the connections between inputs to have "weights" and to modulate each other. Another interesting aspect of the neural-network choice is that, because we do not know what the Chromossound looks like, we can let the machine produce it and then, through machine-learning techniques, calibrate it, looking for an aesthetically rich output; several different calibrations may be possible as well. As references, there is a whole consolidated tradition of Visual Music works with which to refine our perception and our taste regarding abstract dynamic audiovisual artworks, from the Magic Lantern performances of Athanasius Kircher to contemporary digital works and audiovisual performances, not to speak of Tim Thompson's stunning Visual Music instruments.
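What "co-modulation through weights" could mean in the simplest case can be sketched as follows. This is a toy illustration under stated assumptions, not the project's network: the layer sizes, the tanh activation, the random (untrained) weights and all function names are choices made here for the example. Sound and colour parameters enter a shared hidden layer, so every output value depends on both modalities.

```python
import math
import random


def make_layer(n_in: int, n_out: int, rng: random.Random) -> list[list[float]]:
    # One weight row per output unit; the values are random, i.e. untrained.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer: list[list[float]], inputs: list[float]) -> list[float]:
    # Every output unit sums over ALL inputs, then passes through tanh.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in layer]


def co_modulate(sound: list[float], colour: list[float],
                hidden: list[list[float]], out: list[list[float]]
                ) -> tuple[list[float], list[float]]:
    """Mix sound and colour parameters and return (new_sound, new_colour)."""
    mixed = forward(out, forward(hidden, sound + colour))
    return mixed[:len(sound)], mixed[len(sound):]


rng = random.Random(0)
hidden = make_layer(6, 8, rng)   # 3 sound + 3 colour inputs, 8 hidden units
out = make_layer(8, 6, rng)      # back to 3 sound + 3 colour outputs

colour = [0.5, 0.5, 0.9]  # colour-frequency, duration, saturation (normalised)
_, colour_a = co_modulate([0.2, 0.5, 0.8], colour, hidden, out)
_, colour_b = co_modulate([0.7, 0.5, 0.8], colour, hidden, out)  # only pitch changed
print(colour_a != colour_b)
```

In this toy network, changing only the pitch input is enough to alter the colour outputs, and the reverse also holds. Calibrating the weights so that the result is aesthetically interesting is the open problem discussed in the text; nothing here addresses it.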
Technically speaking, this level of conceptual ambition has a cost. For the sound and graphic parameters to be taken to such a level of bidirectional interaction in a neural network, some hard decisions, with deep conceptual consequences, had to be taken. Computational thinking does not go easily with holism. To start with, an architectural model had to be established, defining the flow of information in the system (figure 7).
All the values assigned to such parameters must be expressed in the same data format, otherwise no computational interaction is possible. The choice has been to translate all parameters into JSON (JavaScript Object Notation), which is supported by multiple programming languages and tools, and is applied in a variety of academic, non-academic and professional applications. It is extremely light, with a low computational processing cost. Once again, it is important to notice that the goal of these first steps is to develop an architecture that may later accept growing complexity among the several parameters of sound and color forms, and that is endowed with great plasticity for the development of a rich Chromossonic language. Thus, it seemed prudent to choose a tool able to absorb such a level of complexity at a low computational cost, since many layers are expected to be added to the neural network to reach the expected complexity, once these first technical challenges are overcome.
However, for the information to reach its translation into JSON and feed the input layer of the neural network, new challenges become tangible. For example:
a) what kind of input can be used to put the network in motion?
b) how to separate sound and graphic parameters in order to process them separately, since we have not been able to find a tool capable of dedicated and synchronous processing of both forms of output?
c) how to interact in real time, at a performative level and with several simultaneous players, with the audiovisual information that is being processed in response to users' inputs through different kinds of interfaces?
d) how to train the neural network to obtain an output compatible with the project's conceptual and aesthetic implications?
To face these challenges, the choice has been to reduce the Chromossound to its most basic parameters. As said above, it is not possible to do research without reducing the object of research to its most basic properties, and this is especially mandatory when designing any computer application. As regards sound, we have chosen to preserve only pitch, duration, and amplitude; for graphics, color-frequency, duration and saturation (the form of the image being reduced, initially, to a full screen).
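To make this reduction concrete, the six retained parameters can be written as a single record and serialised to JSON. The sketch below is only one possible encoding: the class name, the field names, the units and the value ranges are all assumptions made for illustration, not the project's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PrimitiveChromossound:
    # Sound side (field names and units are illustrative assumptions)
    pitch: int               # MIDI note number, 0-127
    sound_duration: float    # seconds
    amplitude: float         # 0.0-1.0
    # Colour side (the image itself is a full screen)
    colour_frequency: float  # Hz
    colour_duration: float   # seconds
    saturation: float        # 0.0-1.0

    def to_json(self) -> str:
        # Flatten the record into one JSON object.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "PrimitiveChromossound":
        # Rebuild the record from a JSON object with the same keys.
        return cls(**json.loads(text))


event = PrimitiveChromossound(69, 1.5, 0.8, 440.0 * 2 ** 40, 1.5, 0.9)
message = event.to_json()
print(message)
```

A flat record like this keeps both modalities in one message, so that whatever consumes it receives sound and colour values together rather than through two separate channels.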
To deal with sound, the natural choice has been the MIDI protocol, in which almost all parameters involved in a conventional musical performance are already well distinguished and codified, the performed inputs taking the form of instructions to be processed by any MIDI synthesizer device, be it an external module or a VST. In addition, MIDI, a long-lasting protocol created in the 1980s for communication between digital instruments, before the explosion of personal and home computers, has over the decades acquired an increasingly versatile character, being used for other aspects of concert performances, such as the automation of soundboards, lighting, and other multimedia functions.
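As a small illustration of how performed inputs are codified, the sketch below decodes a raw three-byte MIDI Note On message and converts its note number to a frequency in equal temperament. The helper names are hypothetical; the byte layout (status byte 0x90 plus channel, then note and velocity) and the reference of note 69 to A4 follow the MIDI convention.

```python
def parse_note_on(message: bytes):
    """Decode a 3-byte MIDI Note On message into (channel, note, velocity).

    Returns None for any other message, including Note On with velocity 0,
    which MIDI treats as a Note Off.
    """
    if len(message) != 3:
        return None
    status, note, velocity = message
    if (status & 0xF0) != 0x90 or velocity == 0:
        return None
    return status & 0x0F, note, velocity


def note_to_hz(note: int, a4_hz: float = 440.0) -> float:
    # Equal temperament: note 69 is A4, each semitone is a factor 2**(1/12).
    return a4_hz * 2.0 ** ((note - 69) / 12)


channel, note, velocity = parse_note_on(bytes([0x90, 69, 100]))
print(channel, note, velocity, note_to_hz(note))  # prints: 0 69 100 440.0
```

The pitch obtained this way is an ordinary sound frequency in hertz, which is the quantity the Chromossound definition relates to colour.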
MIDI is also processed and used in programming environments that are very popular in the digital and multimedia art communities, such as Pure Data, Max/MSP and Ableton Live, and in applications like Isadora and Processing. As for the colors and other graphic parameters (shape, movement, intensity, duration, etc.), the Babylon framework was initially chosen. This tool, often used in game design, is characterized by its lightness and versatility: it allows complex graphic outputs to be visualized on the web without sophisticated graphics cards, and it can work with 2D and 3D spaces. This first Chromossound being pursued, which combines only sound and color in their basic parameters, was called the "Primitive Chromossound". The "Primitive Chromossound" thus reduces the Chromossound concept to its simplest form, the primary relationship between the most basic parameters of sound and color. Once such a base is structured, we expect it will finally be possible to expand the model towards its JSON translation for neural network implementation (figure 7).
It is worth noting some peculiarities of this model. Obviously, the nature of the computational implementation, which depends on quantifiable and well-defined parameters, requires sound and color to be separated at the origin, for their later assembly in a unit that responds to the conceptual specifications of the Chromossound. Interestingly, the way this implementation is being sought offers a direct analogy with the way in which, in the 1990s, Richard Cytowic described the processing of synesthesia in the brain: as if the data of an audiovisual television transmission were captured before being able to reach the screen and diverted from their initial purpose, to generate a new and unexpected output, the synesthete's experience:
"By analogy, the consensual image we see on the screen when watching television is the terminal stage of the broadcast. Someone able to intercept the transmission anywhere between the studio camera and the TV screen would be like a synesthete, sampling the transmission before it reached the screen fully elaborated. Presumably, their experience would be different from those of us viewing the screen. We can similarly propose and test the concept of synesthesia as the premature display of a normal cognitive process. This implies that we are all synesthetes, and that only a handful of people are consciously aware of the holistic nature of perception." (Cytowic, 1997: 27)
In this model, too, what happens is the capture of the information that should determine the production of sound and color before it arrives at the modules responsible for the output of these events, and its conduction to the neural network, where the parameters affect each other in a way that, only then, generates the output in the form of a Chromossound. Furthermore, and there is in fact no alternative, the Chromossound thus obtained should gain its unity from this interweaving of parameters in a neural network and from the gestalt effect generated by the synchrony of sound and color production, as in the ventriloquist illusion 9.
What has been found, not without some surprise, is that capturing MIDI messages for purposes other than those for which they were originally intended is a more difficult task than expected. To give an idea of how much this issue involves technical challenges that many developers are facing today, some of the tools available for capturing such messages, the MIDI parsers, were made available on the web only in 2022, a few weeks, or even days, before we went looking for them.
However, MIDI remains the best option, not only for the reasons mentioned above, but also because it can be used for future performances of Chromossonia on interfaces closer to musical thought (such as the Theremin, envisioned by the project more than 25 years ago, or performances from graphic environments on touch screens, such as those used by Thompson, who, it is worth mentioning, has contributed in conversations to the directions of this project).

IV. CONCLUSIONS: BE REALISTIC, ASK FOR THE IMPOSSIBLE
Soyez réalistes: demandez l'impossible (Paris, May 1968)
"I would never dedicate myself to an art form that was smaller than a lifetime. What would I do after that?" (Moy Gin Ying, "Master Wong")
To be able to create an entirely new art language: what could be more seductive for artist-researchers to dedicate a career to? Indeed, if one departs, for example, from Ezra Pound's well-known distinction of artists as "inventors" (those who create a new process) and "masters" (who exercise such a process as well as or better than its inventor), these two categories being severed from all the rest of the less relevant practitioners of different art forms (which need not be detailed here) (POUND, 1961); if it is a matter of developing, as a result of years of interdisciplinary research, something really meaningful, then the invention of a new language, and the new creative processes it may allow, are among the finest targets one could pursue. It is not yet possible to tell whether the decisions taken are delaying the expected developments too much. Over some months of work, in the conditions offered by contemporary university duties, we have been focused on anticipating all possible difficulties that may emerge as the complexity of the Chromossound increases. This has led us to the basic architecture described here. We believe this is a small but consistent step towards the development of the Chromossound. Once this barrier is crossed, all other developments may be easier, and we will be able to implement sound and graphic parameters, start to establish the weights of the connections, insert the necessary layers, and create this expected synesthetic language, Chromossonia.

Funding
This research has the support of PiPeq - Plano de Incentivo à Pesquisa da Pontifícia Universidade Católica de São Paulo.

Image use consent
Figure 1: Experiments with Chromossonia: score for "Onda 1/SI 25", using a palette of 60 chromossounds (in BASBAUM, 2002: 152).
Figure 4: People playing through different interfaces. Left to right: a theremin, a tablet, wearable devices, and a smartphone (simulation). (Image by Tereza Loparic)
Figure 5: Performer being monitored with wireless EEG sensors and video (simulation). (Image: Marcelo Lyra)
Figure 6: Performer with wireless EEG sensors (simulation). (Image: Marcelo Lyra)
Figure 7: Trying to figure out the information flow for the primitive Chromossound: at the bottom, a player sends inputs to a control module, which distributes this input to a MIDI VST module (left) and a Babylon graphic module (right). By definition, those modules are related by frequency, even if sound and light are very different kinds of waves. Before any sound or visual is produced, MIDI parameters and graphic parameters are translated to JSON, so that they can feed the neural network, in which they will co-modulate each other in non-linear ways. The network can also be modulated by equations defining mathematically complex, or chaotic, forms. The output of this network will be the Chromossound, experienced by performers and spectators inside the Dome. The distinction between "environment" and "app" tries to distinguish between the software we are working on and the environment we are looking for. (Image by Marcelo Lyra, based on a hand sketch by Sérgio Basbaum)