Evolving Love Letters -- In-Depth Description

Steve Waldman, MIT Media Lab, 1996

Introduction

Evolving Love Letters is an experiment in applying genetic metaphors and algorithms to the collaborative production of text. The emergence of computer networks as a medium of mass communication has inspired an almost religious excitement. This no doubt has a wide variety of causes. However, among the many reasons given for the enthusiasm, the possibilities of large-scale, collaborative creative expression and more "open" or "democratic" mass media are clearly near the top of the list. It is these ideas that inform Evolving Love Letters. The project is an experiment in collective creativity on a (hopefully) massively wide scale. The approach is very simple, and the content a bit too cute, but the project is intended to be a first attempt at achieving an important goal.

The internet is undoubtedly the most open medium ever of its scale. Although access to the 'net is still somewhat stratified along lines of social class and nationality, never before has it been possible for school children, office workers, and activists to produce writings, images, even sounds or movies, and to put them within easy reach of an audience in the tens of millions. This is a new thing, and it matters.

However, all is not well in paradise. Looking at the internet in actual practice, it is clear that audience, not content, has become the scarce resource. While valuable information, brilliant exposition, and the most tender poetry are all "out there", finding the good stuff is difficult, increasingly so as more and more people get on line and add to the vastness. Usenet, a panoply of free-flowing conversations that began when the 'net was young and small, may be condemned to irrelevance, as the on-line population swells, by the very openness that has always been its greatest virtue. Many users have abandoned the newsgroups, mumbling stuff about signal-to-noise ratios as they go. In reaction to this, on Usenet and elsewhere on the 'net, there has been a turn to traditional means of filtering, in order to maintain some degree of consistency, topicality, and quality. To maintain audience, one must turn away producers.

Moderated newsgroups, privately controlled web pages, and other limited access forums are no doubt very useful. For now, the most reliably valuable information sources on the 'net are certainly of this variety. But accepting this as the solution to the "signal-to-noise" problem is effectively to surrender much of what was hoped to be unique and new about the Internet. This approach seems likely in large part to recreate the free press as it has existed on paper in the on-line world. An incredible diversity of editorial perspectives will no doubt exist, just as it does on paper. However, on-line, as in the "real world", it is likely that a relatively small number of content providers will gain disproportionately wide followings, in communities defined geographically or otherwise, and at every scale. The reason for this is simple: many forms of information become more valuable with greater exposure. In regional, professional, and even social communities, as information becomes widely known it becomes more and more critical for any particular individual to share in that knowledge -- in order to participate in the life and currency of the community, and to not seem out of touch. The range of issues that any sizeable community can address en masse will always be a small subset of the sum total concerns of all its individual members.

Some process must winnow the huge diversity of issues and ideas to the small number that can be addressed at a broad scale. Newspapers and journals -- and at larger scales television and radio -- have traditionally played important roles in this process. However, the structure of traditional media -- where relatively few individuals serve to represent the interests of entire communities -- leaves one to wonder whether in the "winnowing" too much weight is given to the particular perspectives and concerns of the producers.

The internet opens up some new possibilities. It might be practical to involve large numbers of people in an editing process coordinated over computer networks, without any small number being in a privileged (and therefore suspect) position. The question is how.

That's where Evolving Love Letters comes in. Genetic algorithms provide a promising avenue for research into a collaborative approach to filtering. The algorithms are inherently parallel and decentralized, yet they yield effective solutions to a wide variety of problems. The genetic approach is one of adopting incrementally better, but explicitly imperfect solutions. This seems appropriate for mediating collective creativity. An author or editor can never bring a piece to some unequivocal "perfection". One takes a piece of raw material and improves it, usually iteratively. The skills of a good editor involve knowing how to distinguish the good from the not-as-good. For this process to work, a writer must generate enough material, with enough of it decent, that there is something to cull.

Cast this way, the writing and editing processes mirror genetic algorithms with their continuing dialectic between variation and selection. One might argue that writing itself proceeds through a genetic algorithm, and that the talent or art in the writing is hidden in the "operators" that produce variation and the "fitness function" that ascertains how good a particular variant is. Doubtless there are many other equally valid ways of understanding good writing. But writing-as-genetic-algorithm is an especially useful story for our purposes, because genetic algorithms lend themselves very easily to large-scale parallel implementation.

The "art" then, as we said, is in just how to go about generating useful variation, and how to pull the good from the not-as-good. Evolving Love Letters takes a very simple approach to these problems, the details of which will be described below. In large part this simplicity is justified by the maxim, "When you don't know what else to do, do what comes easiest." Also, the choice of subject -- love letters -- was one that left it plausible that a naive first attempt might succeed.

Fictional love letters have a number of qualities that make them a remarkably, if not uniquely, suitable target for an experiment such as this one. Love letters needn't be long -- we do not ask for too great an investment of work from participating individuals. They needn't be factual, so we avoid a whole set of very difficult problems in collaborative text production -- how to verify sources, inspire and incorporate good background research, check quotations, etc.

Probably more important than any of these considerations is that love letters are fun. They are interesting to people. They are about activities and issues that human beings invest tremendous personal energies into without any outside persuasion. As an added incentive for participation in the project, the server allows users to modify and mail any letters it displays to whomever they please, and as whomever they please. (In this way it resembles The Cyrano Server, a template-based love letter server on the Web.) It is hoped that users, whether seeking love or just having fun, will enjoy mailing some of the love letters that evolve, and will thus feel some stake in contributing to their continual improvement. Thus, perhaps Evolving Love Letters can get around the fact that it doesn't pay its writers and editors. This might be a significant difficulty if one were instead "Evolving News Stories".

Evolving Love Letters is a very simple, very straightforward experiment in the collaborative authorship and editing of text. Only time will tell if the approach succeeds. In the meantime, we turn to just how we evolve these letters, in detail.

How It Works

When users point their web browsers to the Evolving Love Letters home page, they are greeted by a series of very simple options. They may compose a letter, they may browse the database of all previously generated letters, or they may be seduced. The action (of course) is all in the seduction. When a user chooses to be seduced, two love letters appear. Beneath each is an arrow, with the instructions "Please choose the letter that most perfectly woos you." The user chooses one of the two suitors (which then showers him or her with kisses, while the spurned letter produces a tearful eye). Users may then improve upon either or both letters, by selecting an arrow beneath a letter's frame. The text of the letter returns, with each sentence as a clickable link. Selecting a sentence brings it into a panel where users can improve it, either by modifying it or replacing it entirely. Users can modify as many sentences as they wish.

By selecting a letter, users increase the likelihood that variants of that letter will appear again. Conversely, variants of the letter selected against are made less likely to arise. In this way the apparent population of letters "evolves" towards a predominance of those that are effective at wooing users.

The Production, Selection, and Variation of Letters

When users are presented with a pair of letters to choose from, they are seeing freshly "evolved" letters. Every time Evolving Love Letters tries to seduce a hapless web surfer, it produces new variations on the letters in its database.

The letters originate as most letters do, from a human being's hand. Users compose letters and submit them to the love letters database, where they are added to the population of letters available to "seduce" users. Brand new letters cannot vary, because the server knows only a single version of every sentence. However, as soon as users contribute alternative sentences, the server begins to experiment by presenting variations. Every sentence contributed can make possible many new variant letters. The server produces these new variations by "mixing and matching" letters in a variety of ways. These variations are never destroyed; they are always available as part of the "biodiversity" of that letter type.

The server stores letters in terms of species, individuals, genes, and alleles. Each "species" is a type of letter, which begins as a prototype penned by a human being. The species is defined by a sequence of genes and a population of individuals. To create the gene sequence, the server parses the original letter into sentences, and defines one gene per sentence. However, a "gene" is not defined by a single sentence, but by a series of analogous sentences, each of which might occupy the same position in a letter of the gene's species. Each actual sentence of the prototype letter becomes the first "allele" of a gene, the first member of a potentially long list of alternative sentences that comprise the gene. After defining a gene sequence for a new species, the server defines the first individual. An individual is simply a list of alleles -- a list of particular sentences belonging to the genes of the species, in the order defined by the species' gene sequence. After defining an individual, the server defines a population for the new species, which initially includes only the first individual.
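
The storage scheme just described might be sketched roughly as follows. This is only an illustration, not the server's actual code: the class and function names (Gene, Species, new_species, render), the naive punctuation-based sentence splitter, and the choice to store an individual as a list of allele indices are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Gene:
    """One sentence position; holds every alternative sentence (allele)."""
    alleles: list = field(default_factory=list)


@dataclass
class Species:
    """A letter type: a gene sequence plus a population of individuals."""
    genes: list
    # Each individual is a list of allele indices, one per gene, in gene order.
    population: list = field(default_factory=list)


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def new_species(prototype):
    """Parse a prototype letter into one gene per sentence and a first individual."""
    genes = [Gene(alleles=[sentence]) for sentence in split_sentences(prototype)]
    first_individual = [0] * len(genes)  # the prototype uses allele 0 of every gene
    return Species(genes=genes, population=[first_individual])


def render(species, individual):
    """Turn an individual (allele indices) back into letter text."""
    return " ".join(g.alleles[i] for g, i in zip(species.genes, individual))
```

In this sketch, a user's alternative sentence is simply appended to the alleles of the gene it replaces, after which any individual may refer to it by index.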

Species, individuals, and alleles are associated with probabilities. In the process of displaying letters or generating variants, members of these types are chosen stochastically, according to these probabilities. Species with high probabilities are likely to show up. Within those species, high probability individuals are most likely either to be displayed directly, or to "parent" new variations. When mutations occur, high probability alleles are more likely to appear than low probability ones. When a new species is produced, it is assigned an "average" probability -- (1 / the total number of species). Whenever a user chooses a member of that species over another potential "suitor", the selected species' probability is increased, and the "dissed" species' is diminished. Similarly, the selected individual's likelihood of appearance is enhanced, and the spurned individual's is reduced. Finally, each allele in the successful letter increases in probability; each allele in the rejected one is diminished. In this way, the probability that whole types of letters, variations on particular letters, and specific sentences will appear is made sensitive to users' opinions.

The evolution of new letters is defined by six transformations -- "cloning", "random mutate", "selected mutate", "linear cross", "random cross", and "de novo create". Cloning does not create a new letter at all: an old individual is displayed as it was, without variation. Mutation means the replacement of an occasional sentence by an alternative sentence that is a part of the same "gene". In random mutate, the replacement sentence is chosen at random from the alleles in its gene. In selected mutate, the replacement sentence is chosen stochastically, according to allele probabilities. "Crossing" is the "mating" of two letters. In both cross operations, two individuals are chosen, again following the probabilities associated with the individuals. In a linear cross, the first portion of the new letter all derives from one "parent", while the latter portion derives completely from the other. In a random cross, each sentence of the child letter has an equal probability of having come from either parent -- the two letters are mixed randomly. In the final operation for producing new letters, de novo create, no "parent" is involved at all. From each gene in the sequence defined by the letter's species, an allele is chosen, again according to the probabilities of the alleles in the gene. Thus a completely new letter is manufactured, but not entirely at random -- the composition of the letter is biased by allele probabilities, which in turn are sensitive to user preferences.
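
A minimal sketch of the six transformations follows, under several assumptions that are not stated in the text: an individual is represented as a list of allele indices (one per gene), each sentence mutates independently, and the cut point of a linear cross is drawn uniformly. All function names are hypothetical.

```python
import random


def clone(parent):
    """Display an existing individual unchanged."""
    return list(parent)


def random_mutate(parent, allele_counts, expected_mutations, rng):
    """Occasionally replace a sentence by an allele drawn uniformly from its gene."""
    rate = expected_mutations / len(parent)
    return [rng.randrange(allele_counts[g]) if rng.random() < rate else a
            for g, a in enumerate(parent)]


def selected_mutate(parent, allele_probs, expected_mutations, rng):
    """As above, but the replacement is drawn according to allele probabilities."""
    rate = expected_mutations / len(parent)
    child = []
    for g, a in enumerate(parent):
        if rng.random() < rate:
            a = rng.choices(range(len(allele_probs[g])), weights=allele_probs[g])[0]
        child.append(a)
    return child


def linear_cross(mother, father, rng):
    """First portion from one parent, the remainder from the other."""
    cut = rng.randrange(len(mother) + 1)  # uniform cut point: an assumption
    return mother[:cut] + father[cut:]


def random_cross(mother, father, rng):
    """Each sentence comes from either parent with equal probability."""
    return [rng.choice(pair) for pair in zip(mother, father)]


def de_novo(allele_probs, rng):
    """No parent: draw one allele per gene according to allele probabilities."""
    return [rng.choices(range(len(p)), weights=p)[0] for p in allele_probs]
```

Here allele_counts[g] is the number of alleles in gene g and allele_probs[g] is that gene's list of allele probabilities; passing an explicit random.Random instance as rng keeps runs reproducible when experimenting.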

The technique used to increase or diminish probabilities associated with species, individuals, and alleles is a very simple "Roulette" algorithm. To increase the probability of a specific instance, a constant "delta" is added to the probability of that instance. The total probability of the entire list of alternatives is then normalized back to 1.0, by multiplying the probability of every instance by (1 / (1 + delta)). Thus, the probability of a single instance is increased, at the expense of all other instances, whose probabilities decrease in proportion to their magnitude. The relative weight of all probabilities to one another remains unchanged, except for that of the increased instance. Similarly, to diminish an instance, its probability is multiplied by a constant factor (less than one). The total probability is normalized to 1.0 by multiplying every instance by (1 / (1 - (old_probability - (factor * old_probability)))).
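
The two updates can be written out in a few lines. This sketch assumes the probabilities are held in a plain list that already sums to 1.0; the function names augment and diminish are illustrative only.

```python
def augment(probs, i, delta):
    """Add delta to entry i, then rescale every entry by 1 / (1 + delta)."""
    bumped = [p + delta if j == i else p for j, p in enumerate(probs)]
    return [p / (1.0 + delta) for p in bumped]


def diminish(probs, i, factor):
    """Multiply entry i by factor (< 1), then rescale by 1 / (1 - removed)."""
    removed = probs[i] - factor * probs[i]  # probability mass taken from entry i
    shrunk = [p * factor if j == i else p for j, p in enumerate(probs)]
    return [p / (1.0 - removed) for p in shrunk]
```

For example, with four equally likely alleles (0.25 each) and a delta of 0.20, the favoured allele rises to 0.45 / 1.2 = 0.375 while each of the others falls to about 0.208, and the total is again 1.0.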

All in all, the operation of the server is controlled by fifteen parameters, shown along with their current values below (last modified Feb 12, 1997). The optimum values for these parameters are a subject of current experimentation. These values are likely to change.

Parameter	Value
SPECIES_AUGMENTATION_DELTA	0.02
INDIV_AUGMENTATION_DELTA	0.15
ALLELE_AUGMENTATION_DELTA	0.20
SPECIES_DIMINISHMENT_FACTOR	0.70
INDIV_DIMINISHMENT_FACTOR	0.70
ALLELE_DIMINISHMENT_FACTOR	0.70
CLONE_PROB	0.40
RANDOM_MUTATE_PROB	0.10
SELECTED_MUTATE_PROB	0.10
LINE_CROSS_PROB	0.25
RANDOM_CROSS_PROB	0.10
DE_NOVO_PROB	0.05
EXPECTED_MUTATIONS	1
RANDOM_SPECIES_PERCENT	30
RANDOM_INDIV_PERCENT	40

The last three parameters merit brief explanation. The "expected mutations" parameter determines how many sentences in a letter are expected to mutate when either the random mutate or selected mutate transformations are applied. The probability that an allele will mutate is given by (1 / (letter_length)) * EXPECTED_MUTATIONS.
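
As a rough check on this formula, assume that each sentence mutates independently (the text does not say so explicitly), and write n for letter_length and E for EXPECTED_MUTATIONS; both symbols are shorthand introduced here.

```latex
p = \frac{E}{n}, \qquad
\mathbb{E}[\text{number of mutated sentences}] = n\,p = E, \qquad
\Pr[\text{no sentence mutates}] = (1 - p)^{n} = \left(1 - \frac{E}{n}\right)^{n}
```

With E = 1 and a five-sentence letter, p = 0.2, so under this assumption a mutate transformation leaves the letter untouched with probability 0.8^5, or roughly one time in three.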

The "random species percentage" and "random individual percentage" arose from the observation that species and individuals that fare poorly eventually have a likelihood of reappearing or parenting that is almost zero. This seemed like a waste of available "biodiversity". A very poor individual in mating with another letter might turn out to produce very interesting offspring. Further, a whole lackluster species might be transformed into a contender via the charitable improvements of creative users. Thus, it seemed like a good idea to create a "loophole" through which poorly-faring individuals and species might reappear. So, some small percentage of the time, when letters are chosen for display or parenting, probabilities are ignored and the species or individual is chosen entirely at random. Currently a species is chosen at random 5% of the time, an individual at random 10% of the time, and (therefore) both in tandem 0.5% of the time.
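
The "loophole" amounts to a weighted choice that occasionally falls back to a uniform one. The sketch below assumes a probability map stored as a dictionary and uses hypothetical names (pick, pick_transformation, TRANSFORM_PROBS); the transformation weights are copied from the parameter table above, and the random percentage is left as an argument rather than fixed.

```python
import random

# Transformation weights as listed in the parameter table.
TRANSFORM_PROBS = {
    "clone": 0.40,
    "random_mutate": 0.10,
    "selected_mutate": 0.10,
    "linear_cross": 0.25,
    "random_cross": 0.10,
    "de_novo": 0.05,
}


def pick(prob_map, random_percent, rng):
    """Choose a key from prob_map, ignoring probabilities random_percent of the time."""
    keys = list(prob_map)
    if rng.random() * 100.0 < random_percent:
        return rng.choice(keys)  # the loophole: every key equally likely
    return rng.choices(keys, weights=[prob_map[k] for k in keys])[0]


def pick_transformation(rng):
    # Assumed here: transformations always follow their configured weights.
    return pick(TRANSFORM_PROBS, 0, rng)
```

A species or individual whose probability has decayed to nearly zero is then still reachable through the uniform branch, which is the point of the two percentage parameters.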

The Database

Evolving Love Letters is built on top of a reimplementation of the dtypes library, which defines a set of flexible data structures and a network protocol for transparent, cross-platform transmission and storage. Binary representations of dtypes representing species, individuals, genes, and alleles are stored in a gdbm database keyed to unique object ids. The most important form these data structures take for the present purpose is that of a probability map -- a vector of pairs, in which a floating point probability is keyed to a representation of an allele, individual, or species. All genetic operations are performed on this simple data structure.

Conclusion

Evolving Love Letters is designed to be a proof-of-concept application, not a fully functional engine for collaborative text production. However, it does offer some useful features. Evolving Love Letters' database is fully browsable and sorted. Every letter and sentence ever displayed is available; the server never, ever destroys a variant once evolved. Also, users may at any time see what species ("letter types") have been most successful and, within each letter type, which individuals ("variants") have done the best. Evolving Love Letters therefore addresses key goals of most "groupware" -- it provides a space for brainstorming, and helps to organize and filter the results of that brainstorming. However, Evolving Love Letters also contributes to the brainstorming, by producing novel letters that get sorted and filtered just as user-generated variations are.

Evolving Love Letters is not a good model for other collaborative applications in the ways that it tries to provide incentives for user participation. As mentioned earlier, it is hoped that the subject matter itself will inspire interest and some creativity on the part of the user community. This cannot be expected of many other applications where collective participation is desired. Other sorts of incentive need to be developed.

As of this writing, the location or existence of Evolving Love Letters has not been publicly announced, and very few people have tried it out. This will change shortly. Only time will tell whether the site will succeed at inspiring the participation and creativity of a large community, and converting that dispersed effort into "high quality" work -- however defined.

Acknowledgements

This project comes out of work in two courses at the MIT Media Lab: Modeling Autonomous Agents, taught by Professor Pattie Maes, and On Being Meta, taught by Glorianna Davenport, V. Michael Bove, and Ron MacNeil. Support for my life, and therefore for this work, comes from the News in the Future consortium at the Media Lab, and from my academic advisor Walter Bender, who keeps me fed and sets me free to pursue these cockamamie ideas.