Hello. I was wondering if you’ve ever noticed when you go to an American bookstore there is a very large self-help section; and it’s huge, about 10 meters worth of books. If you go to a French self-help section, that does exist in their bookstores, but it’s far smaller. You might interpret that to mean the Americans are flipped out and need a lot of help and the French are very centered.
The reality is there are simply fewer people who speak the French language. The economics of publishing in smaller languages, especially those whose speakers have lower incomes, become very difficult for the publishing industry. A few years back, I was doing a workshop with the World Bank on poverty alleviation, and one of the keys to lifting people out of poverty is literacy, especially amongst young girls from the ages of about 5 to 11 years old. One thing that is often missing in that key period of life is textbooks.
Many of the world’s languages don’t have spelling books, math books, reading books, and the like. It’s just a classic case of market failure. At the same time, I was working on some projects in development economics dealing with foreign direct investment. A lot of people want to do impact investing, investing in small businesses in developing countries. Unfortunately, these businesses have very narrow business models. They might make 6” copper nails and things like that, things for which there is actually no market data. You can’t purchase data on the world market for some of these products and services. Again, a case of market failure.
And being a Professor of Management Science, I was wondering whether there is any way to solve this market failure problem, which seems to affect not just business investments but also language learning. I discovered that there was a demand, a very narrow demand, but the key problem was actually the author. Authors are expensive. They want food and things like that, they want income. There are also the editors, the graphic artists, and all these people.
So, appealing to management science, which was founded by Frederick Taylor and is basically the use of mathematics to solve basic business problems: is there a way to augment labor productivity using management science techniques to overcome these market failures? Frederick Taylor is given a lot of credit for starting the notion of automation. In manufacturing, Ford, of course, took that to an extreme.
So, question, “Can we replace the author with computer algorithms?” Notice the acronym EVE here, for a collection of programs which is an Economically Viable Entity; basically, a box that solves both of those problems simultaneously. This is the basic problem. There is a world of information out there. Many of you have published articles, or press, etc. Maybe the person that can most benefit from that research is on the other side of the planet. They don’t read academic journals, they don’t speak English, and they are in a world that has been under-served because of their small languages. How in the world can we get that information to them? And perhaps there is also a wealth of knowledge on their side of the world that can be used on our side of the world, in the world of data.
So, what is this sustainable model that I was considering and started working on a few years back? Very simply, this: selling very expensive, high-end market research studies, completely computer-generated, to subsidize the creation of language learning materials, mathematics books, etc., for the under-served languages or the long tail. This is what it actually looks like if you want to see a computer writing books. I’ve been doing this for about ten years now. We’ve published over 1 million titles. Most of them are high-end industry studies, but a lot of them are language-learning.
We’ve also done a number of different formats: game shows, weather reports for low-literacy environments, Android apps, and the like. So, what are we working on exactly? We are looking for genres of human authoring that are very formulaic but, at the same time, highly impactful. If a genre is formulaic, then perhaps mathematical algorithms can imitate what human authors would do, had they had the opportunity to publish in those under-served areas.
So, what is not formulaic is the question. I’m a dyslexic so I collect dictionaries. We love those kinds of things. This is one of my favorites, from The Devil’s Dictionary by Ambrose Bierce: “Love: A temporary insanity curable by marriage.” What is a formulaic dictionary? The poor adverbs are the most under-served words in the dictionary. They don’t get a lot of respect, and this frustrates people who are slow readers as well. You’ll notice the head word is being defined by a word in the dictionary. Does that frustrate anybody? That you have to look up the other word. Using computer algorithms – and I’ll explain how we did this in a minute – this is what EVE, the collection of programs, came up with as a definition for “rurally”:
In a rustic, agrarian, provincial, crude, or rough manner; second definition: in an insolent or boorish manner; and third: in a bucolic, pastoral, and idyllic manner. We used graph theory and a little bit of cluster analysis to come up with those definitions. With that knowledge, of course, one can create crossword puzzle books in any language because the computer doesn’t know what language it’s learning. Crossword puzzle books, test preparation guides; one thing used a lot in English-as-a-second-language learning is acrostic poetry. You’ll remember this from the third grade: trying to find a word that defines the word that you’re actually trying to use in the poem; and in this case, “GOD.”
We have a computer program that wrote about four million poems – sonnets, and limericks, and haiku – and we have another program that acts as an editor and judges the quality of those poems to figure out which ones to post online. We’ve got about 1.4 million of those. This is the acrostic for GOD, “Gentleman of Divinity,” that passed through the algorithm; and the second one is LOVE, “Lean Of Vile Emotions.”
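As a rough illustration of that generate-then-judge idea, here is a minimal Python sketch. It is not the actual EVE code: the candidate words, the relatedness scores, the averaging rule used as a stand-in "editor," and the function names are all assumptions made up for this example.

```python
from itertools import product

# Hypothetical candidate words for each letter of the head word. A real
# system would draw these from a much larger lexicon.
CANDIDATES = {
    "G": ["gentleman", "guardian"],
    "O": ["of", "over"],
    "D": ["divinity", "doubt"],
}

# Hypothetical scores for how strongly each word relates to the head word.
RELATEDNESS = {
    "gentleman": 0.4, "guardian": 0.6,
    "of": 0.1, "over": 0.1,
    "divinity": 0.9, "doubt": 0.2,
}


def generate_acrostics(head_word):
    """Yield every combination of one candidate word per letter."""
    pools = [CANDIDATES[letter] for letter in head_word.upper()]
    for words in product(*pools):
        yield list(words)


def score(words):
    """Stand-in for the 'editor': average relatedness of the chosen words."""
    return sum(RELATEDNESS[w] for w in words) / len(words)


def best_acrostics(head_word, threshold):
    """Keep the candidates scoring at or above threshold, best first."""
    scored = [(score(words), words) for words in generate_acrostics(head_word)]
    kept = [(s, words) for s, words in scored if s >= threshold]
    return [" ".join(words).title() for s, words in sorted(kept, reverse=True)]


if __name__ == "__main__":
    for line in best_acrostics("GOD", threshold=0.45):
        print(line)
```

The point of the split is that the generator can be cheap and produce far more candidates than anyone would want to read, while a separate scoring step decides what is good enough to publish.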
How did we do this? Leonhard Euler, the famous mathematician, came up with this notion of graph theory. Basically, this is a semantic web of words that are related to each other in some kind of quantitative way. Zero, of course, doesn’t have a powerful number there because it’s used as the score “love” in tennis, so it is not so related to the other words for love. I was approached a few years back by the Gates Foundation, based on my language literacy programs, to look at publishing in under-served areas in the field of agriculture. The world’s poorest are in the field of agriculture, and they are in remote villages.
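To make the semantic-web idea a little more concrete, here is a minimal Python sketch of how a weighted word graph plus a very simple form of clustering could yield separate senses for an adverb like "rurally." This is an assumption-laden illustration, not the method EVE uses: the edge weights are invented, the threshold is arbitrary, and the clustering is just connected components among the head word's strong neighbors.

```python
from collections import defaultdict

# Hypothetical weighted edges: (word, word, strength of relation).
EDGES = [
    ("rural", "rustic", 0.9), ("rural", "agrarian", 0.8),
    ("rural", "boorish", 0.6), ("rural", "insolent", 0.5),
    ("rural", "pastoral", 0.8), ("rural", "idyllic", 0.6),
    ("rural", "urban", 0.1),
    ("rustic", "agrarian", 0.7),
    ("boorish", "insolent", 0.8),
    ("pastoral", "idyllic", 0.9),
    ("rustic", "pastoral", 0.3),
]


def build_graph(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def sense_clusters(graph, head, threshold=0.5):
    """Group the head word's strong neighbors into connected components."""
    neighbors = [w for w, s in graph[head].items() if s >= threshold]
    unseen = set(neighbors)
    clusters = []
    for start in neighbors:  # insertion order keeps the result deterministic
        if start not in unseen:
            continue
        unseen.discard(start)
        cluster, stack = [], [start]
        while stack:
            word = stack.pop()
            cluster.append(word)
            for other, strength in graph[word].items():
                if other in unseen and strength >= threshold:
                    unseen.discard(other)
                    stack.append(other)
        clusters.append(sorted(cluster))
    return clusters


def define_adverb(graph, head):
    """Turn each cluster into a formulaic 'in a ... manner' definition."""
    definitions = []
    for cluster in sense_clusters(graph, head):
        if len(cluster) <= 2:
            listing = " or ".join(cluster)
        else:
            listing = ", ".join(cluster[:-1]) + ", or " + cluster[-1]
        article = "an" if listing[0] in "aeiou" else "a"
        definitions.append(f"in {article} {listing} manner")
    return definitions


if __name__ == "__main__":
    for number, text in enumerate(define_adverb(build_graph(EDGES), "rural"), 1):
        print(f"{number}. {text}")
```

Weakly related words such as "urban" fall below the threshold and drop out, and the weak link between "rustic" and "pastoral" is what keeps the first and third senses apart in this toy data.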
One of the first things we did was actually distill, from all of the massive data out there, local-language weather reports and crop pest and disease reports that are used by radio stations. This is FarmerVoiceRadio, which was broadcasting the local weather for the first time in villages. They’ve never heard the weather report before. We got some very good feedback asking for explanations: what do you mean by kilometers, what do you mean by degrees Celsius, etc. So a lot of programming had to go in to supplement that.
We also worked with the Grameen Foundation in Uganda to create dashboards individualized for the various agricultural regions synthesizing information that could not have been done very easily if it was done manually; working with direct connections to Africa, sending in textbooks that are now in local languages. The Anthill Foundation – this is a village in Uganda where the village had not yet had textbooks before. This was the day that the first one arrived, and you can see the children’s heads pressed against each other. Definitely a demand for this kind of content. This is the most encouraging thing I found of this whole exercise, “I believe the books will motivate even those who are currently not attending school to go,” which is a very encouraging insight. This is one of our funnest projects.
This is a high-end 3D animation video game. Most languages do not have video games. This one is actually an output of an algorithm engine; it can be played for any geography in the world given the world soil conditions, climate conditions, etc.; it imitates the agronomy of any location, and we’ve got algorithms that can look at Google Maps and actually get the actual terrain of the village where the game might be played. It’s for agricultural extension workers to learn how to farm in areas that they might not have visited before.
In terms of hardware, this is where we started. The future will be in your pocket. Your phones should be writing books for you pretty soon. They’ll be writing PhD dissertations – talk about formulaic, for anyone who’s ever done it – “The effects of X on Y; a Z perspective,” speculation or discovery engines, minority reports telling us before disease breaks out in a given region what should be the plan of action or the call to action.
Word 2.0 – have you ever been in a Word document, and it’s blank, and you’re going, “Gosh, I’d like to write a biography about my grandfather”? You push a few buttons, do a few things, and boom! You’ve got a first draft. So maybe Word can write for us. Instead of googling, maybe we can have a content engine so when you type in “bibliography, subject X,” it actually gives you one, and it’s yours to reuse as much as you want with all rights clearance. My favorite one is a physics textbook.
When you’re a child, often, you don’t like what you’re reading. Do you agree? Physics books. But you want to be a football player. Why not have a football player physics book? Why not have a ballet physics book? There are thousands of subjects people are interested in. Why not have the context of what I’m most interested in because now we can create these types of titles; if we can do the world’s smallest languages, we can certainly do the world’s smallest hobbies. Big data is big, but I think we all should demand more. Thank you very much.