A New Architecture Of Algorithms: Could Trajectory Make Books ‘Discoverable’ At Last?


‘To Read More Books In A Similar Vein’

As the book publishing industry heads into its first major conference of the year this week — Digital Book World (hash it #DBW15 with us) in New York City — we learn now that we won’t be seeing one late-breaking major development on the program. And that’s not the fault of Mike Shatzkin and Michael Cader, whose fine DBW agenda is, in a word, huge.

This new start-up is just getting started up, but is no less welcome or intriguing.

We’re announcing in The Bookseller, on the stands in London, and at The FutureBook (The Bookseller’s non-paywall digital community site) that a two-year-old company in Boston called Trajectory has created what it says is a way for readers to discover the books they love, and the books that authors, agents, and publishers want them to read.

If you’re in publishing, you’re now sitting up. A bona fide, actionable answer to the dilemma termed “discoverability” for books could be industry-changing.

Is Trajectory it?

There’s about a weeklong crash course’s worth of luminous, sexy technical power behind all this, of the tf-idf/cosine-similarity kind. At its outer reaches, this could get very Minority Report. (Did that billboard just ask you if you’re ready for your next book, Mr. Anderton?)

Instead, hang onto your eyeballs and let me cruise the concept for you in a few easy, non-technical steps. Then I’ll fill you in on some of the details.

(1) What IS ‘Discoverability’ In Books, Anyway?

In her “Back to Basics” blog post of January 3, the author and determined entrepreneur Joanna Penn wrote this, emphasis mine:

That’s my aim. Grow a list of readers who love the books I love and want to read more books in a similar vein.

That is a very pure statement of “discoverability,” as the industry uses the term today. Pure Penn, too, sleeves rolled up. She may love Trajectory.

Two key points here:

  • As many have pointed out, “discoverability” is not a problem for people who read books. Our readers have enough books and ebooks on their bedside tables and Kindles to last them about six lifetimes already. The historically unprecedented, digitally triggered deluge of content glutting the market today means that you will never lack for something good to read, not even if everybody stops writing and publishing right now. (And yes, it is tempting to ask them all to just stop it — you ask them to do that, I’ll keep the car engine running.)
  • “Discoverability” is a need for the people who make books to somehow punch through that wall o’ content (which can include digital games, video, TV, film, etc., remember) and get their hands on some more buyers. While our author corps has mushroomed in size, the readership isn’t growing fast, may even be getting smaller, and is endlessly seduced by other comforts of the cloud — those electronic entertainments that keep them glued to their smartphones, right? The way to attract readers, as Penn is telling us, is to be able to show them something “in a similar vein” to what they like — help them recognize/discover the books that really match their tastes and interests.

Do you enjoy a good French Revolutionary battle scene? Who doesn’t? So as a reader, what you would like to know is that this, this, and this book has a barricade’s basket of French Revolutionary battle scenes to offer. Find that out, and suddenly, those three books have risen above the fray, right? Mais oui. You’re interested.

That’s what we need “discoverability” to do.

(2) What Do Typical Recommendation Algorithms Do Now?

You know the line. I know the line. Get your hand over your heart, we’ll stand and recite it together:

Customers who bought this item also bought…

Thank you, please be seated.

Amazon’s line has become one of the most admired in the retail business. Much as Netflix’s algorithms make sense of the ratings you give films, Amazon’s ability to show you products, bookish or otherwise, has probably taken us about as far as sales algorithms can go, and handsomely. There’s no disrespect here for that. When it comes to comparing shopping baskets, Seattle is slick.

But these are sales algorithms. These impressive bits of cunning code are different from what Trajectory is doing.
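To make the contrast concrete, here is a minimal sketch of the co-purchase counting that a “Customers who bought this item also bought…” line implies. It is a toy built on assumptions, not Amazon’s actual system (whose details aren’t public), and the function name `also_bought` and the baskets are invented. Notice that it never reads a word of any book.

```python
from collections import Counter


def also_bought(baskets: list[set[str]], title: str, top_n: int = 3) -> list[str]:
    """Rank titles by how often they appear in the same basket as `title`.

    Only purchase records are consulted; the books' text plays no part.
    """
    counts = Counter()
    for basket in baskets:
        if title in basket:
            # Every other title in a basket containing `title` earns one co-purchase.
            counts.update(other for other in basket if other != title)
    return [other for other, _ in counts.most_common(top_n)]


# Invented purchase history, for illustration only.
baskets = [{"Book X", "Book Y", "Book Z"}, {"Book X", "Book Y"}, {"Book X", "Book W"}]
print(also_bought(baskets, "Book X", top_n=1))  # ['Book Y']
```

A title that nobody happens to buy alongside "Book X" can never surface here, however close its content might be. That is the gap Trajectory says it is trying to fill.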

Trajectory’s algorithms are wrangling text, the actual product of a book, the stuff on which readers’ dreams are made. And from clues in that text, Boston is deriving gauges of a book’s “sentiment,” its “intensity,” its “complexity.”

What I’m showing you below is a graphic representation of some of the “vectors” (storytelling elements, if you will) that go into a book and are rendered as computerized intelligence by Trajectory’s “natural language processing” (NLP) technology. What happens inside those machines is certainly nowhere near as colorful as these criterion bubbles, but think of each of them as a potential way to compare books, and you start to see how many parallels could be drawn between one work and another to help readers find the right page-turners.
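Trajectory hasn’t published how its vectors are built or compared, so what follows is only a sketch under stated assumptions: each book is reduced to a handful of numeric scores (the dimension names borrow “sentiment,” “intensity,” and “complexity” from above, and every score is invented), and books are compared by cosine similarity, the measure named at the top of this piece. `cosine_similarity`, `closest`, and the titles are hypothetical.

```python
import math

# Hypothetical storytelling dimensions; Trajectory's actual vectors are not public.
DIMENSIONS = ("sentiment", "intensity", "complexity")


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two books' score vectors (1.0 = same direction)."""
    va = [a[d] for d in DIMENSIONS]
    vb = [b[d] for d in DIMENSIONS]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def closest(target: dict[str, float], catalog: dict[str, dict[str, float]]) -> str:
    """Return the catalog title whose vector points most nearly the same way as `target`."""
    return max(catalog, key=lambda title: cosine_similarity(target, catalog[title]))


# Invented scores for a reader's favorite book and two candidate titles.
favorite = {"sentiment": 0.2, "intensity": 0.9, "complexity": 0.4}
catalog = {
    "Title A": {"sentiment": 0.25, "intensity": 0.85, "complexity": 0.45},
    "Title B": {"sentiment": 0.9, "intensity": 0.1, "complexity": 0.2},
}
print(closest(favorite, catalog))  # Title A
```

Cosine similarity compares the direction of two vectors rather than their size, so a book that is uniformly “more” of everything can still match a quieter book with the same proportions. That is one reason it is a common default for comparisons like this.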

(3) What This Can Mean

Until now, the burden of “discoverability” has fallen on sales records.

By contrast, as Trajectory works its way through talks with “retail points” — meaning online retailers and distributors — the dynamic is basically going two ways:

  • New books are being added to the Trajectory database (I’ll show you this shortly) so that more and more literature becomes available for comparisons.
  • New retail points become users of the technology so that their recommendations to readers, drawn from that deepening database of the canon, become more about the books themselves than about sales records.

As Jim Bryant, Trajectory’s CEO, puts it, “No author can afford not to have every book in this database.”

I know, you may be tempted to say, “And a big veni, vidi, vici to you, too, buddy.”

But as biased as Bryant necessarily must be, he and his able CCO (chief content officer), Scott Beatty, say they’re convinced they’re at the opening stages of a new chapter in how we’re able to understand and array books in the marketplace.

The first wave of Trajectory’s formal partnership conversations, many of them going on this month, is with some of the biggest distribution and retail players in the world. I’m talking Google-Amazon-Apple level. They’re not just in the US market but operate in international supply chains. The Ingram set. That means that, for the many publishers and authors facing the hurdles of global expansion, Trajectory’s guys may be awfully welcome.

While Trajectory doesn’t have a translation component (good translation cannot be computerized and is an art of its own), tests and cooperative projects are under way between Trajectory and Amazon China and between Trajectory and JD.com for the Chinese market, and with Spanish distributors as well.

In such efforts, simplified Chinese characters are used to allow searches of English-language books parsed by the Trajectory system. Bryant notes that this capability can go both ways: many Chinese publishers would like English-language readers to be able to find their books.

A good example? Sure: Consider one of the major subscription services in operation now — Scribd and Oyster will be at DBW in New York, for example, as will representatives from Amazon, which now offers its Kindle Unlimited subscription service.

One of the key interests for any subscription is the need to call readers’ attention to lesser known titles, right? — to help the customers find what they might like but have never heard of in a deep collection.

Now (a random example I’m making up, not Trajectory’s guys), let’s say that the sales and borrows at a subscription service aren’t strong for Nevil Shute’s books. Do you know Shute? English, lived from 1899 to 1960, very prolific. Vintage International has done a terrific job of producing a fine reissue of the Shute books. The one you might know is On the Beach, about the approach of the nuclear cloud following a detonation.

If Shute’s Vintage-reissued backlist is parsed by the Trajectory system, then someone who’s fond of Hermann Hesse’s Siddhartha might, in fact, find Shute’s Round the Bend as a recommendation, because the algorithms Boston has put in place can compare the two books’ shared plot elements: extraordinary, mystical ideas rooted in Eastern traditions.

This is the sort of thing that can mean a lot to a subscription program and its readers, bringing the right books to the right consumers and helping to open up and reveal the range and scope of a big selection. When it comes to it, how is anybody supposed to find those “books in a similar vein,” as Penn puts it, without the help of something that can actually get into that vein and draw some comparisons?

(4) What About Independent Authors?

It might appear at first that self-publishing authors and small presses would have the hardest time getting into the Trajectory database.

But as it happens, the first announced partnership progress — still in final talks, mind you — is with Bowker, the US ISBN agency and research firm owned by ProQuest. Bowker is dickering with Trajectory to create a mechanism by which independent writers and publishers can have their books ingested into Boston’s system, thus making those books available for comparative analysis and recommendation.

In answer to The Bookseller’s inquiry, Bowker director of identifier services Beat Barblan says, “We see tremendous value in offering authors and publishers the opportunity to process their works with the Trajectory system for matching readers to books. Their natural language processing and recommendation results are indeed impressive.”

And what’s more, there are traditionally published authors, too, who may be very glad that the Bowker-Trajectory arrangement is coming into view so early. This comes under the rubric of, “We’re all independent authors now.”

Let’s say you’re a traditionally published author whose rights have reverted on some of your earliest titles. You now have a potentially valuable backlist.

So you have a series of strong ebook editions produced, just as author Kate Pullinger, a Canadian based in London, has done with her backlist titles, including Weird Sister, which she tells me is the best seller of the group.

If Pullinger can use a service like the one Bowker and Trajectory are discussing to get her self-published backlist into the Trajectory database, then her books can be discovered for their various storytelling references — revealed for how they compare to other similar works.

Trajectory, then, begins surfacing authors’ books that sales-record algorithms could never have seen, because such backlist titles’ sales are nothing like what we see in frontlist figures.

Have A Look At Trajectory In Action

If you go to the “Reading” tab on the top-bar navigation at Trajectory.com, what you see there is a simple graphical interface designed to show you in real time what book the system currently is “reading” and analyzing in those many vectors the company is using to have its machines “understand” books.

Check this out, it’s fun to see what it’s doing.

At the time of this writing, for example, I see the system intaking a collection, On a Raven’s Wing (HarperCollins, 2009), an anthology from Mystery Writers of America commemorating the 200th anniversary of Edgar Allan Poe’s birth.

A stack of servers with names including C-3PO, T-1000, WALL-E, Rosie, and DATA are ingesting text in real time as I watch, flashing chunks of the anthology as they go, so that you can see the textual action as it happens.

Once ingested, the text can be analyzed in ways that include this look at Mark Twain’s Tom Sawyer, which goes beyond a standard word cloud to reveal relationships between characters and the various terms with which they’re identified.
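Trajectory hasn’t said how its Tom Sawyer analysis is computed. One simple way to link characters to the terms around them, assumed here purely for illustration, is to count which words share a sentence with each character’s name. The `character_terms` helper, the tiny stopword list, and the sample text (invented, not Twain’s) are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "at", "was", "his", "her"}


def character_terms(text: str, characters: list[str], top_n: int = 3) -> dict[str, list[str]]:
    """For each character, list the words that most often share a sentence with the name."""
    names = {c.lower() for c in characters}
    co_occurrence = defaultdict(Counter)
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Credit every non-name, non-stopword in the sentence to each character present.
        for name in names.intersection(words):
            co_occurrence[name].update(w for w in words if w not in names and w not in STOPWORDS)
    return {c: [w for w, _ in co_occurrence[c.lower()].most_common(top_n)] for c in characters}


# Invented sample text, not a quotation from the novel.
sample = (
    "Tom painted the fence. Tom painted the fence again. "
    "Becky smiled at the schoolhouse. Huck smoked a pipe by the river. Huck fished in the river."
)
print(character_terms(sample, ["Tom", "Becky", "Huck"])["Huck"][0])  # river
```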

All such data is coded to interact in creating an abstract model of a book, which can be compared to other books’ models to produce recommendations.
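The tf-idf and cosine-similarity machinery mentioned at the outset can be shown in miniature. This sketch is a textbook version, not Trajectory’s model: each book becomes a vector of word weights (term frequency times a smoothed inverse document frequency, so words common to every book count for less), and the recommendation is whichever other book’s vector lies closest by cosine similarity. The function names and the three one-line “books” are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each word in each document by term frequency times smoothed inverse document frequency."""
    counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    n_docs = len(docs)
    # Number of documents each word appears in at least once.
    doc_freq = Counter(word for c in counts.values() for word in c)
    vectors = {}
    for title, c in counts.items():
        total = sum(c.values()) or 1
        vectors[title] = {
            # Smoothed idf: a word found in every document still gets weight 1, not 0.
            w: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, n in c.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(docs: dict[str, str], liked: str) -> str:
    """Return the other document whose tf-idf vector is most similar to `liked`."""
    vectors = tfidf_vectors(docs)
    others = [title for title in docs if title != liked]
    return max(others, key=lambda title: cosine(vectors[liked], vectors[title]))


# Three invented one-line "books".
docs = {
    "Book 1": "a pilgrim seeks enlightenment from a river and a ferryman in the east",
    "Book 2": "an airman seeks enlightenment and a mystic teacher in the east",
    "Book 3": "a detective counts the bullets at a crime scene in the city",
}
print(recommend(docs, "Book 1"))  # Book 2
```

A production system would presumably work on full texts, richer features than single words, and far larger catalogs, but the outline is the same: books become vectors, and nearness becomes a recommendation.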

There’s more to learn on the Trajectory site. Many books being worked with at this point are in the public domain.

Trajectory also has the HarperCollins catalog, however, because it was among the assets of Small Demons, a much-loved but now closed industry start-up led by Richard Nash and Valla Vakili.

It’s hard to think that the other major publishers won’t want to have their books parsed in the Trajectory database, rather than allow HarperCollins to have such a competitive advantage. That’s my observation, not the Trajectory team’s, by the way.

What’s Ahead For This New Player

Despite the fact that Bryant and Beatty come from the tech world, some in the generally rather closed community of publishing do know them.

In 1997, these guys founded Information Please (InfoPlease.com), which Pearson Education acquired. Before that, in 1992, they were behind Pro CD, which is said to have originated the idea of publishing white- and yellow-page directories electronically. That company, by then one of the 50 or so largest software outfits in the States, was acquired by Acxiom in 1996.

These may not be the fellows a publishing-house editorial board thinks of first when it comes to reaching readers with good books. But that may be just the point:

  • Publishing doesn’t know how to get inside the consumer head with its content.
  • Trajectory’s people know that you get inside a reader’s head by getting inside what he’s reading.

Eventually, Bryant says, the team in Boston wants to map its data from the ebook to the audiobook edition of a work.

It also wants to work on “exit data” that authors and publishers desperately need. “Exit data” is information on when a reader stops reading an ebook — or starts skipping whole chunks or chapters. Authors and publishers need to know where these spots are so they can better understand a reader’s reaction to the work.
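Trajectory hasn’t described how it would collect or report exit data, but the basic calculation is easy to sketch. Assuming, hypothetically, that a retailer could supply each reader’s furthest chapter, the share of readers who stopped at each chapter comes from a simple count. The function names and the reading log are invented, and detecting skipped chunks would need finer-grained data than this.

```python
from collections import Counter
from typing import Optional


def exit_points(last_chapter_read: list[int], total_chapters: int) -> dict[int, float]:
    """Share of all readers whose reading stopped at each chapter before the last.

    Each entry in `last_chapter_read` is one reader's furthest chapter; readers who
    reached the final chapter count as finishers rather than exits.
    """
    if not last_chapter_read:
        return {}
    stops = Counter(ch for ch in last_chapter_read if ch < total_chapters)
    return {ch: stops[ch] / len(last_chapter_read) for ch in range(1, total_chapters)}


def worst_exit(last_chapter_read: list[int], total_chapters: int) -> Optional[int]:
    """Chapter where the largest share of readers gave up, or None if nobody quit early."""
    shares = exit_points(last_chapter_read, total_chapters)
    if not shares or max(shares.values()) == 0:
        return None
    return max(shares, key=shares.get)


# Invented log: each number is one reader's furthest chapter in a 10-chapter ebook.
readers = [10, 10, 3, 3, 3, 7, 10, 1]
print(worst_exit(readers, 10))  # 3
```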

It’s not hard to imagine authors jumping at this technology.

“How many degrees’ difference is there between me and Mark Twain?” is how Bryant phrases the question.

Suddenly — and on a whole panoply of bases — that question can be answered. Can Trajectory tell us how close to Samuel Langhorne Clemens’ talent and temperament a contemporary author may be? Of course not. But it can tell us where the comparative characteristics of storytelling rise and fall between Graham Greene and John Green.

But beyond the interests of the creative corps, what is Trajectory?

Beatty calls it “an intelligent network connecting publishers to retailers, libraries, schools, and new distribution channels around the world.”

Implied in that description is what Bryant says about a big step still to be taken: “We want to convince the market that’s going to use the data to plan the effort to integrate the data into their recommendation programs.”

This is critical because even if Boston has built it — even if Trajectory has, indeed, come up with a way to map the DNA of a book and recommend it accurately to a reader — those readers will not come unless the channels that serve them can utilize and make this DNA available.

  • Retailers have to “integrate this data” into how they recommend books.
  • Distributors have to “integrate this data” into how they recommend books.
  • Libraries have to “integrate this data” into how they recommend books.

The heaviest lifting may already have been done by Trajectory. But if it can’t persuade industry leaders to share the responsibility of recommendation with it, nothing reaches the readers, and the glistening word relationships and concept constructions and intensity graphs just glitter away in a kind of tech temple of untapped power.

From where a journalist sits? It’s almost impossible to think that publishers won’t be the first to run, not walk, to Boston. How could you not want your books exploded into this new universe of analysis that can actually render a recommendation for a buying reader?

And when Bryant notes that Trajectory is willing to “redistribute the visualizations” that its analysis produces, it’s interesting that he seems to think independent authors might see the value in this first.

May I be crass with you? If my book drew from Trajectory’s mighty battery of machinery a “sentiment graph” showing that my plot ran close to that of, say, something by Joan Didion, I’d pay Trajectory to let me print that damned sentiment graph right on the cover of my book. I’d make T-shirts showing you how like the genius of a great author Boston’s algorithms had determined my work was. I’d sky-write the passages that Trajectory had discovered offered the same “intensity” and “complexity” as something a big-selling author had done.
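What might a “sentiment graph” comparison look like under the hood? Here is a minimal sketch, assuming a toy word list rather than whatever Trajectory actually uses: split each text into equal slices, score each slice by its positive minus negative words, and compare two books’ arcs with a Pearson correlation, where 1.0 means they rise and fall together. The lexicon, function names, and sample texts are all hypothetical.

```python
import math
import re

# Toy sentiment lexicon, assumed for illustration only.
POSITIVE = {"joy", "love", "hope", "bright", "calm", "win"}
NEGATIVE = {"grief", "fear", "dark", "loss", "storm", "lose"}


def sentiment_arc(text: str, segments: int = 4) -> list[float]:
    """Net sentiment per word for each of up to `segments` consecutive slices of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    size = max(1, math.ceil(len(words) / segments))
    arc = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        # True/False count as 1/0, so each word adds +1, -1, or 0.
        score = sum((w in POSITIVE) - (w in NEGATIVE) for w in chunk)
        arc.append(score / len(chunk))
    return arc


def arc_similarity(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length arcs (1.0 = they rise and fall together)."""
    if len(a) != len(b):
        raise ValueError("arcs must have the same number of slices")
    n = len(a)
    if n == 0:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return cov / spread if spread else 0.0


mine = sentiment_arc("joy love fear dark grief loss hope win")
theirs = sentiment_arc("bright calm storm lose fear grief joy hope")
print(mine)                          # [1.0, -1.0, -1.0, 1.0]
print(arc_similarity(mine, theirs))  # 1.0
```

Correlation ignores how strongly each book feels and looks only at the shape of the rise and fall, which is closer to the idea of a plot that “operates like” another than a raw difference in scores would be.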

Already up to our necks in content, how else are we going to communicate to readers that this book, not that book, is what they want to read next?

Just to add a little fuel to the DBW-driven energy of the week: Bryant is telling me — and I’m keeping details off the record, as agreed — of three major Chinese retail distribution partnerships that await only Beijing’s approval to be activated. The frontier, like the future, “lies before us.”

And in Trajectory’s angle on all this, the digital dynamic may finally have turned around and thrown publishing a lifeline instead of another stink bomb.

Bryant: “In one of those cases” in China, “we have an export opportunity with them to export a collection of Chinese titles.” And you think you’ve seen an overcrowded market in books.

In the final analysis, what you may be watching now is a showdown between the ideas that publishing has held close for so many decades about what makes a consumer stop everything and read a book…and what the very science that disrupted publishing now can do to guide consumers to the books they actually want to read.

And may the best algorithms win.