SEO lessons from Google News: How to promote your stories, straight from the bot’s mouth
One of the keys to success in the online news game is making sure people who might be interested in your content can find it. And the most common path for those seekers goes straight through the multihued logo of search giant Google.
Google’s genius is using algorithms to determine the value of content — what search results best answer a user’s question, which ads are optimal to show on a particular page, and which articles most deserve the attention of a news consumer. The existence of those algorithms has spawned an entire industry dedicated to gaming those systems, in ways both approved and not.
Google would like to encourage the approved ways, of course, so on Tuesday the company posted the above 15-minute video of Googler Maile Ohye describing how news organizations can best ensure their stories are well represented in Google News. In case you don’t want to spend 15 minutes on it, I’ve posted a transcript of the video below. (You’ll want to see the slides Maile is referring to at several points, though.)
Here are five SEO facts I learned from it:
— Google News rates how “trusted” a news source is based on clickthrough data on its stories — another reason to create catchy headlines — but those trust levels are topic specific, so a newspaper could be more trusted as a source on some stories than on others.
— It can detect phrases like “the Los Angeles Times reported” in wire stories and promote the original L.A. Times piece among the many other versions of the story.
— While commentary and satire pieces are welcome inside Google News, they aren’t allowed to be the lead story on a given news subject — that’s a spot reserved for a hard-news story.
— Google News favors JPEGs over PNGs when selecting the pictures that go next to stories. Videos hosted on YouTube (which Google just happens to own) get a boost over videos hosted elsewhere. And you’re better off having at least three digits in the URLs of your stories.
— PageRank — the engine behind Google’s main search results — is used only “delicately” in Google News.
Here’s the full transcript:
N.Y. Times mines its data to identify words that readers find abstruse
If The New York Times ever strikes you as an abstruse glut of antediluvian perorations, if the newspaper’s profligacy of neologisms and shibboleths ever sets off apoplectic paroxysms in you, if it all seems a bit recondite, here’s a reason to be sanguine: The Times has great data on the words that send readers in search of a dictionary.
As you may know, highlighting a word or passage on the Times website calls up a question mark that users can click for a definition and other reference material. (Though the feature was recently improved, it remains a mild annoyance for me and many others who nervously click and highlight text on webpages.) Anyway, it turns out the Times tracks usage of that feature, and yesterday, deputy news editor Philip Corbett, who oversees the Times style manual, offered reporters a fascinating glimpse into the 50 most frequently looked-up words on nytimes.com in 2009. We obtained the memo and accompanying chart, which offer a nice lesson in how news sites can improve their journalism by studying user behavior.
All of the 25-cent words I used in the lede of this post are on the list. The most confusing to readers, with 7,645 look-ups through May 26, is sui generis, the Latin term roughly meaning “unique” that’s frequently used in legal contexts. The most ironic word is laconic (#4), which means “concise.” The most curious is louche (#3), which means “dubious” or “shady” and, as Corbett observes in his memo, inexplicably found its way into the paper 27 times over 5 months. (A Nexis search reveals that the word is all over the arts pages, and Maureen Dowd is a repeat offender.)
Corbett also notes that some words, like pandemic (#24), appear on the list merely because they are used so often. Along those lines, feckless (#17) and fecklessness (#50) appear to be the favorite confounding words of Times opinion writers. The most looked-up word per instance of usage is saturnine (#5), which Dowd wielded to describe Dick Cheney’s policy on torture.
This is mostly just interesting — quiz: how many of these words can you define? — but it’s also a reminder that news sites are sitting on a wealth of data, from popular search terms to click rates, that can help them adjust to reader preferences. So are Times scribes being asked to rein in their vocabularies? That might be a Sisyphean (#37) task, but no, Corbett merely advised reporters to “avoid the temptation to display our erudition at the reader’s expense.”
After the jump, I’ve taken the original chart of 50 words, which was compiled by director of web analytics James Robinson, and run my own spreadsheet that also calculates look-ups per use. Below that, Corbett’s memo.
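For anyone who would rather script the look-ups-per-use figure than build a spreadsheet, here is a minimal sketch of the arithmetic. It divides each word's look-ups by the number of times the word appeared, which keeps frequently printed words from dominating the ranking. The `WordStats` type, the function names, and the sample rows are all assumptions made for illustration; the numbers are placeholders and do not come from the Times chart.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    word: str
    lookups: int  # times readers asked for a definition
    uses: int     # times the word appeared in the paper


def lookups_per_use(stats: WordStats) -> float:
    """Return look-ups divided by appearances for one word."""
    if stats.uses <= 0:
        raise ValueError(f"{stats.word!r} needs at least one recorded use")
    return stats.lookups / stats.uses


def rank_by_lookups_per_use(rows: list[WordStats]) -> list[WordStats]:
    """Sort words so the most looked-up per appearance comes first."""
    return sorted(rows, key=lookups_per_use, reverse=True)


# Placeholder rows (hypothetical words and counts, NOT the Times data).
sample = [
    WordStats("alpha", lookups=1000, uses=50),
    WordStats("beta", lookups=300, uses=3),
    WordStats("gamma", lookups=800, uses=400),
]

ranked = rank_by_lookups_per_use(sample)
for row in ranked:
    print(f"{row.word}: {lookups_per_use(row):.1f} look-ups per use")
```

In the placeholder data, "beta" has the fewest total look-ups but tops the per-use ranking, which is the same effect that separates a rarely used stumper from a word that makes the list simply because it is printed so often.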
ProPublica and NYT seek $1M to put everyone’s documents online
[Saturday was the deadline for submissions for this year's Knight News Challenge. In the coming days and weeks, we'll be looking at some of the most interesting applicants. If you know of one you think worth highlighting, let us know, via email or in the comments. —Ed.]
Two of the biggest names in journalism have applied to this year’s Knight News Challenge: The pioneering investigative-reporting non-profit ProPublica and The New York Times are seeking $1 million from the Knight Foundation to launch an online repository of primary-source documents. The project could lead to greater information sharing among news organizations and their audience. As they put it in their grant application:
Documents are the foundation of investigative journalism, but today’s newsroom is a throwaway culture. Too often, reporters gather reams of information, do their stories, then chuck rich source documents into a dusty corner, never again to see the light of day.
The project, which is called DocumentCloud, would let news organizations upload their materials for public consumption and analysis. (“Readers will also be able to quickly search, annotate and bookmark documents — and for the first time link directly to specific pages or passages.”)
The proposal relies on a piece of software called DocViewer, which was developed by the Times’ Interactive Newsroom Technologies team. The head of that team, Aron Pilhofer, recently confirmed that the Times will release DocViewer as open source “sometime after the election.” Brian Boyer, the blogger who broke that news, said the software was created by the Times for its searchable database of Hillary Clinton’s 11,000-page public schedule as first lady, which was a journalistic marvel.
In an email today, Pilhofer said the application has already made it to the second round of the News Challenge, and he explained the proposal’s provenance:
The project started with a conversation between Scott Klein, Eric Umansky (of ProPublica) and me and my boss, Marc Frons. They were interested in using our DocViewer, and we were talking about the possibility of just open sourcing the darn thing. So, we got into one of those… “Hey, wouldn’t it be cool if we could also…” sorts of conversations, and things went from there.
DocumentCloud would focus initially on New York City “because it has favorable FOI laws and a vibrant journalism and blogging community.” (The community focus is also a requirement of the News Challenge.) A consortium of media outlets, bloggers, and watchdog groups would submit documents, though the application mentions only one partner on board: the Gotham Gazette, a news website published by the Citizens Union Foundation of the City of New York. ProPublica also plans to contribute state- and federal-government documents.
For the technically inclined, DocumentCloud will run on open APIs, so readers or other news organizations could search and interact with the document database as necessary for investigative projects. “Think of it as a ‘card catalog’ of standardized metadata for primary source documents,” the application argues.
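To make the "card catalog" idea a little more concrete, here is a rough sketch of what one metadata record and a keyword search over a set of records might look like. The application text quoted here does not describe a schema or an API, so every field name, the `DocumentRecord` type, and the `search` function are hypothetical, and the sample entries are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    """One hypothetical catalog card describing a source document."""
    doc_id: str
    title: str
    contributor: str  # the outlet that uploaded the document
    page_count: int
    tags: tuple[str, ...] = ()


def search(catalog: list[DocumentRecord], term: str) -> list[DocumentRecord]:
    """Return records whose title or tags contain the term, ignoring case."""
    needle = term.casefold()
    return [
        record
        for record in catalog
        if needle in record.title.casefold()
        or any(needle in tag.casefold() for tag in record.tags)
    ]


# Invented placeholder entries, not real documents or contributors.
catalog = [
    DocumentRecord("doc-001", "Example city budget memo",
                   "Example Newsroom A", 12, ("budget", "city council")),
    DocumentRecord("doc-002", "Example inspection report",
                   "Example Newsroom B", 40, ("housing",)),
]

hits = search(catalog, "budget")
print([record.doc_id for record in hits])
```

The point of standardizing the fields is that records from different contributors can be searched the same way, regardless of which newsroom uploaded them.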
It isn’t clear if the project could or would go ahead without funding from Knight, which will award its News Challenge grants next summer. ProPublica’s $10-million annual budget is funded primarily by the Sandler Foundation. We’ve sent an email to Mike Webb, ProPublica’s director of communications, seeking more information.
The full text of the grant application is below the jump.





