Frequently Asked Questions

What is it?

DocumentCloud is both a repository of primary source documents and a tool for document-based investigative reporting. Think of the repository as a card catalog for primary source documents. We're building tools that accelerate the work of reporters who need to make sense of large sets of documents. (You can use it on small sets, too.)

Yes, but what is it?

It's built with Ruby (Rails, Sinatra) and JavaScript. We're using Tesseract for OCR and OpenCalais for entity extraction. Take a look at our GitHub projects and our blog to get a better sense of the tools we’re using.

What can I do with it?

Take a look at what other reporters are doing with it! And that's just the beginning. When you upload a document to DocumentCloud, you can annotate it, share it with colleagues in your newsroom or beyond your newsroom, view lists of people and places named in it, plot the dates it contains on a timeline and more.

Is there an API?


Are my documents automatically public?

No. DocumentCloud was started by journalists who understand journalism. Documents you upload aren't public until you make them public.

Can I keep documents behind a paywall?

Sure. Other reporters who are DocumentCloud users will be able to view your public documents from our workspace, but your public-facing copy can sit behind a paywall. If the general public has no access to your reporting, however, you probably aren't a good fit: DocumentCloud is intended as a public catalog.

When will it be ready?

It's ready now! If you'd like to join the beta, contact us.

What does it cost?

Thanks to our funding from the Knight News Challenge, DocumentCloud is free of charge.

Who funds it?

DocumentCloud is funded by a generous grant from the John S. and James L. Knight Foundation. We were a 2009 winner of a Knight News Challenge grant. Our grant runs two years and we’ll soon be seeking sustaining funding.

Why did the New York Times and ProPublica get grant funding?

They didn't. Journalists from the Times and ProPublica applied for the Knight News Challenge grant, and both organizations are using DocumentCloud, but Document Cloud, Inc. is an independent 501(c)(3) organization. Neither the Times nor ProPublica is receiving any money from the grant. Our three co-founders, the Times and ProPublica journalists who wrote the grant proposal, are volunteering their time.

Why would I want to share my documents?

Because it will make your documents and your reporting more findable, more useful and ultimately more popular.

We know that many journalists are already looking for better ways to share their source materials. It’s one of the reasons why most news organizations are already posting source documents alongside news stories on their websites. The trouble is there hasn’t been an easy way to make those documents useful or even findable after the story fades from the headlines.

Many other organizations (bloggers, watchdog groups, citizen journalists) are in that same boat. They have a wealth of documents but are only able to post them as individual PDF files. Again, it’s not a lack of desire; it’s a lack of available technology. This is the problem DocumentCloud intends to solve.

Can’t I already find documents using a search engine?

Search engines are very powerful; our goal is to make documents even easier to find on search engines. DocumentCloud will have information about documents and the relationships between them: for example, which locations, people, or organizations a group of documents has in common.

How will you guarantee authenticity?

During our beta phase, we're limiting access to journalists and other researchers who have an established editorial process and a history of publishing high-quality reporting. Each contributing organization takes responsibility for ensuring that the documents it uploads to DocumentCloud are what they say they are.

Still have questions?

Find more answers to new questions on our blog.