DocumentCloud's Public Catalog
DocumentCloud is three things: a tool for organizing and working with large documents and document collections, a document viewer that makes it easier for reporters to share source material with readers, and a publicly accessible repository of primary source documents used in reporters' investigations.
If you're a reporter who'd like to analyze, annotate, and publish documents, contact us to find out more about getting an account. Otherwise, take a spin through our public document collection and try out some of our great analytic tools.
Search public documents and browse the notes.
Try a search for gulf oil spill to find all public documents with "gulf," "oil" and "spill" in the text or title. Your results will include a number of documents relating to the British Petroleum oil spill in the Gulf of Mexico. For each document you can see the name of the user who uploaded the document and the newsroom they work in.
If a reporter has annotated a document, you'll also see a yellow "note" indicator on the document's thumbnail. Below the thumbnail, there's a small link you can use to show the annotations that have been made on that document.
Double-click any document to read it in full in our viewer, or read on for insights into more complex searches.
Try out our analytic tools.
DocumentCloud gives you access to a great set of analytic tools. When a reporter uploads a document, we run it through OCR, which extracts letters and words from the document's image. But that's not all.
Search for source: "House Committee on the Judiciary".
Your results will include a handful of documents related to a DOJ report on Bush administration interrogation policies. Select all 8 of the interrogation documents and select "View Timeline" from the "Analyze" menu.
Every date reference in this collection of documents is plotted on a timeline. Hover over any dot to see the exact date in context. Click on the dot to open a document viewer to the location of that date in the text.
Close the timeline and select the "Entities" tab at the top left of the screen to view a series of expandable lists of people, organizations, places, and terms mentioned in the documents. You'll see "John Yoo" at the top of the list, with the counter (8) next to his name, because he's named in all eight of the documents. Select his name to refine your search, and then select "show pages" to display a thumbnail of each page where John Yoo is mentioned. Selecting a thumbnail or highlighted link will jump directly to his name on that particular page.
Enclose terms in quotes to search for a specific, multi-word phrase.
Use "NOT" or "-" to exclude a term from your search. For example, the searches geithner -madoff and geithner NOT madoff both return documents that mention "Geithner" but do not mention "Madoff."
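The quoting and exclusion rules above can be sketched as a small helper that assembles a search string. This is only an illustration: the names quote_phrase and build_query are hypothetical and not part of DocumentCloud, and the helper assumes that excluded terms are single words, as in the example above.

```python
def quote_phrase(text: str) -> str:
    """Wrap multi-word text in double quotes so it is searched as one phrase."""
    text = text.strip()
    return f'"{text}"' if " " in text else text


def build_query(include, exclude=()):
    """Join included terms and prefix each excluded term with "-".

    Assumption: excluded terms are single words; the page only shows
    "-" applied to a single word (geithner -madoff).
    """
    parts = [quote_phrase(term) for term in include]
    parts += [f"-{term.strip()}" for term in exclude]
    return " ".join(parts)


# Paste the printed strings into the DocumentCloud search box.
print(build_query(["geithner"], exclude=["madoff"]))
print(build_query(["gulf oil spill"]))
```

The first call produces the same exclusion search shown above; the second wraps the three words in quotes so they are matched as a single phrase rather than as three separate words.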
By default, DocumentCloud will search both the title and full text of every document for all of the words in your search term. You can, however, ask DocumentCloud to search the contents of specific fields.
Searching by Metadata Field
|title||Will search only the titles of documents. For example: title: deepwater.|
|source||Reporters have the opportunity to identify the source of each document they upload. For example: source: supreme will identify documents attributed to "U.S. Supreme Court" as well as "New York State Supreme Court."|
|description||Search for a word or phrase within all document descriptions. For example: description: manifesto.|
|account||Specify an account id to see documents published by a specific user. Notice that clicking on any user's name in your search results will automatically filter your results to include only that user's documents. For example: account: 143-james-wilkerson.|
|group||Search for all documents made public by a single newsroom. Notice that clicking on any organization name in your search results will automatically filter your results to include only that group's documents. For example: group: chicago-tribune.|
|projectid||Reporters can organize documents into as many projects as appropriate. To restrict a search to documents in one project, you need to know that project's canonical identifier, or project ID. DocumentCloud doesn't publish individual project IDs, however. For example: projectid: 6-the-financial-crisis|
|filter||Filter documents by interesting criteria (one of "published", "unpublished", "annotated", or "popular"). For example, to view all published documents: filter: published|
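The field searches in the table above all follow the same "field: value" pattern, which can be sketched as follows. The function name field_query is hypothetical, and the set of accepted fields is simply copied from the table; the helper only builds the text you would type into the search box and does not contact DocumentCloud.

```python
# Metadata fields listed in the table above.
METADATA_FIELDS = {
    "title", "source", "description",
    "account", "group", "projectid", "filter",
}


def field_query(field: str, value: str) -> str:
    """Return a 'field: value' search clause, quoting multi-word values."""
    if field not in METADATA_FIELDS:
        raise ValueError(f"unknown metadata field: {field}")
    value = value.strip()
    if " " in value:
        # Multi-word values are quoted, as in
        # source: "House Committee on the Judiciary"
        value = f'"{value}"'
    return f"{field}: {value}"


print(field_query("title", "deepwater"))
print(field_query("source", "House Committee on the Judiciary"))
print(field_query("filter", "published"))
```

Rejecting unknown field names up front catches typos such as "tittle" before the search is run, where they would otherwise be treated as ordinary search words.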
Searching with Entities
For each document we store a list of entities identified by OpenCalais. These are the same entities that appear in the "Entities" tab. After searching for an entity, you can click on "show pages" to display links to the specific pages in each document that mention the person, place or thing you're searching for.
|person||The name of a human being. If you're looking for documents that reference a person with the last name of "Lee", but keep getting swamped with unrelated words, try narrowing your search to person: Lee.|
|organization||Organizations include businesses, government agencies, and other types of institutions. For example: organization: "Department of Defense".|
|place||Addresses, names of buildings and landmarks, regions, or geographical landmarks. For example: place: "World Trade Center" or place: "Gulf of Mexico".|
|term||Searches for terms might include term: "nuclear energy" or term: "gross domestic product". The results will be comparable to searching for the terms directly.|
|email||Complete email addresses. Documents that mention the email address of the GAO FraudNet can be found with this search: email: firstname.lastname@example.org.|
|phone||Telephone or Fax numbers. For example: phone: "(251) 441-6216".|
|city||For example: city: "New Orleans".|
|state||(Includes provinces, in countries that have provinces instead of states.) For example: state: Arizona.|
|country||For example: country: Iran.|
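Entity searches use the same "field: value" pattern as metadata searches. The sketch below builds one or more entity clauses from keyword arguments. The name entity_query is hypothetical, and joining several clauses with a space is an assumption: the table above only shows one entity clause per search.

```python
# Entity fields listed in the table above.
ENTITY_FIELDS = (
    "person", "organization", "place", "term",
    "email", "phone", "city", "state", "country",
)


def entity_query(**entities: str) -> str:
    """Build entity clauses such as place: "Gulf of Mexico".

    Assumption: multiple clauses may be combined by separating them
    with a space; the source only shows single-clause searches.
    """
    clauses = []
    for kind, value in entities.items():
        if kind not in ENTITY_FIELDS:
            raise ValueError(f"unknown entity field: {kind}")
        value = value.strip()
        if " " in value:
            # Values containing spaces are quoted, e.g. phone: "(251) 441-6216"
            value = f'"{value}"'
        clauses.append(f"{kind}: {value}")
    return " ".join(clauses)


print(entity_query(person="Lee"))
print(entity_query(place="Gulf of Mexico"))
print(entity_query(phone="(251) 441-6216"))
```

After running one of these searches, "show pages" displays the pages in each document where the entity is mentioned, as described at the start of this section.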