Tutorial: Uploading and extending data via reconciliation

Take the following steps to upload a sample dataset to WHG and reconcile it against our internal index of 3.6 million Wikidata place records. More information is available in our Site Guide and in the tutorial "Preparing an LP-TSV format file from a spreadsheet."

Register and Upload

  • Register on the site. After logging in you are directed to the Data dashboard (also accessed on the main menu).
  • Click the "upload new" link.
  • The QUICK START section on the right of the page explains how to use a provided Google Sheets template. There is also a downloadable .zip file that contains templates and sample datasets.
  • Fill in the Create Dataset form. Only Title, Label, Description, and File are required to get started.
    • Title: any string
    • Label: a unique string, 20 or fewer characters. Hint: try part of the title + '_' + your initials
    • Description: a brief summary of the dataset

    • The next three fields are free text, with a maximum of 500 characters. Suggestions follow, but these can be skipped until a dataset is made public, when you can consult with WHG editorial staff.
    • Creator(s): A list of names of individuals or organizations, separated by a comma or semicolon.
    • Source(s): If data was drawn from historical source(s), brief citation(s) including title, author, year.
    • Contributor(s): A list of names of individuals or organizations with significant roles in developing the uploaded dataset.

    • URI base: leave blank unless the data is already published elsewhere and each record has a distinct URI, e.g. if records are available at URIs such as 'http://myorg.org/places/99999', the URI base to enter is http://myorg.org/places/
    • Web page: URL to a project page
    • Public?: a link here explains the process
    • File: choose a file from the zip file previously opened, or one of your own. The accepted formats, Linked Places (lpf) and LP-TSV, are explained on the right of the screen, where instructional links are also provided.
    • Format: the correct selection should be made automatically after choosing the file, but confirm it
  • Click the "Upload" button.
    • A fairly rigorous validation of the file format is performed, and errors are reported on the right side of the screen. Unusual formatting may also produce unexpected errors. If you encounter a problem, please get in touch using the Contact form and we will help troubleshoot it.

If there are no errors, the file's contents are inserted into the WHG database and you are directed to the new dataset's portal page where you can:

  • view and edit its metadata
  • view the uploaded data in a table and a map (Browse tab)
  • initiate a reconciliation task to find prospective matches in Wikidata (Linking tab)
  • add other registered users as collaborators (Collaborators tab)
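
The Label hint and the URI base example from the Create Dataset form can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the choice to lowercase the label and keep only letters, digits, and underscores is an assumption rather than a documented WHG rule. Uniqueness of the label can only be confirmed by the site itself.

```python
import re

MAX_LABEL_LENGTH = 20  # the form asks for "20 or fewer characters"


def suggest_label(title: str, initials: str) -> str:
    """Build a label from part of the title + '_' + initials (hypothetical helper)."""
    suffix = "_" + initials.lower()
    # Assumption: keep only lowercase letters and digits, joined by underscores.
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Trim the title part so that title part + suffix fits in 20 characters.
    stem = "_".join(words)[: MAX_LABEL_LENGTH - len(suffix)].rstrip("_")
    return stem + suffix


def uri_base(record_uri: str) -> str:
    """Return everything up to and including the last '/' of a record URI."""
    return record_uri[: record_uri.rfind("/") + 1]


print(suggest_label("Ports of the Black Sea", "kg"))  # ports_of_the_blac_kg
print(uri_base("http://myorg.org/places/99999"))      # http://myorg.org/places/
```

The URI base is simply the part shared by every record URI, so stripping the final path segment (the record's own identifier) from any one record is enough to find it.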

Reconciliation

WHG reconciliation services allow dataset owners (and designated co-owners) to augment their data with a) additional geometry for more complete mapping and analysis, and b) links to (i.e. matches with) Wikidata records—and via Wikidata concordances, many others including Getty TGN, GeoNames, VIAF, Library of Congress and BnF.

Those links are the essential "glue" enabling the semi-automation of the final accessioning step: reconciling to the WHG "union index" (for details, see the 'WHG Data' section of the Site Guide).

  • Navigate to the "Linking" tab of the dataset portal page and click the "Add new task" link.
  • Leave default settings in place, with Wikidata selected. Help screen links explain some of the options (more in the "Page by page" section of the Site Guide), but we leave those aside for this exercise.
  • Click the "Start" button.
  • For each record in the uploaded data file, a search is performed against our indexed copy of ~3.6m Wikidata records. Up to three passes (queries) are made; if the first returns no results, the second is performed, and so on. These are labeled pass0, pass1, and pass2 in the results.
  • The task is queued and performed asynchronously. You will receive an email notification when it is complete, and you can then return to this page (or simply refresh it if you haven't left).
  • A result summary is displayed, with links to review the prospective matches (hits) for each "pass" grouping.
  • Click on the first 'review' link in the list on the right to begin.
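
The fall-through behavior of the passes described above (run the next query only if the previous one returned nothing) can be sketched as follows. Only the pass0/pass1/pass2 labels and the fall-through order come from this tutorial. The three query functions, the record fields, and the sample index are hypothetical stand-ins, not the queries WHG actually runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
Query = Callable[[Record, List[Record]], List[Record]]


def name_and_country(rec: Record, index: List[Record]) -> List[Record]:
    # Hypothetical strictest query: same name and same country code.
    return [c for c in index
            if c["name"] == rec["name"] and c["country"] == rec["country"]]


def name_only(rec: Record, index: List[Record]) -> List[Record]:
    # Hypothetical looser query: same name, any country.
    return [c for c in index if c["name"] == rec["name"]]


def name_ignoring_case(rec: Record, index: List[Record]) -> List[Record]:
    # Hypothetical loosest query: same name, ignoring letter case.
    return [c for c in index if c["name"].lower() == rec["name"].lower()]


PASSES: List[Tuple[str, Query]] = [
    ("pass0", name_and_country),
    ("pass1", name_only),
    ("pass2", name_ignoring_case),
]


def reconcile(rec: Record, index: List[Record]) -> Tuple[Optional[str], List[Record]]:
    """Run each pass in order and stop at the first one that returns hits."""
    for label, query in PASSES:
        hits = query(rec, index)
        if hits:
            return label, hits
    return None, []  # no hits in any pass: the record will not appear in review


# Hypothetical two-record index, for illustration only.
SAMPLE_INDEX = [
    {"id": "hit-1", "name": "Leiden", "country": "NL"},
    {"id": "hit-2", "name": "Springfield", "country": "US"},
]

label, hits = reconcile({"name": "Leiden", "country": "BE"}, SAMPLE_INDEX)
print(label, [h["id"] for h in hits])  # pass1 ['hit-1']
```

Because each record stops at the first pass that produces hits, the pass label attached to a hit tells the reviewer how strict the query that found it was.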

Reconciliation Review

Once a reconciliation task is complete, dataset owners and designated collaborators must review the prospective matches, declaring match/no match for each. This is made easier with our Review screen.

  • The Review screen presents all of the uploaded records that had any "hits," one by one on the left side, and the "hits" found for each on the right. Records that got no hits are not in this queue.
  • The default selection for all hits is "no match." If any of the hits on the right is a "close match" with your record, click the appropriate radio button. In either case, click the "Save" button to record your decisions. The screen then advances to the next record and the previous one is removed from the queue. There is also a "defer" option, which places the record and its prospective matches in a separate queue that you can return to later or ask for help with.
  • Assertions of matches are saved to the WHG database as 'place_link' records, associated with your dataset's place record.
  • Additionally, if the "accept geometries in matches" box was checked when the reconciliation task was created (the default is "yes"), any geometries in the matched authority record (Wikidata in this case) are saved as new place_geom records associated with your dataset's place record.
  • A help icon links to a detailed explanation of the process and of the term "closeMatch."
  • Note that if your record had a geometry to begin with, it will show up in the map as a green marker. Geometries from all of the hits appear as orange markers. Hovering over the globe symbol for a hit will highlight that record on the map. Clicking a marker will highlight the corresponding row.
  • Note that after the first save, an "Undo last save" link appears on the right side of the grey banner. This will undo the previous decision and return that record to the queue.
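
The effect of saving a review decision, as described in the list above, can be sketched like this. The names place_link, place_geom, and closeMatch come from this tutorial. Everything else (the Hit and ReviewResult classes, the field names inside each saved record, and the save_review function) is a hypothetical illustration and not WHG's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hit:
    authority_id: str                # identifier of the prospective match
    geometry: Optional[dict] = None  # GeoJSON-like geometry, if the hit has one


@dataclass
class ReviewResult:
    place_links: List[dict] = field(default_factory=list)
    place_geoms: List[dict] = field(default_factory=list)


def save_review(place_id: str, hits: List[Hit], decisions: Dict[str, str],
                accept_geometries: bool = True) -> ReviewResult:
    """Turn one record's review decisions into link and geometry records."""
    result = ReviewResult()
    for hit in hits:
        # Every hit defaults to "no match" unless the reviewer chose otherwise.
        if decisions.get(hit.authority_id, "no match") != "closeMatch":
            continue
        result.place_links.append(
            {"place_id": place_id, "authority_id": hit.authority_id,
             "relation": "closeMatch"})
        # Geometries are copied only when the task was created with the
        # "accept geometries in matches" option switched on.
        if accept_geometries and hit.geometry is not None:
            result.place_geoms.append(
                {"place_id": place_id, "geometry": hit.geometry,
                 "source": hit.authority_id})
    return result


# Hypothetical hits for one uploaded record.
sample_hits = [
    Hit("hit-1", {"type": "Point", "coordinates": [4.49, 52.16]}),
    Hit("hit-2"),
]
saved = save_review("my_place_1", sample_hits, {"hit-1": "closeMatch"})
print(len(saved.place_links), len(saved.place_geoms))  # 1 1
```

Only hits marked as a close match produce records, so saving a record with every hit left at "no match" adds nothing to the dataset.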

After reviewing all hits from all passes, affirming any matches discovered, you will have effectively augmented your dataset in the WHG database with new place_link and place_geom records. Those additions will be reflected in the Browse tab map and record details. Also, your dataset is now eligible for flagging as "public," and is ready for accessioning to the WHG "union index"—the step that links your individual records with those of other datasets.
Note: At this time, accessioning will be initiated and guided by WHG staff in consultation with contributors.