User:DOI bot
From Wikipedia, the free encyclopedia
User interaction
Activate
Find out how you can use DOI bot on your own pages here.
Bugs
Please report any bugs, or better still ideas and suggestions, here.
Emergency shutoff
Administrators: use the emergency shutoff button if the bot is malfunctioning. Non-administrators can report misbehaving bots to Wikipedia:Administrators' noticeboard/Incidents.
Function
Automatic or Manually Assisted: Automatic
Programming Language(s): PHP with Snoopy and BasicBot
Function Summary: Adds DOIs to citations provided using {{cite journal}}
Edit period(s) (e.g. Continuous, daily, one time run): A thorough run every few months; also available on request for specific articles.
Edit rate requested: 6 edits per minute. In practice, querying other websites will be the rate-limiting step.
Function Details: DOI bot only amends the parameters of {{cite journal}}.
- Adds a DOI if missing
- Replaces "id=PMID #" and "id=DOI #" with "pmid=#" and "doi=#"
- Replaces "url=http://dx.doi.org/#" with "doi=#"
- Converts all parameter names (not values) to lowercase, since they won't show up in the output otherwise, and replaces "authors" with "author" (a common typo)
- Percent-encodes any [, ], < or > characters in DOIs, for compliance with {{cite journal}}
- Removes "doilabel" parameters, which are now redundant
- Searches for missing parameters (except URL) and adds them if available
- If a URL is already present, attempts to deduce its format (e.g. free full text, abstract, dead link) and sets the format parameter accordingly
- If a URL is not present, follows the DOI link; if it can deduce that free access is available, sets the URL parameter to the landing page with a note on the format
- Where the {{cite doi}} template has been used, creates or expands the accompanying reference.
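The parameter clean-ups listed above can be illustrated with a short Python sketch. This is not the bot's PHP code: it assumes the {{cite journal}} template has already been split into a name-to-value dictionary (real wikitext parsing, with nested templates and piped links, is left out), and the function names `normalise_params` and `escape_doi` are hypothetical.

```python
import re
from typing import Dict

# Characters that {{cite journal}} needs percent-encoded inside a DOI.
DOI_ESCAPES = {"[": "%5B", "]": "%5D", "<": "%3C", ">": "%3E"}
# "PMID 12345", "DOI: 10.1000/xyz" and similar values of the old id= parameter.
ID_PATTERN = re.compile(r"^\s*(PMID|DOI)\s*:?\s*(\S+)\s*$", re.IGNORECASE)
# A url= value that is just a DOI resolver link.
DOI_URL_PATTERN = re.compile(r"^https?://dx\.doi\.org/(.+)$", re.IGNORECASE)


def escape_doi(doi: str) -> str:
    """Percent-encode [ ] < > so the DOI survives {{cite journal}}."""
    return "".join(DOI_ESCAPES.get(ch, ch) for ch in doi)


def normalise_params(params: Dict[str, str]) -> Dict[str, str]:
    """Apply the parameter clean-ups to one already-parsed {{cite journal}}."""
    out: Dict[str, str] = {}
    for name, value in params.items():
        key = name.strip().lower()  # names only; values keep their case
        if key == "authors":  # common typo for author
            key = "author"
        if key == "doilabel":  # redundant, so drop it
            continue
        out[key] = value.strip()

    # id=PMID #  ->  pmid=#   and   id=DOI #  ->  doi=#
    match = ID_PATTERN.match(out.get("id", ""))
    if match:
        kind, ident = match.group(1).lower(), match.group(2)
        out.setdefault(kind, ident)  # never overwrite an explicit pmid=/doi=
        del out["id"]

    # url=http://dx.doi.org/#  ->  doi=#
    match = DOI_URL_PATTERN.match(out.get("url", ""))
    if match:
        out.setdefault("doi", match.group(1))
        del out["url"]

    if "doi" in out:
        out["doi"] = escape_doi(out["doi"])
    return out
```

For example, `normalise_params({"Title": "X", "authors": "Smith", "id": "PMID 12345", "doilabel": "x"})` returns `{"title": "X", "author": "Smith", "pmid": "12345"}`. Working on a dictionary keeps each rule to a few lines; a production bot would also need to write the parameters back into the page text without disturbing the surrounding markup.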
DOI location logic
The bot uses a variety of methods to locate a DOI, in the order stated:
- Search CrossRef for citation information based on the available citation details.
- Use the url parameter:
  - Search for a DOI within the url parameter.
  - Check that the URL is active (not a 404 page containing an example DOI).
  - Check the metadata of the web page linked to by the url parameter for a DOI.
  - Scour the page source for a DOI.
- If no URL is provided, use the Yahoo! API to search for "title" + authors.
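The first two steps (a CrossRef lookup and a search for a DOI inside the url parameter) can be sketched in Python as follows. The page does not say which CrossRef interface the bot used; this sketch assumes the current public CrossRef REST API (`https://api.crossref.org/works` with the `query.bibliographic` and `rows` parameters), and the DOI regular expression is a loose approximation rather than the bot's own. All function names are hypothetical, and no network request is made: the caller fetches the query URL and passes the response body in.

```python
import json
import re
from typing import Optional
from urllib.parse import unquote, urlencode

# Loose DOI pattern modelled on the one CrossRef suggests for modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def doi_from_url(url: str) -> Optional[str]:
    """Return the first DOI-like string in a URL (after undoing %-escapes)."""
    match = DOI_PATTERN.search(unquote(url))
    return match.group(0) if match else None


def crossref_query_url(title: str, authors: str) -> str:
    """Build a CrossRef REST API search asking for the single best match."""
    query = urlencode({"query.bibliographic": f"{title} {authors}", "rows": 1})
    return f"https://api.crossref.org/works?{query}"


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def doi_from_crossref(response_text: str, wanted_title: str) -> Optional[str]:
    """Accept the top CrossRef hit only if its title matches the citation's."""
    items = json.loads(response_text).get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    wanted = normalise_title(wanted_title)
    if any(normalise_title(t) == wanted for t in top.get("title", [])):
        return top.get("DOI")
    return None
```

Requiring a title match before trusting the top hit mirrors the "have we got the right title?" checks below: a fuzzy search always returns something, so an unverified hit would risk adding a wrong DOI.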
With the retrieved URL, the bot runs the following checks:
- Does the URL contain a DOI? (e.g. http://example.com/view=article&id=10.1001/doi/ishere)
  - If so, does the page contain data telling us we've got the right title?
  The only sites I've seen with DOIs in the URL are BioOne and Blackwell Publishing. The former encodes the title in an invisible span.
- Do the <meta> tags contain a dc.Identifier or citation_doi?
  - If so, check that the dc.title or citation_title matches the title we want.
- Is there a DOI in the page, anywhere?
  - Are there lots of DOIs?
    - Do any occur in association with the title? If there are any <br>, <p>, <li> or <td> tags between the title and a DOI, the DOI could refer to a different reference, and we'll have to ignore it.
  - Is there a unique DOI?
    - Does the DOI appear in the first 5000 characters of the document? If so, it is probably part of the document description. Any later, and it's more likely to be a reference.
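The page checks above (meta tags first, then a scan of the page source) can be approximated with Python's standard `html.parser` module. This is a sketch, not the bot's PHP implementation. It assumes titles can be compared after lower-casing and stripping punctuation, that the title appears verbatim (ignoring case) in the raw HTML, and that a loose regular expression is good enough to spot DOIs; the names `doi_from_meta`, `doi_from_body` and `locate_doi_in_page` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, Optional

DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# Tags that suggest the text has moved on to a different item.
BREAK_TAGS = re.compile(r"<(br|p|li|td)\b", re.IGNORECASE)
DESCRIPTION_LIMIT = 5000  # characters assumed to hold the document description


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs, keyed by lower-cased name."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name and content:
            self.meta.setdefault(name, content)  # keep the first occurrence


def normalise_title(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def doi_from_meta(html: str, wanted_title: str) -> Optional[str]:
    """Use citation_doi / dc.Identifier, but only if the page title matches."""
    parser = MetaCollector()
    parser.feed(html)
    doi = parser.meta.get("citation_doi") or parser.meta.get("dc.identifier")
    title = parser.meta.get("citation_title") or parser.meta.get("dc.title")
    if not doi or not title:
        return None
    if normalise_title(title) != normalise_title(wanted_title):
        return None
    match = DOI_PATTERN.search(doi)  # drops prefixes such as "doi:"
    return match.group(0) if match else None


def doi_from_body(html: str, wanted_title: str) -> Optional[str]:
    """Fall back to scanning the raw page source for DOIs."""
    matches = list(DOI_PATTERN.finditer(html))
    if not matches:
        return None
    if len({m.group(0) for m in matches}) == 1:
        # A unique DOI near the top is probably in the document description.
        first = matches[0]
        return first.group(0) if first.start() < DESCRIPTION_LIMIT else None
    # Several DOIs: take the first one after the title, unless a block tag
    # separates them (then it probably belongs to a different reference).
    title_at = html.lower().find(wanted_title.lower())
    if title_at < 0:
        return None
    title_end = title_at + len(wanted_title)
    for m in matches:
        if m.start() >= title_end:
            between = html[title_end:m.start()]
            return None if BREAK_TAGS.search(between) else m.group(0)
    return None


def locate_doi_in_page(html: str, wanted_title: str) -> Optional[str]:
    return doi_from_meta(html, wanted_title) or doi_from_body(html, wanted_title)
```

`locate_doi_in_page` tries the meta tags before the body scan, following the order of the checks listed above. Only the first DOI after the title is examined in the many-DOIs case, because any later DOI would have at least as much intervening markup.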