Discussion:
Plugin for fetching data from Isi-WebOfScience
Mario Castro
2008-03-08 09:45:30 UTC
Permalink
Hi all!

After a few days of work I've created a Python plugin for getting information from
ISI Web of Science.

First of all, PYTHON IS AMAZING! Simple, powerful... I'm in love with Python
:-)

Here I attach my plugin (to be placed in $HOME/.referencer/plugins) for
referencer version
Mario Castro
2008-03-08 10:26:28 UTC
Permalink
Hi all!

After a few days of work I've created a Python plugin for getting information from
ISI Web of Science.

First of all, PYTHON IS AMAZING! Simple, powerful... I'm in love with Python
:-)

Here I attach my plugin (to be placed in $HOME/.referencer/plugins) for
referencer version 1.1.1

It can be improved in many ways. For instance, if the function getNumberOfRecords
returns zero, a warning window could be opened to say so. Similarly, if it
returns a number greater than 1, it would be very useful to show a window
listing all the candidates and pick one with the mouse, but I don't know how
to create a new window.
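
The three cases described above (no hit, exactly one hit, several hits) could be separated before any window is shown. This is only a minimal sketch in plain Python 3, outside referencer: the function name and the returned labels are hypothetical, and the GUI part is deliberately left out.

```python
def decide_on_record_count(records_found):
    """Map the text of <recordsFound> to a next step (labels are hypothetical)."""
    try:
        count = int(records_found)
    except (TypeError, ValueError):
        return "error"        # empty or non-numeric reply
    if count < 0:
        return "error"
    if count == 0:
        return "warn_none"    # nothing matched: tell the user
    if count == 1:
        return "fetch"        # unambiguous: safe to fill in the fields
    return "ask_user"         # several matches: let the user pick one


for sample in ("0", "1", "8", ""):
    print(repr(sample), "->", decide_on_record_count(sample))
```

Only the "fetch" case corresponds to what the plugin currently does; the other labels mark where a warning or a selection window would go.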

Until next version, enjoy it!


#!/usr/bin/env python

# Get info from ISI Web of Science from title/author/year fields (any or all of them)
# Mario Castro, 2008


import os
import referencer
from referencer import _
import sys, urllib2, urllib

from xml.dom import minidom

referencer_plugin_info = []
referencer_plugin_info.append (["longname", _("Get info from ISI Web of Science")])
referencer_plugin_info.append (["action", _("Get info from ISI Web")])
referencer_plugin_info.append (["tooltip", _("Get info from ISI Web of Science")])
referencer_plugin_capabilities = []
referencer_plugin_capabilities.append ("document_action")


def get_fields (doc, field, separator):
    # Text of every <field> element, joined with the separator
    value = doc.getElementsByTagName(field)
    output=''
    if len(value) == 0:
        return ""
    else:
        length=len(value)
        if (len(value[0].childNodes) == 0):
            return ""
        else:
            for index in range(length-1):
                output+=value[index].childNodes[0].data.encode("utf-8")+separator
            return output+value[length-1].childNodes[0].data.encode("utf-8")

def get_last_field (doc, field):
    # Text of the last <field> element
    value = doc.getElementsByTagName(field)
    if len(value) == 0:
        return ""
    else:
        if (len(value[0].childNodes) == 0):
            return ""
        else:
            for items in value:
                last=items.childNodes[0].data.encode("utf-8")
            return last

def get_field (doc, field):
    # Text of the first <field> element
    value = doc.getElementsByTagName(field)
    if len(value) == 0:
        return ""
    else:
        if (len(value[0].childNodes) == 0):
            return ""
        else:
            return value[0].childNodes[0].data.encode("utf-8")


def get_attribute_from_field (doc, field, attr):
    # Attribute of the first <field> element, or "" if the element is missing
    value = doc.getElementsByTagName(field)
    if len(value) == 0:
        return ""
    return value[0].getAttribute(attr)

def getNumberOfRecords (document):
    title = document.get_field("title")
    year = document.get_field ("year")
    author= document.get_field ("author")

    # urlencode with an empty key yields "=%28...%29", appended after TI/PY/AU
    ti=urllib.urlencode([('','('+title+')')])
    ye=urllib.urlencode([('','('+year+')')])
    au=urllib.urlencode([('','('+author+')')])

    url0='http://estipub.isiknowledge.com/esti/cgi?databaseID=WOS&rspType=xml&method=search&firstRec=1&numRecs=1&query=TI'+ti+'&PY'+ye+'&AU'+au
    data0 = referencer.download (_("Obtaining data from ISI-WebOfScience"),
        _("Fetching number of occurrences for %s/%s/%s") % (author,title,year),
        url0)
    xmldoc0 = minidom.parseString(data0)
    recordsFound=get_field(xmldoc0,"recordsFound")
    return recordsFound

def getAndSetFields(document):

    title = document.get_field("title")
    year = document.get_field ("year")
    author= document.get_field ("author")

    page_orig=document.get_field("pages")
    journal_orig=document.get_field("journal")
    volume=document.get_field("volume")

    ti=urllib.urlencode([('','('+title+')')])
    ye=urllib.urlencode([('','('+year+')')])
    au=urllib.urlencode([('','('+author+')')])

    url='http://estipub.isiknowledge.com/esti/cgi?databaseID=WOS&SID=Q1mNFhCECOk6c8aELLh&rspType=xml&method=searchRetrieve&firstRec=1&numRecs=1&query=TI'+ti+'&PY'+ye+'&AU'+au
    data = referencer.download (_("Obtaining data from ISI-WebOfScience"),
        _("Fetching data for %s/%s/%s") % (author,title,year), url)
    xmldoc = minidom.parseString(data)
    authors=get_field(xmldoc,"primaryauthor")
    more_authors=get_fields(xmldoc,"author",' and ')
    if(len(more_authors)>0):
        authors+=' and '+more_authors
    abstract=get_field(xmldoc,"p")
    keywords=get_fields(xmldoc,"keyword",', ')
    journal=get_field(xmldoc,"source_title")
    doi=get_last_field(xmldoc,"article_no")
    pages=get_field(xmldoc,"bib_pages")
    title_isi=get_field(xmldoc,"item_title")
    year_isi=get_attribute_from_field(xmldoc,"bib_issue","year")
    volume_isi=get_attribute_from_field(xmldoc,"bib_issue","vol")

    if (len(year)==0 and len(year_isi)>0):
        document.set_field("year",year_isi)
    if (len(volume)==0 and len(volume_isi)>0):
        document.set_field("volume",volume_isi)
    # Check the ISI title, so an empty reply never blanks the existing title
    if (len(title_isi)>0):
        document.set_field("title",title_isi)
    if (len(authors)>0):
        document.set_field("author",authors)
    if (len(doi)>0):
        document.set_field("doi",doi)
    if (len(journal_orig)==0 and len(journal)>0):
        document.set_field("journal",journal)
    if (len(page_orig)<len(pages) and pages!='-'):
        document.set_field("pages",pages)
    if (len(abstract)>0):
        document.set_field("abstract",abstract)
    if (len(keywords)>0):
        document.set_field("keywords",keywords)

def do_action (documents):
    for document in documents:
        # Only fill in fields when the search is unambiguous
        rec=getNumberOfRecords(document)
        if (rec=='1'):
            getAndSetFields(document)

    return True
Michael Banck
2008-03-08 11:05:16 UTC
Permalink
Hi,

first off, a Web of Science plugin is wonderful!
Post by Mario Castro
Here I attach you my plugin (to be placed in $HOME/.referencer/plugins) for
referencer version 1.1.1
The way you included it inline seems to have introduced some spurious
extra line wrappings. It is probably better to attach it or put it up
somewhere on the web for people to grab it.

I tried to fix the line wrapping, but all I get when I select a
reference and click on the plugin icon is this on the console:

Waiting...
openCB: result not OK

(referencer:16260): libgnomevfsmm-WARNING **: gnome-vfsmm
Async::Handle::cancel(): This method currently leaks memory.
waitForFlag completed due to transferfail
referencer_download: got 0 characters
PythonPlugin::doAction: NULL return value

And a popup saying

"Exception: xml.parsers.expat.ExpatError

Module: web-of-science
Explanation: no element found: line 1, column 0"
Post by Mario Castro
url='
http://estipub.isiknowledge.com/esti/cgi?databaseID=WOS&SID=Q1mNFhCECOk6c8aELLh&rspType=xml&method=searchRetrieve&firstRec=1&numRecs=1&query=TI'+ti+'&PY'+ye+'&AU'+au
I wonder: the above URL looks like it contains a session ID. Can this work for
anybody other than you?


Michael
Mario Castro
2008-03-08 12:43:05 UTC
Permalink
Post by Michael Banck
first off, a Web of Science plugin is wonderful!
Thanks, but it still needs to be improved
Post by Michael Banck
The way you included it inline seems to have introduced some spurious
extra line wrappings. It is probably better to attach it or put it up
somewhere on the web for people to grab it.
I don't know how to attach it, so here it is again, gzipped.
Post by Michael Banck
I tried to fix the line wrapping, but all I get when I select a
Waiting...
openCB: result not OK
(referencer:16260): libgnomevfsmm-WARNING **: gnome-vfsmm
Async::Handle::cancel(): This method currently leaks memory.
waitForFlag completed due to transferfail
referencer_download: got 0 characters
PythonPlugin::doAction: NULL return value
And a popup saying
"Exception: xml.parsers.expat.ExpatError
Module: web-of-science
Explanation: no element found: line 1, column 0"
This may be due to the copy and paste process. Try with the one attached to
this message
Post by Michael Banck
url='
http://estipub.isiknowledge.com/esti/cgi?databaseID=WOS&SID=Q1mNFhCECOk6c8aELLh&rspType=xml&method=searchRetrieve&firstRec=1&numRecs=1&query=TI'+ti+'&PY'+ye+'&AU'+au
I wonder, the above URL looks like a session id, can this work for
anybody else than you?
That value should be random, but I have found that the server replaces it
after the query, so it is irrelevant.

Note that you need a subscription to ISI WoS for this plugin to work.
Access is automatic if your IP address belongs to your university's range.
I have also checked that the plugin works perfectly if you configure the
proxy options in referencer.

Hope this helps
-------------- next part --------------
A non-text attachment was scrubbed...
Name: isiwos.py.gz
Type: application/x-gzip
Size: 1338 bytes
Desc: not available
URL: <http://icculus.org/pipermail/referencer/attachments/20080308/de12d5fc/attachment.gz>
Michael Banck
2008-03-08 13:44:26 UTC
Permalink
Post by Mario Castro
Post by Michael Banck
The way you included it inline seems to have introduced some spurious
extra line wrappings. It is probably better to attach it or put it up
somewhere on the web for people to grab it.
I don't know how to attach it, but here I try it again gzipped.
Ah, worked much better now!
Post by Mario Castro
Post by Michael Banck
I wonder, the above URL looks like a session id, can this work for
anybody else than you?
That value should be random, but I have found that the server replaces it
after the query, so it is irrelevant.
Note that you need a subscription to ISI WoS for this plugin to work.
Access is automatic if your IP address belongs to your university's range.
I have also checked that the plugin works perfectly if you configure the
proxy options in referencer.
Why is the plugin not hooking into the "Get Metadata" mechanism? On the
other hand, the plugin seems to require title, author and year, so if
all you have is a DOI, you'd need to hit Get Metadata twice I guess,
which is not very intuitive, either.


Michael
Mario Castro
2008-03-08 14:42:12 UTC
Permalink
Post by Michael Banck
Why is the plugin not hooking into the "Get Metadata" mechanism? On the
other hand, the plugin seems to require title, author and year, so if
all you have is a DOI, you'd need to hit Get Metadata twice I guess,
which is not very intuitive, either.
I agree with you, but I'm new to referencer and to Python, and I don't know
how to do it. Maybe someone with more experience could tell us how.
John Spray
2008-03-09 14:15:25 UTC
Permalink
Post by Michael Banck
Why is the plugin not hooking into the "Get Metadata"
mechanism? On the
other hand, the plugin seems to require title, author and
year, so if all you have is a DOI, you'd need to hit Get
Metadata twice I guess, which is not very intuitive, either.
I agree with you but I'm new in referencer and in python. I don't know
how to do it. Maybe someone with more experience could tell us how.
The "Get metadata" mechanism is structured so that each plugin
publishes a list of ID formats that it understands (pubmed, doi, etc.),
and when trying to get the metadata for a paper, referencer tries to
resolve an ID to the metadata.
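
The dispatch idea described above can be illustrated with a generic registry. This is a sketch of the concept only, not referencer's actual plugin API: the `register` and `resolve` functions, the dictionary layout of `doc`, and the canned resolver are all hypothetical.

```python
# Hypothetical registry: ID format name -> list of resolvers that claim it.
resolvers = {}


def register(id_formats, resolver):
    """A plugin publishes the ID formats it understands."""
    for fmt in id_formats:
        resolvers.setdefault(fmt, []).append(resolver)


def resolve(doc):
    """Try each ID the document carries; stop at the first resolver that succeeds."""
    for fmt, value in doc["ids"].items():
        for resolver in resolvers.get(fmt, []):
            metadata = resolver(fmt, value)
            if metadata:
                doc.update(metadata)
                return True
    return False


def fake_doi_resolver(fmt, value):
    # Stand-in for a network lookup; returns canned data for illustration only.
    return {"title": "Example title"} if value.startswith("10.") else None


register(["doi"], fake_doi_resolver)
doc = {"ids": {"doi": "10.1140/epjst/e2007-00197-4"}}
print(resolve(doc), doc.get("title"))
```

A document that carries no ID in a registered format is never passed to any resolver, which is why a plugin that needs author, title and year does not fit this scheme directly.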

What I don't understand about this web of science plugin is that it only
seems to do anything when you already have at least the author, title
and year: so what is web of science giving you in addition to that? In
any case, the get metadata structure is designed so that referencer can
call plugins automatically when adding documents, and clearly when some
metadata is already required this isn't the right time to invoke it.

Perhaps you can describe the use-case/workflow in which the plugin would
be used; when should the plugin be invoked on behalf of the user?

John
Michael Banck
2008-03-09 18:45:47 UTC
Permalink
Post by John Spray
What I don't understand about this web of science plugin is that it only
seems to do anything when you already have at least the author, title
and year: so what is web of science giving you in addition to that? In
any case, the get metadata structure is designed so that referencer can
call plugins automatically when adding documents, and clearly when some
metadata is already required this isn't the right time to invoke it.
This is probably because you cannot easily access a specific publication
in WOS - you always need to issue a search and hope you get only one
hit. If WOS supports searching based on DOI, this would be preferred.

What the WOS plugin gives you is additional information like volume,
journal and additional authors, plus extra fields like abstract etc.


Michael
Mario Castro
2008-03-09 19:29:10 UTC
Permalink
Post by John Spray
Post by Michael Banck
Why is the plugin not hooking into the "Get Metadata"
mechanism? On the
other hand, the plugin seems to require title, author and
year, so if all you have is a DOI, you'd need to hit Get
Metadata twice I guess, which is not very intuitive, either.
I agree with you but I'm new in referencer and in python. I don't know
how to do it. Maybe someone with more experience could tell us how.
The way the "Get metadata" stuff is structured is such that each plugin
publishes a list of ID formats that it understands (pubmed, doi, etc),
and when trying to get the metadata for a paper referencer tries to
resolve an ID to the metadata.
What I don't understand about this web of science plugin is that it only
seems to do anything when you already have at least the author, title
and year: so what is web of science giving you in addition to that? In
any case, the get metadata structure is designed so that referencer can
call plugins automatically when adding documents, and clearly when some
metadata is already required this isn't the right time to invoke it.
The metadata feature only provides the first author, not the whole list
of them. Moreover, the isiwos plugin provides the abstract and keywords. This is
not relevant for citation purposes, but it is very useful if you want to
use referencer as a simple and fast article database with the PDF linked to
the reference (which is what makes referencer really useful).
Post by John Spray
Perhaps you can describe the use-case/workflow in which the plugin would
be used; when should the plugin be invoked on behalf of the user?
The plugin should be used whenever you want the whole list of
authors or more information about the paper. Actually, I was
thinking about exploiting some of the extra information provided by ISI (for
instance, the list of articles cited by this one).

I suggest that the integration with referencer could be customized, maybe
in the preferences menu, with a checkbox asking something like
"Would you like referencer to obtain data from ISI Web of Science?", thus
allowing the user to choose between crossref and ISI.

If you are interested in what ISI provides, here is a typical reply.
At the end of the XML file you can see a list of citations (the numbers are
the codes of the cited articles). If you find more of this information
useful, I can modify the plugin to accommodate it.


<?xml version="1.0" encoding="UTF-8" ?>
<response>
<sessionID>V2oNPIc3OaOcKc@BEe3</sessionID>
<searchResults>
<queryID>1</queryID>
<occurancesFound>8</occurancesFound>
<recordsFound>1</recordsFound>
<recordsSearched>39964000</recordsSearched>
</searchResults>
<records>
<REC inst_id="1" recid="157959855" hot="yes" sortkey="3101145488"
timescited="0" sharedrefs="0" inpi="false">
<item issue="157959718" recid="157959855" coverdate="200707"
sortkey="3101145488" refkey="1482335" dbyear="2007" dbweek="41">
<ut>000247812800035</ut>
<i_ckey>CUER0427070146ER</i_ckey>
<i_cid>0116230126</i_cid>
<source_title>EUROPEAN PHYSICAL JOURNAL-SPECIAL TOPICS</source_title>
<source_abbrev>EUR PHYS J-SPEC TOP</source_abbrev>
<item_title>Universal non-equilibrium phenomena at submicrometric surfaces
and interfaces</item_title>
<sq>C6983J0</sq>
<bib_id>146: 427-441 JUL 2007</bib_id>
<article_nos count="1">
<article_no>DOI 10.1140/epjst/e2007-00197-4</article_no>
</article_nos>
<bib_pages begin="427" end="441" pages="15">427-441</bib_pages>
<bib_issue year="2007" vol="146"/>
<doctype code="@">Article</doctype>
<editions full="SCI"/>
<languages count="1">
<primarylang code="EN">English</primarylang>
</languages>
<authors count="5">
<primaryauthor>Cuerno, R</primaryauthor>
<fullauthorname>
<AuRole>Author, Reprint Author</AuRole>
<AuLastName>Cuerno</AuLastName>
<AuFirstName>R.</AuFirstName>
<AuCollectiveName>Cuerno, R.</AuCollectiveName>
</fullauthorname>
<author key="1116072">Castro, M</author>
<fullauthorname>
<AuRole>Author</AuRole>
<AuLastName>Castro</AuLastName>
<AuFirstName>M.</AuFirstName>
<AuCollectiveName>Castro, M.</AuCollectiveName>
</fullauthorname>
<author key="4940393">Munoz-Garcia, J</author>
<fullauthorname>
<AuRole>Author</AuRole>
<AuLastName>Munoz-Garcia</AuLastName>
<AuFirstName>J.</AuFirstName>
<AuCollectiveName>Munoz-Garcia, J.</AuCollectiveName>
</fullauthorname>
<author key="2325145">Gago, R</author>
<fullauthorname>
<AuRole>Author</AuRole>
<AuLastName>Gago</AuLastName>
<AuFirstName>R.</AuFirstName>
<AuCollectiveName>Gago, R.</AuCollectiveName>
</fullauthorname>
<author key="7231501">Vazquez, L</author>
<fullauthorname>
<AuRole>Author</AuRole>
<AuLastName>Vazquez</AuLastName>
<AuFirstName>L.</AuFirstName>
<AuCollectiveName>Vazquez, L.</AuCollectiveName>
</fullauthorname>
</authors>
<keywords_plus count="10">
<keyword>CHEMICAL-VAPOR-DEPOSITION</keyword>
<keyword>THIN-FILM GROWTH</keyword>
<keyword>ION-SPUTTERED SURFACES</keyword>
<keyword>AEOLIAN SAND RIPPLES</keyword>
<keyword>VICINAL SURFACES</keyword>
<keyword>MORPHOLOGICAL INSTABILITIES</keyword>
<keyword>NONLINEAR EVOLUTION</keyword>
<keyword>DYNAMICS</keyword>
<keyword>MODEL</keyword>
<keyword>EQUATION</keyword>
</keywords_plus>
<reprint>
<rp_author>Cuerno, R</rp_author>
<rp_address>Univ Carlos III Madrid, Dept Matemat, Leganes 28911,
Spain</rp_address>
<rp_organization>Univ Carlos III Madrid</rp_organization>
<rp_suborganizations count="1">
<rp_suborganization>Dept Matemat</rp_suborganization>
</rp_suborganizations>
<rp_city>Leganes</rp_city>
<rp_country>Spain</rp_country>
<rp_zips count="1">
<rp_zip location="AC">28911</rp_zip>
</rp_zips>
</reprint>
<research_addrs count="6">
<research>
<rs_address>Univ Carlos III Madrid, Dept Matemat, Leganes 28911,
Spain</rs_address>
<rs_organization>Univ Carlos III Madrid</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>Dept Matemat</rs_suborganization>
</rs_suborganizations>
<rs_city>Leganes</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="AC">28911</rs_zip>
</rs_zips>
</research>
<research>
<rs_address>Univ Carlos III Madrid, GISC, Leganes 28911, Spain</rs_address>
<rs_organization>Univ Carlos III Madrid</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>GISC</rs_suborganization>
</rs_suborganizations>
<rs_city>Leganes</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="AC">28911</rs_zip>
</rs_zips>
</research>
<research>
<rs_address>Unvi Pontificia Comillas, Escuela Tecn Super Ingenieria, Madrid
28015, Spain</rs_address>
<rs_organization>Unvi Pontificia Comillas</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>Escuela Tecn Super Ingenieria</rs_suborganization>
</rs_suborganizations>
<rs_city>Madrid</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="AC">28015</rs_zip>
</rs_zips>
</research>
<research>
<rs_address>Unvi Pontificia Comillas, GISC, Madrid 28015, Spain</rs_address>
<rs_organization>Unvi Pontificia Comillas</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>GISC</rs_suborganization>
</rs_suborganizations>
<rs_city>Madrid</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="AC">28015</rs_zip>
</rs_zips>
</research>
<research>
<rs_address>Univ Autonoma Madrid, Ctr Micro Anal Mat, E-28049 Madrid,
Spain</rs_address>
<rs_organization>Univ Autonoma Madrid</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>Ctr Micro Anal Mat</rs_suborganization>
</rs_suborganizations>
<rs_city>Madrid</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="BC">E-28049</rs_zip>
</rs_zips>
</research>
<research>
<rs_address>CSIC, Inst Ciencia Mat, E-28049 Madrid, Spain</rs_address>
<rs_organization>CSIC</rs_organization>
<rs_suborganizations count="1">
<rs_suborganization>Inst Ciencia Mat</rs_suborganization>
</rs_suborganizations>
<rs_city>Madrid</rs_city>
<rs_country>Spain</rs_country>
<rs_zips count="1">
<rs_zip location="BC">E-28049</rs_zip>
</rs_zips>
</research>
</research_addrs>
<abstract avail="Y" count="1">
<p>The recent widespread interest in processes occurring at micro and
nanometric scales has increased the physical relevance of the surfaces and
interfaces constituting system boundaries, both at and far from equilibrium.
In the latter case, universal properties occur, such as scale invariance
(surface kinetic roughening), surface pattern formation or domain
coarsening. However, descriptions of these systems feature limited
predictive power when based merely on universality principles. We review
examples from Materials Science at nano and submicrometric scales, that
underlie the importance of describing growing surfaces by means of
(phenomenological) constitutive laws, in order to correctly describe the
rich behaviors experimentally found across many different systems.
Additionally, this approach provides new generic models that are also of
interest in the wider contexts of Pattern Formation and Non-Linear
Science.</p>
</abstract>
<refs count="89">
<ref>148629326</ref>
<ref>125721540</ref>
<ref>145155923</ref>
<ref>68181038</ref>
<ref>71957469</ref>
<ref>89109533</ref>
<ref>66336735</ref>
<ref>90450674</ref>
<ref>131454756</ref>
<ref>115907348</ref>
<ref>157959856</ref>
<ref>143850959</ref>
<ref>157959857</ref>
<ref>142606416</ref>
<ref>94625184</ref>
<ref>118138215</ref>
<ref>158689958</ref>
<ref>91500052</ref>
<ref>148629327</ref>
<ref>127394965</ref>
<ref>131133520</ref>
<ref>137084549</ref>
<ref>147848065</ref>
<ref>127141117</ref>
<ref>134982443</ref>
<ref>150389938</ref>
<ref>111619292</ref>
<ref>121013994</ref>
<ref>150068534</ref>
<ref>133472626</ref>
<ref>117396637</ref>
<ref>119847794</ref>
<ref>153614905</ref>
<ref>111140497</ref>
<ref>118138088</ref>
<ref>118772083</ref>
<ref>94983617</ref>
<ref>88602043</ref>
<ref>98054976</ref>
<ref>32213345</ref>
<ref>117506087</ref>
<ref>134425946</ref>
<ref>85508180</ref>
<ref>157959858</ref>
<ref>125906796</ref>
<ref>144177251</ref>
<ref>102556836</ref>
<ref>124728878</ref>
<ref>127584254</ref>
<ref>134705051</ref>
<ref>121433234</ref>
<ref>149108751</ref>
<ref>153515492</ref>
<ref>92828185</ref>
<ref>112748415</ref>
<ref>157959860</ref>
<ref>114398202</ref>
<ref>130479470</ref>
<ref>101384092</ref>
<ref>65602191</ref>
<ref>106291475</ref>
<ref>105183230</ref>
<ref>106291478</ref>
<ref>147616027</ref>
<ref>132326972</ref>
<ref>149551608</ref>
<ref>133675491</ref>
<ref>157959861</ref>
<ref>124397617</ref>
<ref>116385779</ref>
<ref>114016043</ref>
<ref>94881776</ref>
<ref>115719538</ref>
<ref>109033010</ref>
<ref>16357259</ref>
<ref>23971371</ref>
<ref>82013815102</ref>
<ref>146761103</ref>
<ref>152494564</ref>
<ref>108677156</ref>
<ref>149016606</ref>
<ref>107401065</ref>
<ref>111746594</ref>
<ref>126855141</ref>
<ref>114398205</ref>
<ref>124247665</ref>
<ref>142444393</ref>
<ref>101620809</ref>
<ref>98259967</ref>
</refs>
</item>
</REC>

</records>
</response>
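
As a rough illustration of how the fields in a reply like the one above could be pulled out, here is a minimal Python 3 sketch using `xml.dom.minidom`. The `SAMPLE` string is a heavily abridged copy of the reply (most elements and the `count` attributes are dropped), and the helper name `texts` is hypothetical; it is not the plugin's code.

```python
from xml.dom import minidom

# Abridged from the reply above; only a few elements are kept.
SAMPLE = """<response><records><REC><item>
<item_title>Universal non-equilibrium phenomena at submicrometric surfaces and interfaces</item_title>
<article_nos><article_no>DOI 10.1140/epjst/e2007-00197-4</article_no></article_nos>
<bib_issue year="2007" vol="146"/>
<authors><primaryauthor>Cuerno, R</primaryauthor>
<author key="1116072">Castro, M</author></authors>
<keywords_plus><keyword>DYNAMICS</keyword><keyword>MODEL</keyword></keywords_plus>
<refs><ref>148629326</ref><ref>125721540</ref></refs>
</item></REC></records></response>"""


def texts(dom, tag):
    """Return the text of every <tag> element whose first child is text."""
    out = []
    for node in dom.getElementsByTagName(tag):
        child = node.firstChild
        if child is not None and child.nodeType == child.TEXT_NODE:
            out.append(child.data.strip())
    return out


dom = minidom.parseString(SAMPLE)
authors = texts(dom, "primaryauthor") + texts(dom, "author")
cited = texts(dom, "ref")            # internal ISI codes of the cited articles
issue = dom.getElementsByTagName("bib_issue")
year = issue[0].getAttribute("year") if issue else ""

print(" and ".join(authors))
print(year, cited)
```

The `<ref>` values are ISI's internal record codes, so turning them into readable citations would need a further query per code.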
John Spray
2008-03-09 20:20:48 UTC
Permalink
Post by Mario Castro
The metadata feature only provides the first author, but not the whole
list of them. Moreover, the isiwos plugin provides abstract and
keywords. This is not relevant for citation purposes but it is very
interesting if you try to use referencer as a simple and fast article
database with the PDF linked to the reference (what makes referencer
really useful)
The issue of getting only the first author only applies to the crossref
plugin, not to the "Get metadata" framework in general. For instance,
pubmed gives more complete information and I have a similar one for the
ADS database. It seems like you're suggesting using crossref to resolve
a DOI to title/author and then filling out the rest from WOS. Are you
completely certain that it is not possible to look up an article by DOI
on WOS in a single step? That would seem like a pathological omission
from their interface.

The XML you have included below shows the DOI as the article_no tag:
perhaps you can search on this field in the same way as you currently do
on author and title.
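
Whether WOS can be searched on this field is not settled in this thread, but the value can at least be cleaned up once a reply has been received. In the sample reply the tag text is "DOI 10.1140/epjst/e2007-00197-4", so storing it verbatim would keep the "DOI " label. A minimal sketch, assuming that label format (the function name is hypothetical):

```python
def doi_from_article_no(article_no):
    """Strip the leading 'DOI ' label seen in the sample reply; '' if absent."""
    text = article_no.strip()
    if text.upper().startswith("DOI "):
        return text[4:].strip()
    return ""


print(doi_from_article_no("DOI 10.1140/epjst/e2007-00197-4"))
```

The same helper could be used to compare a DOI already stored in referencer with the one in the reply, as a check that the search hit the right article.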

Regards,
John
Mario Castro
2008-03-09 20:48:23 UTC
Permalink
Post by John Spray
The issue of getting only the first author only applies to the crossref
plugin, not to the "Get metadata" framework in general. For instance,
pubmed gives more complete information and I have a similar one for the
ADS database. It seems like you're suggesting using crossref to resolve
a DOI to title/author and then filling out the rest from WOS. Are you
completely certain that it is not possible to look up an article by DOI
on WOS in a single step? That would seem like a pathological omission
from their interface.
Unfortunately the queries are not well documented and from the web page
there is no field for doi (although, as John points out, it gives that
information in the reply).
Post by John Spray
perhaps you can search on this field in the same way as you currently do
on author and title.
As I mentioned above, I don't know how... but I will keep trying.
Mario Castro
2008-03-09 23:22:30 UTC
Permalink
Hi all

I found this comment on the ISI Web of Science site (through Google, of course :-) )

*****************

*Web of Science* - External Links

After working out and implementing the various internal links that provide
useful navigation for *Web of Science* users, the next natural step was to
explore the potential for establishing links to data external to ISI
products. Before proceeding, we conducted extensive research into the
potential of various standard identifiers that might aid in this venture.
Each of the candidates (Digital Object Identifier [DOI <http://www.doi.org/>],
Serial Item and Contribution Identifier [SICI - NISO Z39.56 1996], and
Publisher Item Identifier [PII]) was found to be lacking for various reasons
that will not be expanded on in this forum. The decision was made to use
internal keys to move ahead with links. With the understanding that we might
come back later and reconsider this stance, we felt most comfortable
initially with our ability to create external links based on the same
identifiers that have worked so well for our internal links.

******************
So it seems that ISI does not support DOI as a key index. Bad news.

However, I've been working with my plugin and it's pretty useful in
some cases. For instance, you can drag and drop a PDF of an old article
(hence without a DOI) and manually type some words from the title, the year
and the first author. Then click the plugin button and you obtain the
rest of the information as desired.
David Coeurjolly
2008-03-18 09:42:11 UTC
Permalink
Dear Mario,

I'm really interested in your WOS plugin but I have some questions:

- proxy setting: at my university, I have to use a proxy to access
WOS. Good point: I can configure the proxy in the referencer preferences ;).
However, would it be possible to get visual feedback ("connection
refused, ...") if the server response is:

<response>
<error code="Server.authentication">No matches returned for IP
Address</error>
</response>


- When I'm fetching metadata, the returned XML code is something like

<response>
<sessionID>V1NGmMPC64GiDOB2a69</sessionID>
<searchResults>
<queryID>1</queryID>
<occurancesFound>45201</occurancesFound>
<recordsFound>44667</recordsFound>
<recordsSearched>33028999</recordsSearched>
</searchResults>
<recordIDs>
<key>000253340900047</key>
</recordIDs>
</response>

and as you can see, I obtain recordIDs but not the records themselves...
Then, I get an error (gtk error window):

Exception: <class 'xml.parsers.expat.ExpatError'>

Module: isiwos
Explication: no element found: line 1, column 0
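
Both situations above (an `<error>` reply and a reply without records) could be detected before the fields are read, so that a readable message is shown instead of an ExpatError. A minimal Python 3 sketch, independent of referencer; the function name and the message texts are hypothetical, and how the message is displayed is left open.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def check_reply(data):
    """Return (ok, message) for a raw reply string from the server."""
    if not data or not data.strip():
        return False, "Empty reply (download failed or connection refused?)"
    try:
        dom = minidom.parseString(data)
    except ExpatError as exc:
        return False, "Reply is not valid XML: %s" % exc
    errors = dom.getElementsByTagName("error")
    if errors:
        node = errors[0]
        child = node.firstChild
        text = child.data if child is not None and child.nodeType == child.TEXT_NODE else ""
        # Collapse line breaks inside the message into single spaces
        return False, "%s: %s" % (node.getAttribute("code"), " ".join(text.split()))
    if not dom.getElementsByTagName("REC"):
        return False, "Reply contains no <REC> records"
    return True, ""


ERROR_REPLY = """<response>
<error code="Server.authentication">No matches returned for IP
Address</error>
</response>"""

print(check_reply(ERROR_REPLY))
print(check_reply(""))
```

The empty-string case corresponds to the "got 0 characters" download failure reported earlier in the thread, which is what produces "no element found: line 1, column 0" when passed straight to the parser.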


Any idea?
dav
--
----------------
David Coeurjolly - Chargé de recherche CNRS
Laboratoire LIRIS-UMR 5205
Bâtiment Nautibus, Université Claude Bernard Lyon 1
43 boulevard du 11 novembre 1918, 69622 Villeurbanne cedex, France
Tel : (+33) [0]4.72.44.82.40 Fax : (+33) [0]4.72.43.15.36

Michael Banck
2008-07-08 00:15:36 UTC
Permalink
Post by John Spray
The way the "Get metadata" stuff is structured is such that each plugin
publishes a list of ID formats that it understands (pubmed, doi, etc),
and when trying to get the metadata for a paper referencer tries to
resolve an ID to the metadata.
I've just imported a couple of papers I had lying around in paper form
into a referencer database. It is quite cumbersome finding the DOI for
an older paper on which it is not printed, so the process mostly
involved looking up the journal homepage, entering volume/page, looking
for the DOI on the results and entering that into referencer.

I noticed that at least for some journals, it is possible to construct a
URL from volume and page of the paper alone, which is basically the same
URL you get for the paper via dx.doi.org, and which also has the DOI
mentioned somewhere. Of course, those pages would also have all the
other bibliographical information you'd need (author, title, year,
etc.), but extracting that from the HTML would probably be too cumbersome.

It would be awesome to have some sort of journal database which could
look up DOIs from the JournalName-Volume-Page triple the user could
input via a pop-up GUI similar to the "Add Reference with ID" query.
This database could then get expanded by users based on their scientific
field and most used journals.


Michael
jcspray
2008-07-08 08:47:07 UTC
Permalink
Post by Michael Banck
It would be awesome to have some sort of journal database which could
look up DOIs from the JournalName-Volume-Page triple the user could
input via a pop-up GUI similar to the "Add Reference with ID" query.
This database could then get expanded by users based on their scientific
field and most used journals.
If you (as in, y'all, the collective you) make the database, I'll
write the code. If you can arrange for it to be populated with about
50 top physics and biology journals, I think it's worth shipping.
You're right that the database needs to be user-expandable, but I
think it also needs a solid basis to start from.

Note that the triple needs to resolve to something machine-parseable,
rather than human-readable HTML. Many journals have a "download
citation" link or so associated with article pages which would be
useful for this. XML preferred to bibtex since it tends to have fewer
idiosyncrasies.

John
Michael Banck
2008-07-08 13:14:42 UTC
Permalink
Post by Michael Banck
It would be awesome to have some sort of journal database which could
look up DOIs from the JournalName-Volume-Page triple the user could
input via a pop-up GUI similar to the "Add Reference with ID" query.
This database could then get expanded by users based on their scientific
field and most used journals.
If you (as in, y'all, the collective you) make the database, I'll write
the code. If you can arrange for it to be populated with about 50 top
physics and biology journals, I think it's worth shipping. You're right
that the database needs to be user-expandable, but I think it also needs
a solid basis to start from.
OK.
Note that the triple needs to resolve to something machine-parseable,
rather than human-readable HTML. Many journals have a "download
citation" link or so associated with article pages which would be useful
for this. XML preferred to bibtex since it tends to have fewer
idiosyncrasies.
While this would be certainly welcome, I think initially it would be
less work to just extract the DOI from the article's HTML page, and do a
metadata search on it using the available plugins (crossref/pubmed/
arxiv).

This reduces the problem to constructing the unique URL of the article,
and extracting the DOI from the HTML page, something users without any
python or XML-parsing knowledge could do for their journals.

Constructing a unique URL is not always possible (e.g. Science Direct
seems to use md5sum hashes for each article as URL), but seems to work
for a lot of cases. For example, for the American Institute of Physics
(AIP) journals, the URL is as follows:

http://link.aip.org/link/?$JOURN/$VOLUME/$PAGE_OR_ARTICLE_ID/1

where $JOURN is a 6-char/digit ID of the journal (e.g. JAPIAU for J.
Appl. Phys. or JCPSA6 for J. Chem. Phys.).

So to get the DOI for the article from page 4965 of volume 93 from J.
Chem. Phys., you can do

wget -O - 'http://link.aip.org/link/?JCPSA6/93/4965/1' 2> /dev/null | \
grep -i doi | head -1 | sed 's/.*DOI.//'

and then look up the metadata using that DOI.
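
The same idea can be sketched in Python, the language of the existing
plugins. This is only an illustration under assumptions: the helper names
build_aip_url and extract_first_doi are made up here, the DOI pattern is a
common heuristic and not something AIP documents, and the sample HTML and
its DOI are invented placeholders. Downloading the page is left out so the
snippet runs offline.

```python
import re

# Assumed URL layout, taken from the AIP example above.
AIP_TEMPLATE = "http://link.aip.org/link/?{journal}/{volume}/{page}/1"

# Heuristic DOI pattern: "10.", a 4-9 digit prefix, "/", then a suffix
# that runs until whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def build_aip_url(journal, volume, page):
    """Build the article URL from the journal key, volume and page."""
    return AIP_TEMPLATE.format(journal=journal, volume=volume, page=page)


def extract_first_doi(html):
    """Return the first DOI-like string in the page, or None."""
    match = DOI_PATTERN.search(html)
    return match.group(0) if match else None


# Invented sample page; a real page would be downloaded from the URL.
sample_html = "<p>DOI: 10.1234/example.5678</p>"
print(build_aip_url("JCPSA6", 93, 4965))
print(extract_first_doi(sample_html))
```

Like the "head -1" in the shell pipeline, this simply takes the first match,
so it can pick the wrong identifier on pages that list several DOIs.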


Michael

PS: Once you have a good URL for a particular article, you can try to
directly download the PDF with a single wget/python command provided you
have access to it through your institution; this works in fewer cases
than what I describe above, but I managed to do so for a couple of
journals a while ago (though I cannot find my notes about them right
now). This could maybe be merged into the "File:" field of the Document
Properties window by adding a "Download" option to it somehow, when no
File has been assigned yet.
jcspray
2008-07-08 13:35:44 UTC
Permalink
Post by Michael Banck
Post by jcspray
Note that the triple needs to resolve to something machine-parseable,
rather than human-readable HTML. Many journals have a "download
citation" link or so associated with article pages which would be useful
for this. XML preferred to bibtex since it tends to have fewer
idiosyncrasies.
While this would be certainly welcome, I think initially it would be
less work to just extract the DOI from the article's HTML page, and do a
metadata search on it using the available plugins (crossref/pubmed/
arxiv).
Yes, but my point about it being machine-parseable stands: regexing
the DOI out of a webpage is not necessarily trivial, especially in
pages including lists of citations and their DOIs. But yes,
downloading the metadata from elsewhere once a DOI is found is
perfectly acceptable.
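
To illustrate the concern about pages that list several DOIs, here is a
minimal Python sketch. It assumes a heuristic DOI pattern and hypothetical
helper names (find_candidate_dois, pick_doi); it is not part of referencer.
It refuses to guess when a page contains more than one candidate.

```python
import re

# Heuristic DOI pattern (an assumption, not an official grammar).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def find_candidate_dois(html):
    """Return the distinct DOI-like strings in order of first appearance."""
    seen = []
    for doi in DOI_PATTERN.findall(html):
        if doi not in seen:
            seen.append(doi)
    return seen


def pick_doi(html):
    """Return the DOI only when the page contains exactly one candidate."""
    candidates = find_candidate_dois(html)
    if len(candidates) == 1:
        return candidates[0]
    return None  # zero or several candidates: leave the choice to the user
```

A per-publisher regex, as proposed for the lookup entries, would narrow the
match to the part of the page that describes the article itself.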
Post by Michael Banck
This reduces the problem to constructing the unique URL of the article,
and extracting the DOI from the HTML page, something users without any
python or XML-parsing knowledge could do for their journals.
Constructing a unique URL is not always possible (e.g. Science Direct
seems to use md5sum hashes for each article as URL), but seems to work
for a lot of cases. For example, for the American Institute of Physics
http://link.aip.org/link/?$JOURN/$VOLUME/$PAGE_OR_ARTICLE_ID/1
where $JOURN is a 6-char/digit ID of the journal (e.g. JAPIAU for J.
Appl. Phys. or JCPSA6 for J. Chem. Phys.).
Alright, that's 2/50. Here's a sketch of how I see that information:

-> User selects a journal
-> Journal maps to a lookup function, in this case AIP
-> User selects remaining fields required by lookup function, in this
case volume and page.
-> Lookup function is invoked with journal key, volume and page,
translates this to a URI, downloads it, and applies its regex to it to
extract a DOI.

Here's the set of information I think is needed. Anything missing?

<journal>
<name>J. Appl. Phys.</name>
<alias>Journal of Applied Physics</alias>
<key>JAPIAU</key>
<lookup>AIP</lookup>
</journal>

<journal>
<name>J. Chem. Phys.</name>
<alias>Journal of Chemical Physics</alias>
<key>JCPSA6</key>
<lookup>AIP</lookup>
</journal>

<journal_lookup>
<name>AIP</name>
<!-- %0 is always the journal key -->
<uri>http://link.aip.org/link/?%0/%1/%2/1</uri>
<fields>
<field name="Volume" id="1"/>
<field name="Page" id="2"/>
</fields>
<regex>(DOI.*)$</regex>
</journal_lookup>
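
A rough Python sketch of how such a description might be consumed follows.
Several things are assumptions made for the example: the entries are wrapped
in a <journals> root element so the text parses as one XML document, the
field ids are taken to match the %1 and %2 placeholders in the URI, and
build_uri is a hypothetical helper name. Downloading the page and applying
the <regex> element are left out.

```python
import re
import xml.etree.ElementTree as ET

# Assumed wrapper: a single <journals> root around the proposed entries.
CONFIG = """
<journals>
  <journal>
    <name>J. Chem. Phys.</name>
    <alias>Journal of Chemical Physics</alias>
    <key>JCPSA6</key>
    <lookup>AIP</lookup>
  </journal>
  <journal_lookup>
    <name>AIP</name>
    <uri>http://link.aip.org/link/?%0/%1/%2/1</uri>
    <fields>
      <field name="Volume" id="1"/>
      <field name="Page" id="2"/>
    </fields>
    <regex>(DOI.*)$</regex>
  </journal_lookup>
</journals>
"""


def build_uri(config_xml, journal_name, values):
    """Fill the lookup's URI template for a journal name or alias.

    `values` maps field names (e.g. "Volume") to the user's input.
    """
    root = ET.fromstring(config_xml)
    journal = next(
        j for j in root.findall("journal")
        if journal_name in (j.findtext("name"), j.findtext("alias"))
    )
    lookup = next(
        l for l in root.findall("journal_lookup")
        if l.findtext("name") == journal.findtext("lookup")
    )
    # %0 is always the journal key; the other ids come from <field>.
    substitutions = {"0": journal.findtext("key")}
    for field in lookup.iter("field"):
        substitutions[field.get("id")] = str(values[field.get("name")])
    # Replace each %N placeholder with the matching value.
    return re.sub(r"%(\d)", lambda m: substitutions[m.group(1)],
                  lookup.findtext("uri"))


print(build_uri(CONFIG, "J. Chem. Phys.", {"Volume": 93, "Page": 4965}))
```

An unknown journal name raises StopIteration in this sketch; a real
implementation would report that to the user.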
Michael Banck
2008-07-08 14:41:50 UTC
Permalink
Post by Michael Banck
While this would be certainly welcome, I think initially it would be
less work to just extract the DOI from the article's HTML page, and do a
metadata search on it using the available plugins (crossref/pubmed/
arxiv).
This reduces the problem to constructing the unique URL of the article,
and extracting the DOI from the HTML page, something users without any
python or XML-parsing knowledge could do for their journals.
Constructing a unique URL is not always possible (e.g. Science Direct
seems to use md5sum hashes for each article as URL), but seems to work
for a lot of cases. For example, for the American Institute of Physics
I did some research now, and unfortunately it seems that most other
organizations encode the issue number into the URL as well. While the
year, volume and page are readily available from usual citations in
other articles or by looking at the paper article itself, the issue
number is usually omitted. By the time you have found out the issue number,
you could just as well look up the DOI itself and type that in, I guess :-/

I checked the following, and they all seem to need the issue number:

Science
http://www.sciencemag.org/cgi/content/abstract/321/5885/97

American Chemical Society
http://pubs.acs.org/cgi-bin/abstract.cgi/jacsat/2008/130/i27/abs/ja8018912.html

American Society for Biochemistry and Molecular Biology
http://www.jbc.org/cgi/content/abstract/283/28/19351
http://www.mcponline.org/cgi/content/abstract/7/7/1397

Proceedings of the National Academy of Sciences
http://hwmaint.pnas.org/cgi/content/abstract/105/26/9011

Nature and Wiley both use the DOI and not the page number in their URLs.

This basically leaves AIP and APS journals; the latter have alternative
short URLs without issue numbers:
http://link.aps.org/abstract/PRL/v30/p368

Maybe it looks better for other fields like Math, Biology or
Biochemistry (I mostly checked Physical Chemistry and related)

Also note that the Web-of-Science plugin is probably able to resolve the
DOI/full bibliographic information if the user has access to WOS and is
fed with Journal/Volume/Page appropriately. Right now it is of limited
use as you already need to have the data available.

Another possible GUI for this would be to check whether a reference's
properties have at least the journal name, the volume and the page filled
in, and then allow the user to hit "Lookup Metadata", at which point the
appropriate plugins (let's say AIP/APS, and WOS if available (maybe
Pubmed can do more advanced searches than just DOI, have not checked))
would get passed the information, and retrieve the DOI and possibly
other information.
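
As a rough illustration of that flow, the following Python sketch checks
that the three fields are present and then tries a list of resolvers in
turn. Everything here is hypothetical: the field names, the function names
and the idea of passing resolvers as plain callables are assumptions for the
example and do not reflect referencer's actual plugin API.

```python
# Assumed field names; the real document properties may be named differently.
REQUIRED_FIELDS = ("journal", "volume", "pages")


def can_lookup(fields):
    """True when journal, volume and page are all filled in."""
    return all(str(fields.get(name) or "").strip() for name in REQUIRED_FIELDS)


def lookup_metadata(fields, resolvers):
    """Try each resolver in turn and return the first non-empty result.

    Each resolver is a callable that takes the fields dict and returns a
    dict of metadata (for example {"doi": ...}) or None.
    """
    if not can_lookup(fields):
        return None
    for resolver in resolvers:
        result = resolver(fields)
        if result:
            return result
    return None
```

The "Lookup Metadata" button would then only be enabled when can_lookup
returns True for the selected reference.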

In any case, once you have the DOI/issue number, fulltext access is
possible for some of the above (PNAS, Nature, APS/AIP, ASBMB).


Michael
