Post by jcspray
Post by Aurélien Naldi
It also allowed me to realize that it was unable to extract the text
from my PostScript papers
It never did that.
As the papers don't contain DOI, I was unsure about it before...
Post by jcspray
Post by Aurélien Naldi
http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2$
It found something but added some junk at the end, which I don't see in
the output of pdftotext.
There's a newline in the middle of the DOI in the paper. The DOI regex is
picking up another DOI later on which has the 'junk' on it. Regexing out
DOIs is always going to be a bit hit and miss.
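To illustrate the newline problem described above, here is a minimal Python sketch. It is not referencer's actual code: the pattern, the `find_dois` helper and the sample DOIs are all made up for the example. It only shows how a DOI split across two lines gets truncated while a later one is picked up whole.

```python
import re

# Assumed pattern (not referencer's): "10.", a 4-9 digit prefix, "/",
# then everything up to the next whitespace, quote or angle bracket.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def find_dois(text):
    """Return candidate DOIs in order of appearance, trailing punctuation trimmed."""
    return [m.group(0).rstrip('.,;:)]') for m in DOI_RE.finditer(text)]

# Hypothetical extracted text: the first DOI is broken by a newline,
# the second one is cited later in the paper and is followed by punctuation.
sample = (
    "doi:10.1234/abc.\n"
    "def567 ... (see also 10.5555/example.42).\n"
)

print(find_dois(sample))  # ['10.1234/abc', '10.5555/example.42']
```

The first candidate stops at the line break, so a caller that prefers the longest or the "cleanest" match can easily end up with the second DOI instead, which is the hit-and-miss behaviour mentioned above.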
Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text? Is it using some other external tool
or doing the job by itself?
About the DOI detection, one thing freaks me out (even if I have not
seen it happen yet): a PDF could contain the DOI of some other document
as a way to cite it. Has referencer already picked the wrong DOI in such
a case for anyone?
<dream>
Let's pray for DOI in the pdf metadata with a clean, consistent and
secure way to read it...
</dream>
Post by jcspray
Post by Aurélien Naldi
One other glitch, now that I have fully tested the PDF production with
referencer-inserted citations in LyX: the accented characters are not
protected in the BibTeX keys (they are in the other fields). Does BibTeX
support special characters in keys at all? I'm not a LyX/LaTeX guru
(and I would LOVE to avoid becoming one) so I might be just doing
something wrong here...
The principle of least surprise is in action here. Most people I know
would write Gruber06 rather than Gr\"uber06. However, there is no general
way to map accented-latin characters into english characters, so
referencer leaves them alone. Converting non-ascii characters into their
latex equivalent wouldn't be appropriate for key names, since they're
never typeset. I know there are things like ss for ß and ae for ä, oe
for ö, but my knowledge is pretty special-case for that. I wonder if
there's an ISO standard?
I am fine with avoiding special characters in keys, I just wanted to
test if it did actually work, but I don't know much on the subject...
Maybe referencer could map "some" non-ASCII characters. Given the huge
amount of software doing accent-proof search and the like, and how well
it works for me, an incomplete-yet-useful mapping list must exist
somewhere (I really don't care right now though).
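For what it's worth, a rough sketch of such an incomplete mapping in Python, assuming Unicode decomposition is good enough for most accented Latin letters. The `SPECIAL` table and the `ascii_key` function are hypothetical names made up for this example, not anything referencer provides, and the table is deliberately short.

```python
import unicodedata

# Hypothetical special cases: these letters have no Unicode decomposition,
# so they need an explicit (and necessarily incomplete) replacement.
SPECIAL = {
    "ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
}

def ascii_key(key):
    """Map a key to plain ASCII: strip accents, apply SPECIAL, drop the rest."""
    out = []
    for ch in key:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD splits e.g. "ü" into "u" plus a combining diaeresis;
        # the combining marks are then discarded.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    # Anything still outside ASCII is dropped rather than guessed at.
    return "".join(out).encode("ascii", "ignore").decode("ascii")

print(ascii_key("Grüber06"))  # Gruber06
```

This gives Gruber06 rather than Grueber06, so it follows the "just drop the accent" habit and ignores language-specific conventions; characters it does not know about simply disappear from the key.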
--
Aurelien Naldi