referencer 1.1-pre
John Spray
2008-01-13 22:52:32 UTC
All,

I would appreciate it if any interested parties could test this tarball,
reporting any build problems, crashes or regressions.

Cheers,
John
-------------- next part --------------
A non-text attachment was scrubbed...
Name: referencer-1.1-pre0.tar.gz
Type: application/x-compressed-tar
Size: 472273 bytes
Desc: not available
URL: <http://icculus.org/pipermail/referencer/attachments/20080113/cdc48704/attachment.bin>
Aurélien Naldi
2008-01-14 09:49:28 UTC
Post by John Spray
All,
I would appreciate if any interested parties could test this tarball,
reporting any build problems, crashes or regressions.
It "Works for me"(tm).

The dialog when adding documents is really nice, even if I was a bit
disappointed to be unable to tag the "metadata-lacking" documents from
there. It would be really nice to have some way to track metadata-lacking
documents, but it should definitely not prevent you from releasing what
you have right now!
It also made me realize that referencer was unable to extract the text
from my PostScript papers (not sure if this is new or not).

I also had a problem with the DOI detection for this paper:
http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pcbi.0030109
It found something but added some junk at the end, which I don't see in
the output of pdftotext.

Thanks also for the exception-catching dialog for plugins! The
explanations coming from the lyx plugin are a bit rough but at least
they appear.

One other glitch, now that I have fully tested PDF production with
referencer-inserted citations in LyX: the accented characters are not
protected in the BibTeX keys (they are in the other fields). Does BibTeX
support special characters in keys at all? I'm not a LyX/LaTeX guru
(and I would LOVE to avoid becoming one) so I might just be doing
something wrong here...

Anyway, this will make a rocking release, and let's hope it will attract
more eyes on referencer ;)

Best regards.
--
Aurelien Naldi
jcspray
2008-01-14 10:34:22 UTC
Post by Aurélien Naldi
It also allowed me to realize that it was unable to extract the text
from my postcript papers
It never did that.
Post by Aurélien Naldi
http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2$
It found something but added some junk at the end, which I don't see in
the output of pdftotext.
There's a newline in the middle of the DOI in the paper. The DOI regex is
picking up another DOI later on which has the 'junk' on it. Regexing out
DOIs is always going to be a bit hit and miss.
Post by Aurélien Naldi
One other glitch, now that I have fully tested the pdf production with
referencer-inserted citation in lyx: the accentuated characters are not
protected in the bibtex keys (they are in the other fields). Does bibtex
support special characters in keys at all ? I'm not a lyx/latex guru
(and I would LOVE to avoid becoming one) so I might be just doing
something wrong here...
The principle of least surprise is in action here. Most people I know
would write Gruber06 rather than Gr\"uber06. However, there is no general
way to map accented Latin characters to English characters, so
referencer leaves them alone. Converting non-ASCII characters into their
LaTeX equivalents wouldn't be appropriate for key names, since they're
never typeset. I know there are things like ss for ß, ae for æ and oe
for œ, but my knowledge there is limited to special cases. I wonder if
there's an ISO standard?
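
For illustration only, here is a minimal Python sketch of the kind of partial mapping discussed here. It is not what referencer does (referencer leaves key characters alone). The `SPECIAL` table and the `ascii_key` helper are hypothetical names, and the table is deliberately incomplete. The approach uses Unicode NFKD decomposition to strip accents and a small hand-written table for letters that do not decompose.

```python
import unicodedata

# Hand-picked special cases that have no Unicode decomposition.
# This table is an illustrative assumption and is deliberately incomplete.
SPECIAL = {"ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}

def ascii_key(key):
    """Fold accented Latin characters in a citation key to plain ASCII."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in key)
    # NFKD splits e.g. "ü" into "u" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop the combining marks, then anything still outside ASCII.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")

print(ascii_key("Grüber06"))
```

The folding is lossy and language-dependent (German writers may prefer "ue" for "ü", for example), which is one reason there is no general mapping.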

John
Aurélien Naldi
2008-01-14 10:55:13 UTC
Post by jcspray
Post by Aurélien Naldi
It also allowed me to realize that it was unable to extract the text
from my postcript papers
It never did that.
As the papers don't contain a DOI, I was unsure about it before...
Post by jcspray
Post by Aurélien Naldi
http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2$
It found something but added some junk at the end, which I don't see in
the output of pdftotext.
There's a newline in the middle of the DOI in the paper. The DOI regex is
picking up another DOI later on which has the 'junk' on it. Regexing out
DOIs is always going to be a bit hit and miss.
Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text? Is it using some other external tool
or doing the job by itself?
About the DOI detection, one thing freaks me out (even if I have not
seen it happen yet): a PDF could contain the DOI of some other document
as a way to cite it. Has referencer already picked the wrong DOI in such
a case for anyone?
<dream>
Let's pray for DOI in the pdf metadata with a clean, consistent and
secure way to read it...
</dream>
Post by jcspray
Post by Aurélien Naldi
One other glitch, now that I have fully tested the pdf production with
referencer-inserted citation in lyx: the accentuated characters are not
protected in the bibtex keys (they are in the other fields). Does bibtex
support special characters in keys at all ? I'm not a lyx/latex guru
(and I would LOVE to avoid becoming one) so I might be just doing
something wrong here...
The principle of least surprise is in action here. Most people I know
would write Gruber06 rather than Gr\"uber06. However, there is no general
way to map accented-latin characters into english characters, so
referencer leaves them alone. Converting non-ascii characters into their
latex equivalent wouldn't be appropriate for key names, since they're
never typeset. I know there are things like ss for ß and ae for æ, oe
for œ, but my knowledge is pretty special-case for that. I wonder if
there's an ISO standard?
I am fine with avoiding special characters in keys. I just wanted to
test whether it actually worked, but I don't know much about the subject...
Maybe referencer could map "some" non-ASCII characters. Given the huge
amount of software doing accent-insensitive search and the like, and how
well it works for me, an incomplete-yet-useful mapping list must exist
somewhere (I really don't care right now though).
--
Aurelien Naldi
jcspray
2008-01-14 11:09:16 UTC
Post by Aurélien Naldi
Post by jcspray
There's a newline in the middle of the DOI in the paper. The DOI regex is
picking up another DOI later on which has the 'junk' on it. Regexing out
DOIs is always going to be a bit hit and miss.
Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text ? Is it using some other external tool
or doing the job by itself ?
libpoppler
Post by Aurélien Naldi
About the doi detection, one thing freaks me out (even if I have not
seen it happen yet): a pdf could contain the doi of some other document
as a way to quote it. Did referencer already pick the wrong doi in such
case for someone ?
Of course.

John
Aurélien Naldi
2008-01-14 14:15:26 UTC
Post by jcspray
Post by Aurélien Naldi
Post by jcspray
There's a newline in the middle of the DOI in the paper. The DOI regex is
picking up another DOI later on which has the 'junk' on it. Regexing out
DOIs is always going to be a bit hit and miss.
Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text ? Is it using some other external tool
or doing the job by itself ?
libpoppler
Post by Aurélien Naldi
About the doi detection, one thing freaks me out (even if I have not
seen it happen yet): a pdf could contain the doi of some other document
as a way to quote it. Did referencer already pick the wrong doi in such
case for someone ?
Of course.
How annoying...
I have one more problem with DOI guessing: quite a lot of my papers have
some "metadata" in a small column on the left of the first page,
including:

"This article's doi:
<the doi is here>"

Referencer does not catch it, as it only expects spaces between "doi:"
and the DOI itself. It does work if I modify it to allow newlines as
well (replacing ":? *" with ":?[ \n]*" as a quick test). Can you include
this, or is there some other problem I did not think of?
For now I'm happy: the DOI detection works for 95% of the papers I have
here (except the 50+ that do not contain a DOI :/ )
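
To make the proposed change concrete, here is a small Python sketch of a label-anchored DOI match that tolerates a newline after "doi:". Only the ":?[ \n]*" fragment comes from the message above. The rest of the pattern, `DOI_PATTERN` and `find_doi` are assumptions for illustration and are not referencer's actual regex.

```python
import re

# Hypothetical pattern, not referencer's actual regex: the label "doi",
# an optional colon, then any run of spaces or newlines before the DOI.
DOI_PATTERN = re.compile(r"doi:?[ \n]*(10\.\d{4,9}/\S+)", re.IGNORECASE)

def find_doi(text):
    """Return the first DOI-like string that follows a 'doi' label, or None."""
    match = DOI_PATTERN.search(text)
    return match.group(1) if match else None

# The layout described above: label on one line, DOI on the next.
sample = "This article's doi:\n10.1371/journal.pcbi.0030109"
print(find_doi(sample))
```

A match like this still returns the first labelled DOI in the text, so it can pick up a DOI that belongs to a cited document rather than to the paper itself.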

PS: current svn gives me a crash at startup, which was not there a few
hours ago.
--
Aurelien Naldi
Michael Banck
2008-01-14 22:02:00 UTC
Post by John Spray
I would appreciate if any interested parties could test this tarball,
reporting any build problems, crashes or regressions.
It builds fine on Debian GNU/Linux.


Michael
Andreas Wagner
2008-01-15 14:09:10 UTC
Hello John, hello list,
Post by John Spray
All,
I would appreciate if any interested parties could test this tarball,
reporting any build problems, crashes or regressions.
Builds and launches fine here.

But it crashes (after a busy second or so) when I try to import my BibTeX file (attached).

(I have one more nit, not even worth filing a bug: in the German translation, the menu item should read "BibTex
Datei verwalten" (with one 'l' instead of two) (it is probably something like "manage BibTex file" in English).)

HTH,
Andreas
--
Press any key to continue or any other key to quit...
--
My Public PGP Keys:
1024 Bit DH/DSS: 0x869F81BA
768 Bit RSA: 0x1AD97BA5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Diss.bib.bz2
Type: application/x-bzip2
Size: 15437 bytes
Desc: not available
URL: <http://icculus.org/pipermail/referencer/attachments/20080115/7934a51a/attachment.bz2>
John Spray
2008-01-15 22:33:57 UTC
On Tue, 2008-01-15 at 16:38 +0100, Andreas.Wagner at em.uni-frankfurt.de wrote:
I have had a somewhat closer look at the file, and it seemed to start with a couple of very weird characters
(visible in hex mode). I don't know if that was something like the Unicode BOM (byte order mark). Anyway, I deleted
them and then converted the file with iconv -f UTF8 -t LATIN1 to get a Latin-1 encoded BibTeX file (albeit still with
non-ASCII characters). I tried both the -c and the -cs parameters to that command, but even on the most sanitized
version of the file the new referencer would crash.
Andreas,

Thanks for sending the reproducer, but I still can't get the crash here.
What's your LANG environment variable?

John
A.Wagner
2008-01-17 10:51:53 UTC
Hello John, hello list,
Post by John Spray
I have had a somewhat closer look at the file and it seemed to start with a couple of very weird characters (to be
seen in hex mode). I don't know if that was something like the unicode BOF/BOM mark. Anyway, I have deleted it and
then converted the file with iconv -f UTF8 -t LATIN1 to get a latin-1 encoded BibTex file (albeit still with
non-ascii characters). I've tried both the -c and the -cs parameters to that command but even on the most sanitized
version of the file the new referencer would crash.
Andreas,
Thanks for sending the reproducer, but I still can't get the crash here.
What's your LANG environment variable?
[wagner at apollo ~]$ echo $LANG
de_DE.utf8

(LC_COLLATE=C if that is of any relevance)

Andreas
--
My Public PGP Keys:
1024 Bit DH/DSS: 0x869F81BA
768 Bit RSA: 0x1AD97BA5
A.Wagner
2008-03-02 11:20:08 UTC
Hello John, hello list,
Post by A.Wagner
Post by John Spray
I have had a somewhat closer look at the file and it seemed to start with a couple of very weird
characters (to be seen in hex mode). I don't know if that was something like the unicode BOF/BOM mark.
Anyway, I have deleted it and then converted the file with iconv -f UTF8 -t LATIN1 to get a latin-1
encoded BibTex file (albeit still with non-ascii characters). I've tried both the -c and the -cs
parameters to that command but even on the most sanitized version of the file the new referencer would
crash.
Thanks for sending the reproducer, but I still can't get the crash here.
What's your LANG environment variable?
[wagner at apollo ~]$ echo $LANG
de_DE.utf8
(LC_COLLATE=C if that is of any relevance)
I just wanted to let you know that the crash still happens with 1.1.1, but now I get an error message:
(referencer:10214): Gtk-CRITICAL **: gtk_list_store_set_sort_column_id: assertion `list_store->default_sort_func != NULL' failed
Speicherzugriffsfehler (German for "segmentation fault")

Maybe it is of some help,
Cheers,
Andreas
--
Linux: Because rebooting is for adding new hardware.
--
My Public PGP Keys:
1024 Bit DH/DSS: 0x869F81BA
768 Bit RSA: 0x1AD97BA5
John Spray
2008-03-02 11:51:58 UTC
Post by A.Wagner
(referencer:10214): Gtk-CRITICAL **: gtk_list_store_set_sort_column_id: assertion `list_store->default_sort_func != NULL' failed
Speicherzugriffsfehler
Unfortunately I still can't reproduce the problem, even in the
de_DE.UTF-8 locale.

John

jcspray
2008-01-15 15:17:09 UTC
Ach, my webmail doesn't respect reply-to...

----- Forwarded message from jcspray at icculus.org -----
Date: Tue, 15 Jan 2008 09:55:23 -0500
From: jcspray at icculus.org
Reply-To: jcspray at icculus.org
Subject: Re: [referencer] referencer 1.1-pre
To: Andreas Wagner <A.Wagner at stud.uni-frankfurt.de>
Post by Andreas Wagner
Hello John, hello list,
Post by John Spray
All,
I would appreciate if any interested parties could test this
tarball, reporting any build problems, crashes or regressions.
builds and launches fine here.
But it crashes (after a busy second or so) when I try to import my
BibTex file (attached).
Your BibTeX file appears to be UTF-8 encoded. For me it doesn't
crash, but non-ASCII characters get mangled. The BibTeX import code
currently assumes that BibTeX files are Latin-1 encoded.
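
As a rough illustration of handling both encodings, the Python sketch below strips a UTF-8 byte order mark if one is present, tries UTF-8 first, and falls back to Latin-1. It is not referencer's import code, and `decode_bibtex` is a hypothetical helper name.

```python
import codecs

def decode_bibtex(raw):
    """Decode BibTeX bytes: strip a UTF-8 BOM, try UTF-8, fall back to Latin-1."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail,
        # although it may produce wrong text for other encodings.
        return raw.decode("latin-1")

print(decode_bibtex(codecs.BOM_UTF8 + "Müller".encode("utf-8")))
```

Trying UTF-8 first is the safer order here: Latin-1 text with accented characters is rarely valid UTF-8, whereas UTF-8 bytes always decode "successfully" as Latin-1 and come out mangled.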
Post by Andreas Wagner
(I have one more nit, it's not even worth filing a bug: in the German
translation, the menu item should read "BibTex Datei verwalten" (with
one 'l' instead of two) (it is probably something like "manage BibTex
file" in English).)
You should be able to fix this on http://www.launchpad.net/referencer/
if it bugs you.

John


----- End forwarded message -----
Rodrigo Kassick
2008-01-15 17:07:13 UTC
Compiles and runs in Ubuntu Gutsy; no crash yet.

One question: with the new icon view, aren't tooltips deprecated? What
information do they show that isn't already in the view?

Kassick.
Post by John Spray
All,
I would appreciate if any interested parties could test this tarball,
reporting any build problems, crashes or regressions.
Cheers,
John
--
Rodrigo Virote Kassick
(k?zic)
John Spray
2008-01-15 20:35:01 UTC
Post by Rodrigo Kassick
Compiles and runs in Ubuntu Gutsy; no crash yet.
One question: with the new icon view, aren't tooltips deprecated ?
Which information they show that isn't already in the view ?
They could be used to display more info in the future. It was less
effort to just not touch them.

John