Discussion:
problems when importing bibtex entries
Brice Goglin
2007-03-05 13:04:27 UTC
Hi,

I am trying Referencer since it looks very promising. While importing my
existing bibtex files, I always get an error "Invalid byte sequence in
conversion input - This problem was encountered while parsing import".
For instance, when the file contains:

@TECHREPORT{LocalUserSched,
AUTHOR = { Martin Steckermeier and Frank Bellosa },
TITLE = { Using Locality Information in Userlevel Scheduling },
INSTITUTION = { University of Erlangen-Nürnberg -- Computer Science
Department -- Operating Systems -- {IMMD IV} },
YEAR = 1995,
NUMBER = { TR-95-14 },
NOTE = { \url{mstecker at informatik.uni-erlangen.de}
\url{bellosa at informatik.uni-erlangen.de} },
ADDRESS = { Martensstraße 1, 91058 Erlangen, Germany },
MONTH = DEC,
DAY = 23,
}

I get this error with the following output in the terminal
Publisher =

I tried adding a dummy publisher field, but it does not help. And it
seems to me that all the required fields for a "techreport" entry are
set in my file (I looked in Emacs' bibtex mode to know which ones are
required).

Any idea?

Thanks,
Brice
John Spray
2007-03-05 13:38:10 UTC
Post by Brice Goglin
I am trying Referencer since it looks very promising. While importing my
existing bibtex files, I always get an error "Invalid byte sequence in
conversion input - This problem was encountered while parsing import".
@TECHREPORT{LocalUserSched,
AUTHOR = { Martin Steckermeier and Frank Bellosa },
TITLE = { Using Locality Information in Userlevel Scheduling },
INSTITUTION = { University of Erlangen-Nürnberg -- Computer Science
Department -- Operating Systems -- {IMMD IV} },
YEAR = 1995,
NUMBER = { TR-95-14 },
NOTE = { \url{mstecker at informatik.uni-erlangen.de}
\url{bellosa at informatik.uni-erlangen.de} },
ADDRESS = { Martensstraße 1, 91058 Erlangen, Germany },
MONTH = DEC,
DAY = 23,
}
Hmm, when I import this snippet from a UTF-8 encoded file, it works but
the ? and so on get mangled. When I save it as an iso-8859-1 file it
just works, output on the console like:
"""
Publisher = University of Erlangen-N?rnberg ? Computer Science
Department ? Operating Systems ? IMMD IV (0)
Note = \urlmstecker at informatik.uni-erlangen.de
\urlbellosa at informatik.uni-erlangen.de (0)
Address = Martensstra?e 1, 91058 Erlangen, Germany (0)
Month = DEC(0)
Day = 23(0)
"""

What is your LANG environment variable set to? What encoding is the
bibtex file in?
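
For anyone unsure how to answer the encoding question, here is a minimal Python sketch that guesses whether a file's bytes are plain ASCII, valid UTF-8, or something else. The function name `guess_encoding` is made up for this illustration, and the ISO-8859-1 result is only a fallback guess, since every byte sequence decodes as ISO-8859-1.

```python
def guess_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'iso-8859-1' for the given bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a single-byte charset such as ISO-8859-1.
        # This is a guess, because any byte string decodes as ISO-8859-1.
        return "iso-8859-1"
```

To use it on a real file, read the file in binary mode and pass the bytes, for example `guess_encoding(open("refs.bib", "rb").read())`, where `refs.bib` is a placeholder file name.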

Regards,
John
Brice Goglin
2007-03-05 14:58:12 UTC
Post by John Spray
Hmm, when I import this snippet from a UTF-8 encoded file, it works but
the ? and so on get mangled.
I should have chosen another example, these German characters could
confuse the discussion :)

I don't have any problem with these special characters here. After
searching a little bit, I found out that the failure is caused by the
double-dash in the institution:
INSTITUTION = { University of Erlangen-Nürnberg -- Computer Science
Department - Operating Systems - {IMMD IV} },
=> fails
INSTITUTION = { University of Erlangen-Nürnberg - Computer Science
Department - Operating Systems - {IMMD IV} },
=> works
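
One possible explanation, not confirmed anywhere in this thread: if the importer turns the TeX ligature `--` into an en dash (U+2013) and the text is later converted to the locale's charset, that conversion would fail, because ISO-8859-15 (the charset usually associated with fr_FR@euro) has no en dash. The Python sketch below only illustrates that failure mode. Both `tex_dashes_to_unicode` and `fits_charset` are hypothetical helpers, not Referencer code.

```python
def tex_dashes_to_unicode(text: str) -> str:
    """Replace TeX dash ligatures with Unicode dashes (hypothetical helper)."""
    # Handle the longer ligature first so '---' is not read as '--' plus '-'.
    return text.replace("---", "\u2014").replace("--", "\u2013")


def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


failing = tex_dashes_to_unicode("Erlangen-Nürnberg -- Computer Science")
working = tex_dashes_to_unicode("Erlangen-Nürnberg - Computer Science")

print(fits_charset(failing, "iso-8859-15"))  # False: no en dash in this charset
print(fits_charset(working, "iso-8859-15"))  # True: ü and '-' are both available
print(fits_charset(failing, "utf-8"))        # True: UTF-8 can encode everything
```

If this guess is right, it would also explain why the accented characters themselves cause no trouble here while the double dash does.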

Apart from this problem, I finally managed to locate where my other
failing entries had a problem. It seems that Referencer does not like
having a quote (apostrophe) in the publisher or booktitle field. I had
several entries with "O'Reilly" or "Developer's" in the publisher or
booktitle field, and these are accepted once I remove the quote. Having a
quote in the title or author does not seem to cause any problem. Here are
the failing entries, in case you want to look at them:

@misc{ braam99intermezzo,
author = "Peter Braam Braam",
title = "{T}he {I}nter{M}ezzo {F}ile {S}ystem",
booktitle = "Proceedings of the O'Reilly Perl Conference 3",
year = "1999",
url = "http://citeseer.nj.nec.com/braam99intermezzo.html"
}

@Book{ bovet03understanding,
author = {Daniel P. Bovet and Marco Cesati},
title = "{U}nderstanding the {L}inux {K}ernel, {S}econd {E}dition",
publisher = "O'Reilly",
year = 2003,
isbn = "0-596-00213-0",
}

@Book{ love04linux,
author = {Robert Love},
title = "{L}inux {K}ernel {D}evelopment",
publisher = "Developer's Library, Sams Publishing",
year = 2004,
isbn = "0-672-32512-8",
}

All of them have been used in earlier publications without ever causing
a problem with bibtool or bibtex, from what I remember. I don't know
whether Referencer uses its own parser or something common to
bibtex/bibtool anyway...
Post by John Spray
What is your LANG environment variable set to? What encoding is the
bibtex file in?
I don't know if it matters anymore, but in case it does:

I don't have any LANG set. I just have the following config:
LANG=
LC_CTYPE=fr_FR@euro
LC_NUMERIC="POSIX"
LC_TIME=fr_FR@euro
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Setting LANG to en_US before starting referencer does not seem to help
anyway.
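
A note on why setting LANG alone may change nothing: under the POSIX rules, LC_ALL takes precedence over the individual LC_* variables, which in turn take precedence over LANG, and empty values count as unset. The Python sketch below models only that lookup order. `effective_category` is a name invented for this illustration, and the "POSIX" fallback stands in for the implementation-defined default.

```python
def effective_category(env: dict, category: str = "LC_CTYPE") -> str:
    """Resolve a locale category using POSIX precedence: LC_ALL, LC_*, LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # an empty string is treated the same as an unset variable
            return value
    # Nothing set: the default is implementation-defined, usually C/POSIX.
    return "POSIX"


config = {"LANG": "", "LC_CTYPE": "fr_FR@euro", "LC_ALL": ""}

print(effective_category(config))                        # fr_FR@euro
print(effective_category({**config, "LANG": "en_US"}))   # fr_FR@euro
print(effective_category({**config, "LC_ALL": "en_US"})) # en_US
```

With the configuration listed above, the character set is therefore decided by LC_CTYPE, and only LC_ALL (or LC_CTYPE itself) would override it.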

By the way, it would be great to display the parsing error in the error
window instead of in the terminal :)

thanks a lot,
Brice
Brice Goglin
2007-03-05 15:08:26 UTC
Post by Brice Goglin
Post by John Spray
Hmm, when I import this snippet from a UTF-8 encoded file, it works but
the ? and so on get mangled.
I should have chosen another example, these German characters could
confuse the discussion :)
I don't have any problem with these special characters here. After
searching a little bit, I found out that the failure is caused by the
INSTITUTION = { University of Erlangen-Nürnberg -- Computer Science
Department - Operating Systems - {IMMD IV} },
=> fails
INSTITUTION = { University of Erlangen-Nürnberg - Computer Science
Department - Operating Systems - {IMMD IV} },
=> works
Apart from this problem, I finally managed to locate where my other
failing entries had a problem. It seems that Referencer does not like
having a quote (apostrophe) in the publisher or booktitle field. I had
several entries with "O'Reilly" or "Developer's" in the publisher or
booktitle field, and these are accepted once I remove the quote. Having a
quote in the title or author does not seem to cause any problem. Here are
the failing
@misc{ braam99intermezzo,
author = "Peter Braam Braam",
title = "{T}he {I}nter{M}ezzo {F}ile {S}ystem",
booktitle = "Proceedings of the O'Reilly Perl Conference 3",
year = "1999",
url = "http://citeseer.nj.nec.com/braam99intermezzo.html"
}
@Book{ bovet03understanding,
author = {Daniel P. Bovet and Marco Cesati},
title = "{U}nderstanding the {L}inux {K}ernel, {S}econd {E}dition",
publisher = "O'Reilly",
year = 2003,
isbn = "0-596-00213-0",
}
@Book{ love04linux,
author = {Robert Love},
title = "{L}inux {K}ernel {D}evelopment",
publisher = "Developer's Library, Sams Publishing",
year = 2004,
isbn = "0-672-32512-8",
}
To add to the confusion, there are some entries where the quote is
accepted in booktitle. For instance:

@inproceedings{ schmuck02gpfs,
author = "Frank Schmuck and Roger Haskin",
title = "{GPFS}: {A} {S}hared-{D}isk {F}ile {S}ystem for {L}arge
{C}omputing {C}lusters",
booktitle = "Proceedings of the Conference on File and Storage
Technologies (FAST'02)",
publisher = "USENIX, Berkeley, CA",
pages = "231--244",
year = "2002",
month = JAN,
address = "Monterey, CA",
}

Brice
Brice Goglin
2007-03-06 20:38:31 UTC
Post by Brice Goglin
To add to the confusion, there are some entries where the quote is
@inproceedings{ schmuck02gpfs,
author = "Frank Schmuck and Roger Haskin",
title = "{GPFS}: {A} {S}hared-{D}isk {F}ile {S}ystem for {L}arge
{C}omputing {C}lusters",
booktitle = "Proceedings of the Conference on File and Storage
Technologies (FAST'02)",
publisher = "USENIX, Berkeley, CA",
pages = "231--244",
year = "2002",
month = JAN,
address = "Monterey, CA",
}
And one last thing for now:

When exporting the database as a bibtex file, most quotes are refused,
including the one above, giving the following error:

escapeBibtexAccents '
(Referencer:6870): glibmm-CRITICAL **:
unhandled exception (type Glib::Error) in signal handler:
domain: g_convert_error
code : 1
what : Invalid byte sequence in conversion input

By the way, I also got an error when exporting an entry whose title
field contained some latex code ($\null^2$). This entry works fine in
bibtex.
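
The name escapeBibtexAccents in the output above suggests that the export step rewrites non-ASCII characters as LaTeX commands. How Referencer actually does this is not shown in this thread, so the Python sketch below is only an illustration of that kind of step. The mapping table and the function `escape_accents` are hypothetical, and the table is deliberately tiny. Characters that are not in the table pass through unchanged instead of raising an error.

```python
# Hypothetical, deliberately small mapping; a real table would be much larger.
LATEX_ESCAPES = {
    "ü": '{\\"u}',
    "ö": '{\\"o}',
    "ä": '{\\"a}',
    "ß": "{\\ss}",
    "é": "{\\'e}",
    "\u2013": "--",  # en dash back to the TeX ligature
    "\u2019": "'",   # typographic apostrophe back to the ASCII one
}


def escape_accents(text: str) -> str:
    """Replace known non-ASCII characters with LaTeX equivalents."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_accents("Martensstraße 1, Erlangen-Nürnberg"))
print(escape_accents("Technologies (FAST'02)"))
print(escape_accents("$\\null^2$"))
```

In this sketch a plain apostrophe and inline LaTeX such as `$\null^2$` are left untouched, which is the behaviour I would expect from the export for the entries above.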

Brice
John Spray
2007-03-06 23:53:12 UTC
Post by Brice Goglin
When exporting the database as a bibtex file, most quotes are refused,
escapeBibtexAccents '
domain: g_convert_error
code : 1
what : Invalid byte sequence in conversion input
By the way, I also got an error when exporting an entry whose title
field contained some latex code ($\null^2$). This entry works fine in
bibtex.
Thanks for the detailed report -- I'll start working through these items
in due course.

John
