Python script: batch-retrieve BibTeX

Requirement: When writing a paper in LaTeX, you often need to convert citations to BibTeX format. With a large number of references, doing this by hand is tedious, repetitive work. Reference managers such as EndNote or Zotero can export BibTeX with one click; if you don't use one, this article offers an alternative.

Solution: Crossref API + Google Scholar API

Crossref is one of the largest DOI registration platforms and holds metadata for most international literature. Some works, however, including but not limited to arXiv preprints, cannot be found there. For those, you need the help of Google Scholar.
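To make the Crossref lookup concrete, here is a minimal sketch of how a free-text citation could be turned into a Crossref REST API query, and how BibTeX could then be requested for a DOI through content negotiation on doi.org. The helper names build_crossref_query and bibtex_request_for_doi are hypothetical, and this is not necessarily how get_bibtex works internally. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_crossref_query(citation: str, mailto: str, rows: int = 1) -> str:
    """Build a Crossref search URL for a free-text citation (hypothetical helper)."""
    params = {
        # Crossref matches this field against titles, authors, years, etc.
        "query.bibliographic": citation.strip(),
        # Only the best-ranked hits are needed.
        "rows": rows,
        # Supplying an email address identifies you to Crossref ("polite" usage).
        "mailto": mailto,
    }
    return CROSSREF_WORKS_URL + "?" + urlencode(params)


def bibtex_request_for_doi(doi: str):
    """Return (url, headers) that ask doi.org for a BibTeX rendering of a DOI."""
    return "https://doi.org/" + doi, {"Accept": "application/x-bibtex"}


if __name__ == "__main__":
    print(build_crossref_query("Non-local Neural Networks", "you@example.com"))
    print(bibtex_request_for_doi("10.48550/arXiv.1711.07971"))
```

The returned URL and headers can be passed to any HTTP client. Whether a given DOI resolves to BibTeX depends on the registration agency behind it.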

To save everyone time, I have already wrapped these two APIs in a package. You just need to install it with pip.

pip install get_bibtex

After that, you can use it according to the following instructions.

from apiModels.get_bibtex_from_crossref import GetBibTex
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar

if __name__ == '__main__':
    google_scholar_api_key = "your_google_scholar_api_key"
    get_bibtex_from_crossref = GetBibTex("your email")
    get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(google_scholar_api_key, GetBibTexFromGoogleScholar.APA)

    with open("inputfile/Bibliographyraw.txt", "r", encoding='utf-8') as f:
        raws = f.readlines()
    
    # get bibtex from CrossRef and failed search results
    success_bibtexs_crossref, failed_results = get_bibtex_from_crossref.get_bibtexs(raws)
    
    # for each failed search result, get bibtex from Google Scholar
    success_bibtexs_google, failed_results = get_bibtex_from_google_scholar.get_bibtexs(failed_results)

    with open("outputfile/BibliographyCrossRef.txt", "w", encoding='utf-8') as f:
        for bibtex in success_bibtexs_crossref:
            f.write(bibtex)

    with open("outputfile/BibliographyGoogleScholar.txt", "w", encoding='utf-8') as f:
        for index, bibtex in enumerate(success_bibtexs_google):
            f.write("[{}]".format(index) + " " + bibtex + "\n")

    with open("outputfile/not_find.txt", "w", encoding='utf-8') as f:
        for result in failed_results:
            f.write(result+"\n")

    print("find bibtex from CrossRef: ", len(success_bibtexs_crossref))
    print("find bibtex from Google Scholar: ", len(success_bibtexs_google))
    print("not find: ", len(failed_results))

Explanation of key code

The Bibliographyraw.txt file contains the references to look up, one per line.
For example:
J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, “Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks.” arXiv, Jan. 12, 2019. doi: 10.48550/arXiv.1810.12348.
X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local Neural Networks.” arXiv, Apr. 13, 2018. doi: 10.48550/arXiv.1711.07971.
------------------
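As a side illustration of the input format, the sketch below reads such a file's text into a list of references and pulls out a DOI where one is present. The function names load_references and extract_doi and the DOI regular expression are assumptions made for this example; get_bibtex itself takes the raw lines as they are.

```python
import re
from typing import List, Optional

# A loose pattern for DOIs: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def load_references(text: str) -> List[str]:
    """Split raw text into one reference per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_doi(reference: str) -> Optional[str]:
    """Return the first DOI found in a reference string, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Citations usually end the DOI with sentence punctuation; drop it.
    return match.group(0).rstrip(".,;")


if __name__ == "__main__":
    sample = (
        "X. Wang, R. Girshick, A. Gupta, and K. He, Non-local Neural Networks. "
        "arXiv, Apr. 13, 2018. doi: 10.48550/arXiv.1711.07971.\n"
    )
    for ref in load_references(sample):
        print(extract_doi(ref))
```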

success_bibtexs_crossref, failed_results = get_bibtex_from_crossref.get_bibtexs(raws)
The first return value is a list of BibTeX entries; the second is a list of the original references that could not be found.

success_bibtexs_google, failed_results = get_bibtex_from_google_scholar.get_bibtexs(failed_results)
The references that Crossref could not find are then queried through the Google Scholar API. In general, nearly all literature can be found on Google Scholar.
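The two calls above follow one pattern: each source returns a pair (found entries, references not found), and the leftovers are handed to the next source. Below is a minimal, library-independent sketch of that fallback chain. run_chain and the two fake sources are hypothetical stand-ins written for illustration; they are not part of get_bibtex.

```python
from typing import Callable, List, Sequence, Tuple

# A source takes raw references and returns (found_entries, not_found_references).
Source = Callable[[List[str]], Tuple[List[str], List[str]]]


def run_chain(raws: Sequence[str], sources: Sequence[Source]) -> Tuple[List[str], List[str]]:
    """Query each source in order, passing on only what is still missing."""
    found: List[str] = []
    remaining: List[str] = list(raws)
    for source in sources:
        if not remaining:
            break  # nothing left to look up
        successes, remaining = source(remaining)
        found.extend(successes)
    return found, remaining


# Fake sources for demonstration only: each "finds" references with a given prefix.
def fake_crossref(raws: List[str]) -> Tuple[List[str], List[str]]:
    hits = ["@article{" + r + "}" for r in raws if r.startswith("a")]
    return hits, [r for r in raws if not r.startswith("a")]


def fake_scholar(raws: List[str]) -> Tuple[List[str], List[str]]:
    hits = ["@misc{" + r + "}" for r in raws if r.startswith("b")]
    return hits, [r for r in raws if not r.startswith("b")]


if __name__ == "__main__":
    print(run_chain(["a1", "b1", "c1"], [fake_crossref, fake_scholar]))
```

Ordering the sources from cheapest to most limited keeps the quota-bound source for the references that really need it.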

Note: The Google Scholar results are actually returned in APA format, as specified at initialization. An additional parameter can be set to return BibTeX instead, for example:
get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(google_scholar_api_key, GetBibTexFromGoogleScholar.APA, flag=True)
However, this mode requires a proxy server to be configured, for example:
import os
# Replace the address and port with those of your own proxy.
os.environ["http_proxy"] = "http://127.0.0.1:7890"
os.environ["https_proxy"] = "http://127.0.0.1:7890"
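To check that the proxy settings are actually visible to Python, a small sketch like the following can help. It sets both variables and reads them back through the standard library's urllib.request.getproxies(). The helper name set_proxy and the address used are assumptions for illustration.

```python
import os
from urllib.request import getproxies


def set_proxy(address: str) -> dict:
    """Set the proxy for HTTP and HTTPS and return what Python now detects."""
    os.environ["http_proxy"] = address
    os.environ["https_proxy"] = address
    # getproxies() reads the *_proxy environment variables.
    return getproxies()


if __name__ == "__main__":
    # The address below is only an example; use your own proxy's host and port.
    print(set_proxy("http://127.0.0.1:7890"))
```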

Note: You need to apply for an API key first at serpapi.com. It comes with 100 free queries per month, which is generally enough.
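Since the free quota is limited, it can be worth capping how many references are sent to Google Scholar in one run. The sketch below is one possible way to do that; split_by_quota is a hypothetical helper, and the count of queries already used this month has to be tracked by you.

```python
from typing import List, Sequence, Tuple


def split_by_quota(
    pending: Sequence[str], used_this_month: int, monthly_limit: int = 100
) -> Tuple[List[str], List[str]]:
    """Split pending references into (query now, defer) based on remaining quota."""
    remaining = max(monthly_limit - used_this_month, 0)
    return list(pending[:remaining]), list(pending[remaining:])


if __name__ == "__main__":
    to_query, deferred = split_by_quota(["ref1", "ref2", "ref3"], used_this_month=98)
    print(to_query, deferred)
```

Only the first list would then be passed to the Google Scholar step; the deferred references can be written to a file and retried later.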

The rest of the code is self-explanatory haha

Of course, a single-reference query is also available:

Call get_bibtex() instead of get_bibtexs() (just drop the 's').

Updated on April 16, 2024

Added DBLP interface

from apiModels.get_bibtex_from_dblp import GetBibTexFromDBLP

Improved usability

A ready-made workflow class is now provided that wraps both the Crossref and DBLP APIs.
from apiModels.workflow.crossref2dblp import Crossref2Dblp

Usage (without Google Scholar API):
crossref2dblp = Crossref2Dblp("your email", "inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt")
crossref2dblp.running()
Then wait for the run to finish.

(With Google Scholar API):
from apiModels.workflow.crossref2dblp import Crossref2Dblp
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar
get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(api_key="your api key")
# Pass the Google Scholar client as the last argument
crossref2dblp = Crossref2Dblp("your email", "inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt", get_bibtex_from_google_scholar)
crossref2dblp.running()
Then wait for the run to finish.

Or if you want to define the order of API calls yourself:
from apiModels.workflow.make_workflow import MakeWorkflow
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar
from apiModels.get_bibtex_from_crossref import GetBibTex

get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(api_key="your api key")
get_bibtex_from_crossref = GetBibTex("your email")
make_workflow = MakeWorkflow("inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt", get_bibtex_from_google_scholar, get_bibtex_from_crossref)
make_workflow.running()

Before using, install version 1.1.0:
pip install get_bibtex==1.1.0

Improvements and contributions are welcome.

This article is synchronized and updated to xLog by Mix Space.
The original link is https://me.liuyaowen.club/posts/default/20240816and2