get-bibtex: A Python tool that makes managing literature citations easier

Introduction#

In academic research, managing references is important but time-consuming. When writing a paper, citations often have to be collected from several different databases, each with its own export format. To solve this problem, I developed get-bibtex, a Python library that helps researchers quickly obtain BibTeX citations from multiple academic databases.

Why Choose get-bibtex?#

1. Multi-source Support#

CrossRef (the most comprehensive DOI database)
DBLP (computer science literature database)
Google Scholar (via SerpAPI; requires an API key)

2. Intelligent Workflow#

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX("your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Batch processing
papers = [
    "10.1145/3292500.3330919",  # Using DOI
    "Attention is all you need"   # Using title
]
results = workflow.get_multiple_bibtex(papers)

3. Simple and Easy to Use#

from get_bibtex import CrossRefBibTeX

# Single citation retrieval
fetcher = CrossRefBibTeX(email="your.email@example.com")
bibtex = fetcher.get_bibtex("10.1145/3292500.3330919")
print(bibtex)

4. File Batch Processing#

# Read from file and save
workflow.process_file(
    input_path="papers.txt",
    output_path="references.bib"
)

Featured Functions#

Intelligent Fallback Mechanism
- Automatically tries the next data source when one fails
- Maximizes the chance of retrieving each citation
Progress Tracking
- Displays processing progress using tqdm
- Makes the status of a batch run visible at a glance
Error Handling
- Detailed logging
- Gracefully handles API limits and network errors
Formatted Output
- Automatically cleans and formats BibTeX
- Keeps the output format consistent
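
The fallback behaviour can be pictured with a small, library-independent sketch. The names `first_successful`, `flaky_source`, and `backup_source` are hypothetical stand-ins and not part of get-bibtex. The sketch only assumes that each source is a callable that returns a BibTeX string, returns `None`, or raises on failure.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger("fallback_sketch")

# A "source" here is any callable taking a query and returning BibTeX or None.
Source = Callable[[str], Optional[str]]


def first_successful(query: str, sources: Iterable[Source]) -> Optional[str]:
    """Try each source in order and return the first non-empty result."""
    for source in sources:
        try:
            result = source(query)
        except Exception as exc:  # e.g. a network error or a rate limit
            name = getattr(source, "__name__", repr(source))
            logger.warning("%s failed for %r: %s", name, query, exc)
            continue
        if result:
            return result
    return None


# Hypothetical stand-in sources used only for this illustration.
def flaky_source(query: str) -> Optional[str]:
    raise ConnectionError("simulated network failure")


def backup_source(query: str) -> Optional[str]:
    return "@misc{demo, title={" + query + "}}"


entry = first_successful("Deep learning", [flaky_source, backup_source])
print(entry)
```

Here the first source fails, the failure is logged, and the second source supplies the entry. If every source fails, the function returns `None`.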

Use Cases#

Paper Writing#

When you are writing a paper, you can obtain citations directly from a DOI or a title:

from get_bibtex import CrossRefBibTeX

fetcher = CrossRefBibTeX()
citations = [
    "Machine learning",
    "Deep learning",
    "10.1038/nature14539"
]

for citation in citations:
    bibtex = fetcher.get_bibtex(citation)
    print(bibtex)
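
The list above mixes titles with a DOI. How get-bibtex tells the two apart is not shown in this article. One common heuristic, given here purely as an assumption, is to match the general `10.<registrant>/<suffix>` shape of a DOI. `looks_like_doi` is a hypothetical helper, not a library function.

```python
import re

# Heuristic DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(query: str) -> bool:
    """Return True if the query has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(query.strip()))


for query in ["Machine learning", "Deep learning", "10.1038/nature14539"]:
    kind = "DOI" if looks_like_doi(query) else "title"
    print(f"{query!r} -> {kind}")
```

A check like this can be useful before a batch run, since DOI lookups are exact while title lookups depend on search ranking.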

Literature Review#

Batch process a large number of literature citations:

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX())
workflow.add_fetcher(DBLPBibTeX())

# Read literature list from file
workflow.process_file("papers.txt", "bibliography.bib")

Suppose we need to obtain citations for the following papers:

FedMSA: A Model Selection and Adaptation System for Federated Learning
Attention Is All You Need: A Pioneering Work on Transformers
Non-Local Neural Networks
ECA-Net: Efficient Channel Attention Mechanism
CBAM: Convolutional Block Attention Module

Using CrossRef to Retrieve (via DOI)#

from apiModels import CrossRefBibTeX

fetcher = CrossRefBibTeX(email="your.email@example.com")

# FedMSA
bibtex = fetcher.get_bibtex("10.3390/s22197244")
print(bibtex)

# ECA-Net
bibtex = fetcher.get_bibtex("10.1109/cvpr42600.2020.01155")
print(bibtex)

Example Output:

@article{Sun_2022,
  title={FedMSA: A Model Selection and Adaptation System for Federated Learning},
  volume={22},
  ISSN={1424-8220},
  url={http://dx.doi.org/10.3390/s22197244},
  DOI={10.3390/s22197244},
  number={19},
  journal={Sensors},
  publisher={MDPI AG},
  author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv},
  year={2022},
  month=sep,
  pages={7244}
}

Using DBLP to Retrieve (via Title)#

from apiModels import DBLPBibTeX

fetcher = DBLPBibTeX()

# CBAM
bibtex = fetcher.get_bibtex("CBAM: Convolutional Block Attention Module")
print(bibtex)

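Example Output (the title search here matched a different paper that only mentions CBAM in its title, so results retrieved by title should be verified):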

@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and Zexi Zhang and Haobang Hu and Bin Li and Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959}
}

Using Workflow to Retrieve Multiple Citations#

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Prepare query list
queries = [
    "FedMSA: A Model Selection and Adaptation System for Federated Learning",
    "Attention Is All You Need",
    "Non-Local Neural Networks",
    "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks",
    "CBAM: Convolutional Block Attention Module"
]

# Retrieve all citations
results = workflow.get_multiple_bibtex(queries)

# Print results
for query, bibtex in results.items():
    print(f"\nQuery: {query}")
    print(f"Citation:\n{bibtex if bibtex else 'Not found'}")
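
If you would rather keep the results than print them, a dictionary of this shape can be turned into `.bib` text by hand. The sketch below assumes only what the loop above shows: `results` maps each query to a BibTeX string, or to an empty value when nothing was found. `results_to_bib` and `demo_results` are hypothetical names used for illustration.

```python
from typing import Dict, Optional


def results_to_bib(results: Dict[str, Optional[str]]) -> str:
    """Join found entries into one .bib string, each preceded by a query comment."""
    blocks = []
    for query, bibtex in results.items():
        if not bibtex:
            continue  # skip queries for which nothing was found
        blocks.append(f"% Query: {query}\n{bibtex.strip()}\n")
    return "\n".join(blocks)


# Hypothetical results, shaped like the dictionary iterated over above.
demo_results = {
    "Paper A": "@article{a_2020, title={Paper A}}",
    "Paper B": None,
}
bib_text = results_to_bib(demo_results)
print(bib_text)
```

Recording the query as a `%` comment above each entry makes it easier to spot entries that do not match what was asked for.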

File Batch Processing#

You can create a workflow to process a file containing multiple citations. First, create the workflow and add data sources:

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Process file
workflow.process_file("papers.txt", "references.bib")

Example Input:
papers.txt

FedMSA: A Model Selection and Adaptation System for Federated Learning
Attention Is All You Need
Non-Local Neural Networks
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
CBAM: Convolutional Block Attention Module
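
An input file like this can also be generated from a Python list. The sketch below assumes the format seen in the example, one query per line, and additionally assumes that DOIs may be mixed with titles in the file, as they can be in a query list. It writes to a temporary directory so that no real file is overwritten.

```python
import tempfile
from pathlib import Path

queries = [
    "10.3390/s22197244",
    "Attention Is All You Need",
    "  ",  # blank entries are dropped
    "CBAM: Convolutional Block Attention Module",
]

# Keep non-empty queries, stripped of surrounding whitespace.
cleaned = [q.strip() for q in queries if q.strip()]

# A temporary directory is used so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = Path(tmp_dir) / "papers.txt"
    input_path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    lines = input_path.read_text(encoding="utf-8").splitlines()

print(lines)
```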

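Example Output (three of the title queries below, "Attention Is All You Need", "Non-Local Neural Networks", and "CBAM: Convolutional Block Attention Module", matched a different paper, so entries retrieved by title should be checked before use):
references.bib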


% Query: FedMSA: A Model Selection and Adaptation System for Federated Learning
% Source: CrossRefBibTeX
@article{Sun_2022, title={FedMSA: A Model Selection and Adaptation System for Federated Learning}, volume={22}, ISSN={1424-8220}, url={http://dx.doi.org/10.3390/s22197244}, DOI={10.3390/s22197244}, number={19}, journal={Sensors}, publisher={MDPI AG}, author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv}, year={2022}, month=sep, pages={7244} }

% Query: Attention Is All You Need
% Source: DBLPBibTeX
@inproceedings{DBLP:conf/dac/ZhangYY21,
  author       = {Xiaopeng Zhang and
                  Haoyu Yang and
                  Evangeline F. Y. Young},
  title        = {Attentional Transfer is All You Need: Technology-aware Layout Pattern
                  Generation},
  booktitle    = {58th {ACM/IEEE} Design Automation Conference, {DAC} 2021, San Francisco,
                  CA, USA, December 5-9, 2021},
  pages        = {169--174},
  publisher    = {{IEEE}},
  year         = {2021},
  url          = {https://doi.org/10.1109/DAC18074.2021.9586227},
  doi          = {10.1109/DAC18074.2021.9586227},
  timestamp    = {Wed, 03 May 2023 17:06:11 +0200},
  biburl       = {https://dblp.org/rec/conf/dac/ZhangYY21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

% Query: Non-Local Neural Networks
% Source: CrossRefBibTeX
@article{Xu_2024, title={Adaptive selection of local and non-local attention mechanisms for speech enhancement}, volume={174}, ISSN={0893-6080}, url={http://dx.doi.org/10.1016/j.neunet.2024.106236}, DOI={10.1016/j.neunet.2024.106236}, journal={Neural Networks}, publisher={Elsevier BV}, author={Xu, Xinmeng and Tu, Weiping and Yang, Yuhong}, year={2024}, month=jun, pages={106236} }

% Query: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
% Source: CrossRefBibTeX
@inproceedings{Wang_2020, title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks}, url={http://dx.doi.org/10.1109/cvpr42600.2020.01155}, DOI={10.1109/cvpr42600.2020.01155}, booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, publisher={IEEE}, author={Wang, Qilong and Wu, Banggu and Zhu, Pengfei and Li, Peihua and Zuo, Wangmeng and Hu, Qinghua}, year={2020}, month=jun, pages={11531–11539} }

% Query: CBAM: Convolutional Block Attention Module
% Source: DBLPBibTeX
@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and
                  Zexi Zhang and
                  Haobang Hu and
                  Bin Li and
                  Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear
                  Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention
                  Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959},
  timestamp    = {Sat, 16 Mar 2024 15:09:59 +0100},
  biburl       = {https://dblp.org/rec/journals/access/WangZHLL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
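
Several of the entries above do not correspond to the queried title, so it can help to compare each returned title with the query before trusting it. The following is a minimal sketch of such a check using the standard library's `difflib`. The helper names and the `0.9` threshold are assumptions made for illustration and are not part of get-bibtex.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def extract_title(bibtex: str) -> Optional[str]:
    """Return the value of the title field, handling nested braces."""
    # The lookbehind keeps "booktitle" from matching.
    match = re.search(r"(?<![A-Za-z])title\s*=\s*\{", bibtex, flags=re.IGNORECASE)
    if match is None:
        return None
    depth, start = 1, match.end()
    for index in range(start, len(bibtex)):
        char = bibtex[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return bibtex[start:index]
    return None  # unbalanced braces


def normalize(text: str) -> str:
    """Lowercase, drop braces, and collapse whitespace."""
    return " ".join(text.replace("{", "").replace("}", "").lower().split())


def title_matches(query: str, bibtex: str, threshold: float = 0.9) -> bool:
    """Return True if the entry's title is close to the queried title."""
    title = extract_title(bibtex)
    if title is None:
        return False
    matcher = SequenceMatcher(None, normalize(query), normalize(title), autojunk=False)
    return matcher.ratio() >= threshold


good = "@inproceedings{Wang_2020, title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks}, year={2020} }"
bad = "@article{Xu_2024, title={Adaptive selection of local and non-local attention mechanisms for speech enhancement}, year={2024} }"

print(title_matches("ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks", good))
print(title_matches("Non-Local Neural Networks", bad))
```

The first check passes because the titles are identical. The second fails because the returned title shares only a few words with the query. Entries that fail are good candidates for a manual lookup by DOI.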

Installation Method#

Using pip:

pip install get-bibtex

Using Poetry:

poetry add get-bibtex

Best Practices#

Register with Email
```
fetcher = CrossRefBibTeX(email="your.email@example.com")
```
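Supplying an email address lets your requests identify themselves to CrossRef, which routes identified requests to its "polite" pool and generally gives them more reliable service.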

Use Workflow Wisely

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX())  # Primary source
workflow.add_fetcher(DBLPBibTeX())      # Backup source

Add data sources in order of reliability: the first fetcher added is the primary source, and later ones serve as fallbacks.

Add Delay When Batch Processing
When processing a large number of citations, it is recommended to use the built-in delay mechanism to avoid triggering API limits.
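
The idea behind such a delay can be sketched independently of the library. The example below is not the built-in mechanism. `fetch_with_delay` and `fake_fetch` are hypothetical names, and the one-second default is an assumption. The `sleep` parameter is injectable so that the pause can be replaced in tests.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def fetch_with_delay(
    queries: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Call fetch for each query, pausing between consecutive requests."""
    results: Dict[str, Optional[str]] = {}
    for position, query in enumerate(queries):
        if position > 0:
            sleep(delay_seconds)  # no pause is needed before the first request
        results[query] = fetch(query)
    return results


# Hypothetical stand-in for a real fetcher's lookup method.
def fake_fetch(query: str) -> Optional[str]:
    return f"@misc{{demo, title={{{query}}}}}"


# Record the requested pauses instead of actually sleeping.
pauses = []
demo = fetch_with_delay(["A", "B", "C"], fake_fetch, delay_seconds=1.0, sleep=pauses.append)
print(demo["A"])
print(pauses)
```

With three queries the function pauses twice, once between each pair of consecutive requests.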

Obtain SerpAPI Key

To use Google Scholar features, you need a SerpAPI key. Here are the steps to obtain it:
1. Register for a SerpAPI account
  - Visit SerpAPI official website
  - Click the "Sign Up" button in the upper right corner
  - Fill in the registration information (email, password, etc.)
2. Choose a suitable plan
  - Free plan: 100 searches per month
  - Paid plans: Choose different levels based on needs
  - For testing and personal use, the free plan is usually sufficient.
3. Obtain API Key
  - After logging in, go to the Dashboard
  - Find your key in the "API Key" section
  - Copy the key for use in your code.
4. Usage Example
```
from apiModels import GoogleScholarBibTeX

# Initialize Google Scholar fetcher
fetcher = GoogleScholarBibTeX(api_key="your-serpapi-key")

# Get citation
bibtex = fetcher.get_bibtex("Deep learning with differential privacy")
print(bibtex)
```
5. Notes
  - Protect your API key and do not share it publicly.
  - Monitor usage to avoid exceeding limits.
  - Set reasonable request intervals (at least 1 second is recommended).
  - Use environment variables to store API keys in production environments.
```
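import os

from apiModels import GoogleScholarBibTeX

# Read the key from the environment instead of hard-coding it
api_key = os.getenv("SERPAPI_KEY")
fetcher = GoogleScholarBibTeX(api_key=api_key)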
```
6. Usage Recommendations
  - Prioritize using CrossRef and DBLP.
  - Use Google Scholar only when results are not found.
  - Be mindful of API usage limits when batch processing.
```
# Recommended workflow order
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())
workflow.add_fetcher(GoogleScholarBibTeX(api_key="your-serpapi-key"))
```
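
To keep an eye on usage, one option is to wrap a metered lookup in a small counter that stops issuing requests once a budget is spent. This is a sketch under stated assumptions: `BudgetedFetcher` and `fake_scholar_lookup` are hypothetical names, and the wrapped callable stands in for whatever lookup consumes your SerpAPI quota.

```python
from typing import Callable, Optional


class BudgetedFetcher:
    """Wrap a lookup callable and stop calling it once a request budget is spent."""

    def __init__(self, fetch: Callable[[str], Optional[str]], max_requests: int) -> None:
        self._fetch = fetch
        self.max_requests = max_requests
        self.used = 0

    def get(self, query: str) -> Optional[str]:
        if self.used >= self.max_requests:
            return None  # budget exhausted: behave like "not found"
        self.used += 1
        return self._fetch(query)


# Hypothetical stand-in for a metered lookup.
def fake_scholar_lookup(query: str) -> Optional[str]:
    return "@misc{demo, title={" + query + "}}"


limited = BudgetedFetcher(fake_scholar_lookup, max_requests=2)
answers = [limited.get(q) for q in ["A", "B", "C"]]
print(answers)
print(limited.used)
```

The third query is not sent because the budget of two requests has already been used. The counter lives only in memory, so a real monthly budget would also need to be persisted between runs.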

Future Prospects#

Support for more data sources
Citation format conversion
A graphical user interface
More customization options

Conclusion#

get-bibtex is dedicated to simplifying the management of literature in academic writing. Whether for a single paper or a literature review, it can help you efficiently obtain and manage literature citations. You are welcome to participate in project development on GitHub, provide suggestions, or report issues.

GitHub Repository: get-bibtex
Issue Feedback: Issues
PyPI Page: get-bibtex

This article is synchronized and updated to xLog by Mix Space
The original link is https://liuyaowen.cn/posts/person/20241231