刘耀文

Java developer

get-bibtex: A Python tool that makes managing literature citations easier

Introduction#

In academic research, managing references is important but time-consuming. When writing papers in particular, we often need to collect citations from several different databases. To solve this problem, I developed get-bibtex, a Python library that helps researchers quickly obtain BibTeX citations from multiple academic databases.

Why Choose get-bibtex?#

1. Multi-source Support#

  • CrossRef (a large DOI registry covering many publishers)
  • DBLP (computer science literature database)
  • Google Scholar (via SerpAPI; requires an API key)

2. Intelligent Workflow#

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX("your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Batch processing
papers = [
    "10.1145/3292500.3330919",  # Using DOI
    "Attention is all you need"   # Using title
]
results = workflow.get_multiple_bibtex(papers)

3. Simple and Easy to Use#

from get_bibtex import CrossRefBibTeX

# Single citation retrieval
fetcher = CrossRefBibTeX(email="your.email@example.com")
bibtex = fetcher.get_bibtex("10.1145/3292500.3330919")
print(bibtex)

4. File Batch Processing#

# Read from file and save
workflow.process_file(
    input_path="papers.txt",
    output_path="references.bib"
)

Key Features#

  1. Intelligent Fallback Mechanism

    • Automatically tries other data sources when one fails
    • Ensures maximum retrieval of citation information
  2. Progress Tracking

    • Displays processing progress using tqdm
    • Makes the status of a batch run visible at a glance
  3. Error Handling

    • Detailed logging
    • Gracefully handles API limits and network errors
  4. Formatted Output

    • Automatically cleans and formats BibTeX
    • Ensures consistency of output format
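
The fallback mechanism listed above can be pictured as a loop over data sources that stops at the first one returning a result. The following is only a minimal sketch of that idea, not the library's actual implementation: the function `fetch_with_fallback` and the three stand-in sources are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a fetcher is any callable mapping a query to BibTeX text or None.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    query: str, fetchers: List[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetcher) pair in order and return the first hit."""
    for name, fetch in fetchers:
        try:
            bibtex = fetch(query)
        except Exception:
            # A failing source (network error, rate limit) is skipped.
            continue
        if bibtex:
            return name, bibtex
    # No source produced a result.
    return None, None


# Hypothetical stand-ins for real data sources.
def failing_source(query: str) -> Optional[str]:
    raise ConnectionError("simulated network failure")


def empty_source(query: str) -> Optional[str]:
    return None


def working_source(query: str) -> Optional[str]:
    return f"@misc{{demo, title={{{query}}}}}"


sources = [
    ("failing", failing_source),
    ("empty", empty_source),
    ("working", working_source),
]
print(fetch_with_fallback("Non-Local Neural Networks", sources))
```

Returning the source name alongside the entry makes it possible to record which database answered each query.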

Use Cases#

Paper Writing#

When you are writing a paper, you can obtain citations directly from either a DOI or a title:

from get_bibtex import CrossRefBibTeX

fetcher = CrossRefBibTeX()
citations = [
    "Machine learning",
    "Deep learning",
    "10.1038/nature14539"
]

for citation in citations:
    bibtex = fetcher.get_bibtex(citation)
    print(bibtex)
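
The list above mixes titles and a DOI in one batch. How get-bibtex tells the two apart is not described in this post; one plausible approach, shown here purely as an assumption, is a pattern check for the common DOI shape `10.<registrant>/<suffix>`. The helper name `looks_like_doi` is hypothetical.

```python
import re

# Assumption: modern DOIs start with "10.", followed by a 4-9 digit
# registrant code, a slash, and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(query: str) -> bool:
    """Return True when the query has the shape of a DOI."""
    return DOI_PATTERN.match(query.strip()) is not None


for citation in ["Machine learning", "Deep learning", "10.1038/nature14539"]:
    kind = "DOI" if looks_like_doi(citation) else "title"
    print(f"{citation!r} is treated as a {kind}")
```

A check like this only inspects the shape of the string; it does not confirm that the DOI is actually registered.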

Literature Review#

Batch process a large number of literature citations:

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX())
workflow.add_fetcher(DBLPBibTeX())

# Read literature list from file
workflow.process_file("papers.txt", "bibliography.bib")

Suppose we need to obtain citations for the following papers:

  1. FedMSA: A Model Selection and Adaptation System for Federated Learning
  2. Attention Is All You Need: A Pioneering Work on Transformers
  3. Non-Local Neural Networks
  4. ECA-Net: Efficient Channel Attention Mechanism
  5. CBAM: Convolutional Block Attention Module

Using CrossRef to Retrieve (via DOI)#

from get_bibtex import CrossRefBibTeX

fetcher = CrossRefBibTeX(email="your.email@example.com")

# FedMSA
bibtex = fetcher.get_bibtex("10.3390/s22197244")
print(bibtex)

# ECA-Net
bibtex = fetcher.get_bibtex("10.1109/cvpr42600.2020.01155")
print(bibtex)

Example Output:

@article{Sun_2022,
  title={FedMSA: A Model Selection and Adaptation System for Federated Learning},
  volume={22},
  ISSN={1424-8220},
  url={http://dx.doi.org/10.3390/s22197244},
  DOI={10.3390/s22197244},
  number={19},
  journal={Sensors},
  publisher={MDPI AG},
  author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv},
  year={2022},
  month=sep,
  pages={7244}
}

Using DBLP to Retrieve (via Title)#

from get_bibtex import DBLPBibTeX

fetcher = DBLPBibTeX()

# CBAM
bibtex = fetcher.get_bibtex("CBAM: Convolutional Block Attention Module")
print(bibtex)

Example Output:

@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and Zexi Zhang and Haobang Hu and Bin Li and Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959}
}
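
Note that the title of the entry returned above differs from the queried title: a title search returns whatever the database ranks first, which is not always the intended paper. A minimal sketch of a sanity check follows, using the standard-library `difflib` to compare the query with the returned title. The helpers `extract_title` and `title_matches` and the 0.85 threshold are assumptions for illustration, not part of get-bibtex.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def extract_title(bibtex: str) -> Optional[str]:
    """Return the title field of a BibTeX entry with braces removed."""
    # \b keeps "booktitle" from matching.
    match = re.search(r"\btitle\s*=\s*\{", bibtex, flags=re.IGNORECASE)
    if match is None:
        return None
    depth, start = 1, match.end()
    for i in range(start, len(bibtex)):
        if bibtex[i] == "{":
            depth += 1
        elif bibtex[i] == "}":
            depth -= 1
            if depth == 0:
                raw = bibtex[start:i]
                # Drop protective braces and collapse wrapped whitespace.
                return " ".join(raw.replace("{", "").replace("}", "").split())
    return None


def title_matches(query: str, bibtex: str, threshold: float = 0.85) -> bool:
    """Return True when the entry title is close to the queried title."""
    title = extract_title(bibtex)
    if title is None:
        return False
    ratio = SequenceMatcher(None, query.lower(), title.lower()).ratio()
    return ratio >= threshold


# Abbreviated copy of the DBLP entry shown above.
dblp_entry = """@article{DBLP:journals/access/WangZHLL24,
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear
                  Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention
                  Module (CBAM)-YOLOv8},
  year         = {2024}
}"""

query = "CBAM: Convolutional Block Attention Module"
print(extract_title(dblp_entry))
print(title_matches(query, dblp_entry))
```

Entries that fail such a check are worth reviewing by hand or retrying with a DOI.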

Using Workflow to Retrieve Multiple Citations#

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Prepare query list
queries = [
    "FedMSA: A Model Selection and Adaptation System for Federated Learning",
    "Attention Is All You Need",
    "Non-Local Neural Networks",
    "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks",
    "CBAM: Convolutional Block Attention Module"
]

# Retrieve all citations
results = workflow.get_multiple_bibtex(queries)

# Print results
for query, bibtex in results.items():
    print(f"\nQuery: {query}")
    print(f"Citation:\n{bibtex if bibtex else 'Not found'}")

File Batch Processing#

You can create a workflow to process a file containing multiple citations. First, create the workflow and add data sources:

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
workflow.add_fetcher(DBLPBibTeX())

# Process file
workflow.process_file("papers.txt", "references.bib")

Example Input:
papers.txt

FedMSA: A Model Selection and Adaptation System for Federated Learning
Attention Is All You Need
Non-Local Neural Networks
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
CBAM: Convolutional Block Attention Module

Example Output:
references.bib


% Query: FedMSA: A Model Selection and Adaptation System for Federated Learning
% Source: CrossRefBibTeX
@article{Sun_2022, title={FedMSA: A Model Selection and Adaptation System for Federated Learning}, volume={22}, ISSN={1424-8220}, url={http://dx.doi.org/10.3390/s22197244}, DOI={10.3390/s22197244}, number={19}, journal={Sensors}, publisher={MDPI AG}, author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv}, year={2022}, month=sep, pages={7244} }

% Query: Attention Is All You Need
% Source: DBLPBibTeX
@inproceedings{DBLP:conf/dac/ZhangYY21,
  author       = {Xiaopeng Zhang and
                  Haoyu Yang and
                  Evangeline F. Y. Young},
  title        = {Attentional Transfer is All You Need: Technology-aware Layout Pattern
                  Generation},
  booktitle    = {58th {ACM/IEEE} Design Automation Conference, {DAC} 2021, San Francisco,
                  CA, USA, December 5-9, 2021},
  pages        = {169--174},
  publisher    = {{IEEE}},
  year         = {2021},
  url          = {https://doi.org/10.1109/DAC18074.2021.9586227},
  doi          = {10.1109/DAC18074.2021.9586227},
  timestamp    = {Wed, 03 May 2023 17:06:11 +0200},
  biburl       = {https://dblp.org/rec/conf/dac/ZhangYY21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

% Query: Non-Local Neural Networks
% Source: CrossRefBibTeX
@article{Xu_2024, title={Adaptive selection of local and non-local attention mechanisms for speech enhancement}, volume={174}, ISSN={0893-6080}, url={http://dx.doi.org/10.1016/j.neunet.2024.106236}, DOI={10.1016/j.neunet.2024.106236}, journal={Neural Networks}, publisher={Elsevier BV}, author={Xu, Xinmeng and Tu, Weiping and Yang, Yuhong}, year={2024}, month=jun, pages={106236} }

% Query: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
% Source: CrossRefBibTeX
@inproceedings{Wang_2020, title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks}, url={http://dx.doi.org/10.1109/cvpr42600.2020.01155}, DOI={10.1109/cvpr42600.2020.01155}, booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, publisher={IEEE}, author={Wang, Qilong and Wu, Banggu and Zhu, Pengfei and Li, Peihua and Zuo, Wangmeng and Hu, Qinghua}, year={2020}, month=jun, pages={11531–11539} }

% Query: CBAM: Convolutional Block Attention Module
% Source: DBLPBibTeX
@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and
                  Zexi Zhang and
                  Haobang Hu and
                  Bin Li and
                  Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear
                  Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention
                  Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959},
  timestamp    = {Sat, 16 Mar 2024 15:09:59 +0100},
  biburl       = {https://dblp.org/rec/journals/access/WangZHLL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
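
The output file above follows a simple layout: one `% Query:` comment, one `% Source:` comment, then the entry. As a rough sketch of how such a file could be produced from a list of queries, the code below reads one query per line and writes that layout. It is not the library's `process_file` implementation; the function `process_file_sketch`, the `lookup` callable, and the `% Not found` marker are assumptions introduced for illustration.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Assumption: lookup(query) returns (source_name, bibtex), or (None, None)
# when no data source has a result.
Lookup = Callable[[str], Tuple[Optional[str], Optional[str]]]


def process_file_sketch(input_path: str, output_path: str, lookup: Lookup) -> int:
    """Read one query per line, write annotated BibTeX, return the hit count."""
    text = Path(input_path).read_text(encoding="utf-8")
    queries = [line.strip() for line in text.splitlines() if line.strip()]

    found = 0
    blocks = []
    for query in queries:
        source, bibtex = lookup(query)
        if bibtex is None:
            # Hypothetical marker so that misses stay visible in the output.
            blocks.append(f"% Query: {query}\n% Not found\n")
            continue
        found += 1
        blocks.append(f"% Query: {query}\n% Source: {source}\n{bibtex.strip()}\n")

    # A blank line separates consecutive entries.
    Path(output_path).write_text("\n".join(blocks), encoding="utf-8")
    return found
```

Because the annotations are `%` comment lines, most BibTeX tooling will skip them when reading the file.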


Installation Method#

Using pip:

pip install get-bibtex

Using Poetry:

poetry add get-bibtex

Best Practices#

  1. Register with Email

    fetcher = CrossRefBibTeX(email="your.email@example.com")
    

    Supplying an email address identifies your client to CrossRef (its "polite" pool), which can give your requests better API access priority.

  2. Use Workflow Wisely

    workflow = WorkflowBuilder()
    workflow.add_fetcher(CrossRefBibTeX())  # Primary source
    workflow.add_fetcher(DBLPBibTeX())      # Backup source
    

    Add data sources in order of reliability.

  3. Add Delay When Batch Processing
    When processing a large number of citations, it is recommended to use the built-in delay mechanism to avoid triggering API limits.

  4. Obtain SerpAPI Key

    To use Google Scholar features, you need a SerpAPI key. Here are the steps to obtain it:

    1. Register for a SerpAPI account

      • Visit the official SerpAPI website
      • Click the "Sign Up" button in the upper right corner
      • Fill in the registration information (email, password, etc.)
    2. Choose a suitable plan

      • Free plan: 100 searches per month
      • Paid plans: Choose different levels based on needs
      • For testing and personal use, the free plan is usually sufficient.
    3. Obtain API Key

      • After logging in, go to the Dashboard
      • Find your key in the "API Key" section
      • Copy the key for use in your code.
    4. Usage Example

      from get_bibtex import GoogleScholarBibTeX
      
      # Initialize Google Scholar fetcher
      fetcher = GoogleScholarBibTeX(api_key="your-serpapi-key")
      
      # Get citation
      bibtex = fetcher.get_bibtex("Deep learning with differential privacy")
      print(bibtex)
      
    5. Notes

      • Protect your API key and do not share it publicly.
      • Monitor usage to avoid exceeding limits.
      • Set reasonable request intervals (at least 1 second is recommended).
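      • Use environment variables to store API keys in production environments:

      import os

      from get_bibtex import GoogleScholarBibTeX

      # Read the key from the environment rather than hard-coding it
      api_key = os.getenv("SERPAPI_KEY")
      fetcher = GoogleScholarBibTeX(api_key=api_key)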
      
    6. Usage Recommendations

      • Prioritize using CrossRef and DBLP.
      • Fall back to Google Scholar only when the other sources return no result.
      • Be mindful of API usage limits when batch processing.
      # Recommended workflow order
      workflow = WorkflowBuilder()
      workflow.add_fetcher(CrossRefBibTeX(email="your.email@example.com"))
      workflow.add_fetcher(DBLPBibTeX())
      workflow.add_fetcher(GoogleScholarBibTeX(api_key="your-serpapi-key"))
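
The advice above about delays and request intervals can be illustrated with a small throttled loop. This post does not show how the library's built-in delay mechanism is configured, so the sketch below is a generic stand-in rather than the get-bibtex API: `fetch_with_delay`, its parameters, and `demo_fetch` are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def fetch_with_delay(
    queries: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Call fetch for each query, pausing between consecutive requests."""
    results: Dict[str, Optional[str]] = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        try:
            results[query] = fetch(query)
        except Exception:
            # Record the failure and keep going with the remaining queries.
            results[query] = None
    return results


# Hypothetical stand-in for a real fetcher such as fetcher.get_bibtex.
def demo_fetch(query: str) -> Optional[str]:
    return f"@misc{{demo, title={{{query}}}}}"


demo = fetch_with_delay(["Deep learning", "Machine learning"], demo_fetch, delay_seconds=0.01)
for query, bibtex in demo.items():
    print(query, "->", bibtex)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.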
      

Future Prospects#

  1. Support more data sources
  2. Add citation format conversion
  3. Provide a graphical user interface
  4. Support more customization options

Conclusion#

get-bibtex is dedicated to simplifying the management of literature in academic writing. Whether for a single paper or a literature review, it can help you efficiently obtain and manage literature citations. You are welcome to participate in project development on GitHub, provide suggestions, or report issues.

This article is synchronized and updated to xLog by Mix Space
The original link is https://liuyaowen.cn/posts/person/20241231

