刘耀文


get-bibtex: A Python Tool That Makes Managing Citations Easier

Introduction#

In academic research, managing references is important but time-consuming work. When writing a paper, we often need to pull citation entries from several different databases. To solve this problem, I built get-bibtex, a Python library that helps researchers quickly fetch BibTeX citations from multiple academic databases.

Why get-bibtex?#

1. Multiple sources#

  • CrossRef (the most comprehensive DOI database)
  • DBLP (the computer science bibliography)
  • Google Scholar (requires an API key)

2. Smart workflows#

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create a workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX("[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Batch processing
papers = [
    "10.1145/3292500.3330919",    # by DOI
    "Attention is all you need"   # by title
]
]
results = workflow.get_multiple_bibtex(papers)

3. Easy to use#

from get_bibtex import CrossRefBibTeX

# Fetch a single citation
fetcher = CrossRefBibTeX(email="[email protected]")
bibtex = fetcher.get_bibtex("10.1145/3292500.3330919")
print(bibtex)
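Under the hood, the DOI infrastructure can serve BibTeX directly through HTTP content negotiation. As a rough sketch of what a fetcher like this does (illustrative only, not get-bibtex's actual implementation), you can ask doi.org for BibTeX yourself:

```python
import urllib.request

def build_request(doi: str, email: str) -> urllib.request.Request:
    # Ask doi.org for BibTeX instead of the HTML landing page via content negotiation.
    # Identifying yourself with a mailto lets polite APIs such as CrossRef's
    # prioritize your requests.
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={
            "Accept": "application/x-bibtex",
            "User-Agent": f"bibtex-demo (mailto:{email})",
        },
    )

def fetch_bibtex(doi: str, email: str = "[email protected]") -> str:
    with urllib.request.urlopen(build_request(doi, email), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_bibtex("10.1038/nature14539")` should return an `@article{...}` entry for that DOI.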

4. Batch file processing#

# Read queries from a file and save the results
workflow.process_file(
    input_path="papers.txt",
    output_path="references.bib"
)

Feature highlights#

  1. Smart fallback

    • When one source fails, other sources are tried automatically
    • Maximizes the chance of retrieving the citation
  2. Progress tracking

    • Uses tqdm to display processing progress
    • Gives a clear view of batch status
  3. Error handling

    • Detailed logging
    • Graceful handling of API limits and network errors
  4. Formatted output

    • Automatically cleans and formats BibTeX
    • Ensures consistent output formatting
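The fallback behavior can be pictured as trying each configured source in turn until one returns a result. A minimal sketch (hypothetical code, not the library's internals; any object with a `get_bibtex` method works as a fetcher here):

```python
from typing import Optional, Sequence

def get_with_fallback(fetchers: Sequence, query: str) -> Optional[str]:
    """Try each source in order; return the first non-empty BibTeX string."""
    for fetcher in fetchers:
        try:
            bibtex = fetcher.get_bibtex(query)
            if bibtex:
                return bibtex
        except Exception:
            # Network error or API limit on this source: fall through to the next one
            continue
    return None
```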

Use cases#

Writing a paper#

While writing a paper, you can fetch citations directly by DOI or title:

from get_bibtex import CrossRefBibTeX

fetcher = CrossRefBibTeX()
citations = [
    "Machine learning",
    "Deep learning",
    "10.1038/nature14539"
]

for citation in citations:
    bibtex = fetcher.get_bibtex(citation)
    print(bibtex)

Literature reviews#

Batch-process a large number of citations:

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX())
workflow.add_fetcher(DBLPBibTeX())

# Read the list of papers from a file
workflow.process_file("papers.txt", "bibliography.bib")

Fetching citations for attention-mechanism papers#

Suppose we need citations for the following papers:

  1. FedMSA: a model selection and adaptation system for federated learning
  2. Attention Is All You Need: the seminal Transformer paper
  3. Non-Local Neural Networks
  4. ECA-Net: efficient channel attention
  5. CBAM: the convolutional block attention module

Fetching with CrossRef (by DOI)#

from apiModels import CrossRefBibTeX

fetcher = CrossRefBibTeX(email="[email protected]")

# FedMSA
bibtex = fetcher.get_bibtex("10.3390/s22197244")
print(bibtex)

# ECA-Net
bibtex = fetcher.get_bibtex("10.1109/cvpr42600.2020.01155")
print(bibtex)

Sample output:

@article{Sun_2022,
  title={FedMSA: A Model Selection and Adaptation System for Federated Learning},
  volume={22},
  ISSN={1424-8220},
  url={http://dx.doi.org/10.3390/s22197244},
  DOI={10.3390/s22197244},
  number={19},
  journal={Sensors},
  publisher={MDPI AG},
  author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv},
  year={2022},
  month=sep,
  pages={7244}
}

Fetching with DBLP (by title)#

from apiModels import DBLPBibTeX

fetcher = DBLPBibTeX()

# CBAM
bibtex = fetcher.get_bibtex("CBAM: Convolutional Block Attention Module")
print(bibtex)

Sample output. Note that DBLP's title search returns its best match, which here is a related 2024 paper rather than the original CBAM paper, so the result is worth double-checking:

@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and Zexi Zhang and Haobang Hu and Bin Li and Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959}
}

Fetching multiple citations with a workflow#

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create a workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Prepare the list of queries
queries = [
    "FedMSA: A Model Selection and Adaptation System for Federated Learning",
    "Attention Is All You Need",
    "Non-Local Neural Networks",
    "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks",
    "CBAM: Convolutional Block Attention Module"
]

# Fetch all citations
results = workflow.get_multiple_bibtex(queries)

# Print the results
for query, bibtex in results.items():
    print(f"\nQuery: {query}")
    print(f"Citation:\n{bibtex if bibtex else 'Not found'}")

Batch file processing#

You can create a workflow to process a file containing multiple queries. First, create the workflow and add the data sources:

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Process the file
workflow.process_file("papers.txt", "references.bib")

Sample input:
papers.txt

FedMSA: A Model Selection and Adaptation System for Federated Learning
Attention Is All You Need
Non-Local Neural Networks
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
CBAM: Convolutional Block Attention Module

Sample output:
references.bib


% Query: FedMSA: A Model Selection and Adaptation System for Federated Learning
% Source: CrossRefBibTeX
@article{Sun_2022, title={FedMSA: A Model Selection and Adaptation System for Federated Learning}, volume={22}, ISSN={1424-8220}, url={http://dx.doi.org/10.3390/s22197244}, DOI={10.3390/s22197244}, number={19}, journal={Sensors}, publisher={MDPI AG}, author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv}, year={2022}, month=sep, pages={7244} }

% Query: Attention Is All You Need
% Source: DBLPBibTeX
@inproceedings{DBLP:conf/dac/ZhangYY21,
  author       = {Xiaopeng Zhang and
                  Haoyu Yang and
                  Evangeline F. Y. Young},
  title        = {Attentional Transfer is All You Need: Technology-aware Layout Pattern
                  Generation},
  booktitle    = {58th {ACM/IEEE} Design Automation Conference, {DAC} 2021, San Francisco,
                  CA, USA, December 5-9, 2021},
  pages        = {169--174},
  publisher    = {{IEEE}},
  year         = {2021},
  url          = {https://doi.org/10.1109/DAC18074.2021.9586227},
  doi          = {10.1109/DAC18074.2021.9586227},
  timestamp    = {Wed, 03 May 2023 17:06:11 +0200},
  biburl       = {https://dblp.org/rec/conf/dac/ZhangYY21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

% Query: Non-Local Neural Networks
% Source: CrossRefBibTeX
@article{Xu_2024, title={Adaptive selection of local and non-local attention mechanisms for speech enhancement}, volume={174}, ISSN={0893-6080}, url={http://dx.doi.org/10.1016/j.neunet.2024.106236}, DOI={10.1016/j.neunet.2024.106236}, journal={Neural Networks}, publisher={Elsevier BV}, author={Xu, Xinmeng and Tu, Weiping and Yang, Yuhong}, year={2024}, month=jun, pages={106236} }

% Query: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
% Source: CrossRefBibTeX
@inproceedings{Wang_2020, title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks}, url={http://dx.doi.org/10.1109/cvpr42600.2020.01155}, DOI={10.1109/cvpr42600.2020.01155}, booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, publisher={IEEE}, author={Wang, Qilong and Wu, Banggu and Zhu, Pengfei and Li, Peihua and Zuo, Wangmeng and Hu, Qinghua}, year={2020}, month=jun, pages={11531–11539} }

% Query: CBAM: Convolutional Block Attention Module
% Source: DBLPBibTeX
@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and
                  Zexi Zhang and
                  Haobang Hu and
                  Bin Li and
                  Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear
                  Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention
                  Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959},
  timestamp    = {Sat, 16 Mar 2024 15:09:59 +0100},
  biburl       = {https://dblp.org/rec/journals/access/WangZHLL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
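After a batch run it is worth sanity-checking the generated .bib file, since title searches can return unexpected matches. A quick, library-independent check is to count the entries (this assumes each entry starts at the beginning of a line, as in the output above):

```python
import re

def count_bib_entries(text: str) -> int:
    # A BibTeX entry starts at column 0 with '@type{', e.g. @article{ or @inproceedings{
    return len(re.findall(r"^@\w+\s*\{", text, flags=re.MULTILINE))
```

Running `count_bib_entries(open("references.bib").read())` on the file above should report 5 entries, one per query.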


Installation#

With pip:

pip install get-bibtex

With Poetry:

poetry add get-bibtex

Best practices#

  1. Provide your email

    fetcher = CrossRefBibTeX(email="[email protected]")
    

    Supplying an email identifies you to CrossRef's "polite" pool, which gets better API service

  2. Use workflows sensibly

    workflow = WorkflowBuilder()
    workflow.add_fetcher(CrossRefBibTeX())  # primary source
    workflow.add_fetcher(DBLPBibTeX())      # fallback source
    

    Add data sources in order of reliability

  3. Add delays for batch jobs
    When processing many citations, use the built-in delay mechanism to avoid triggering API rate limits

  4. Getting a SerpAPI key

    To use the Google Scholar features, you need a SerpAPI key. Here is how to get one:

    1. Sign up for a SerpAPI account

      • Visit the SerpAPI website
      • Click the "Sign Up" button in the top-right corner
      • Fill in the registration details (email, password, etc.)
    2. Choose a suitable plan

      • Free plan: 100 searches per month
      • Paid plans: pick a tier based on your needs
      • For testing and personal use, the free plan is usually enough
    3. Get your API key

      • Log in and open the Dashboard
      • Find your key in the "API Key" section
      • Copy it for use in your code
    4. Usage example

      from apiModels import GoogleScholarBibTeX
      
      # Initialize the Google Scholar fetcher
      fetcher = GoogleScholarBibTeX(api_key="your-serpapi-key")
      
      # Fetch a citation
      bibtex = fetcher.get_bibtex("Deep learning with differential privacy")
      print(bibtex)
      
    5. Caveats

      • Keep your API key private; never share it publicly
      • Monitor your usage to avoid exceeding the quota
      • Space out requests sensibly (at least 1 second is recommended)
      • In production, store the API key in an environment variable
      import os
      
      api_key = os.getenv("SERPAPI_KEY")
      fetcher = GoogleScholarBibTeX(api_key=api_key)
      
    6. Recommendations

      • Prefer CrossRef and DBLP
      • Fall back to Google Scholar only when no result is found
      • Mind the API limits when batch processing
      # Recommended workflow order
      workflow = WorkflowBuilder()
      workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
      workflow.add_fetcher(DBLPBibTeX())
      workflow.add_fetcher(GoogleScholarBibTeX(api_key="your-serpapi-key"))
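The delay advice in best practice 3 can be sketched as a small wrapper that pauses between calls (a hypothetical helper for illustration, not part of get-bibtex's API; with get-bibtex you would pass something like `fetcher.get_bibtex` as `fetch`):

```python
import time
from typing import Callable, Dict, Iterable, Optional

def fetch_all_politely(fetch: Callable[[str], Optional[str]],
                       queries: Iterable[str],
                       delay_seconds: float = 1.0) -> Dict[str, Optional[str]]:
    """Call fetch(query) for each query, sleeping between calls to respect rate limits."""
    results: Dict[str, Optional[str]] = {}
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        results[query] = fetch(query)
    return results
```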
      

Looking ahead#

  1. Support more data sources
  2. Add citation format conversion
  3. Provide a graphical user interface
  4. Support more customization options

Closing thoughts#

get-bibtex aims to simplify reference management in academic writing. Whether you are working on a single paper or a full literature review, it helps you fetch and manage citations efficiently. Contributions, suggestions, and bug reports are welcome on GitHub.

Related links#

This post was synced to xLog by Mix Space.
The original link is https://liuyaowen.cn/posts/person/20241231

