刘耀文


get-bibtex: A Python Tool That Makes Managing Citations Easier

Introduction#

In academic research, managing references is important but time-consuming work. When writing a paper, we often need to pull citation entries from several different databases. To solve this problem, I built get-bibtex, a Python library that helps researchers quickly fetch BibTeX citations from multiple academic databases.

Why get-bibtex?#

1. Multiple sources#

  • CrossRef (the most comprehensive DOI database)
  • DBLP (the computer science bibliography)
  • Google Scholar (requires an API key)

2. Smart workflows#

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create a workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX("[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Batch processing
papers = [
    "10.1145/3292500.3330919",    # by DOI
    "Attention is all you need"   # by title
]
]
results = workflow.get_multiple_bibtex(papers)

3. Easy to use#

from get_bibtex import CrossRefBibTeX

# Fetch a single citation
fetcher = CrossRefBibTeX(email="[email protected]")
bibtex = fetcher.get_bibtex("10.1145/3292500.3330919")
print(bibtex)
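Under the hood, the DOI infrastructure can serve BibTeX directly through HTTP content negotiation. As a rough sketch of what a fetcher like this does (illustrative only, not get-bibtex's actual implementation), you can ask doi.org for BibTeX yourself:

```python
import urllib.request

def build_request(doi: str, email: str) -> urllib.request.Request:
    # Ask doi.org for BibTeX instead of the HTML landing page via content negotiation.
    # Identifying yourself with a mailto lets polite APIs such as CrossRef's
    # prioritize your requests.
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={
            "Accept": "application/x-bibtex",
            "User-Agent": f"bibtex-demo (mailto:{email})",
        },
    )

def fetch_bibtex(doi: str, email: str = "[email protected]") -> str:
    with urllib.request.urlopen(build_request(doi, email), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_bibtex("10.1038/nature14539")` should return an `@article{...}` entry for that DOI.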

4. Batch file processing#

# Read queries from a file and save the results
workflow.process_file(
    input_path="papers.txt",
    output_path="references.bib"
)

Feature highlights#

  1. Smart fallback

    • When one source fails, other sources are tried automatically
    • Maximizes the chance of retrieving the citation
  2. Progress tracking

    • Uses tqdm to display processing progress
    • Gives a clear view of batch status
  3. Error handling

    • Detailed logging
    • Graceful handling of API limits and network errors
  4. Formatted output

    • Automatically cleans and formats BibTeX
    • Ensures consistent output formatting
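The fallback behavior can be pictured as trying each configured source in turn until one returns a result. A minimal sketch (hypothetical code, not the library's internals; any object with a `get_bibtex` method works as a fetcher here):

```python
from typing import Optional, Sequence

def get_with_fallback(fetchers: Sequence, query: str) -> Optional[str]:
    """Try each source in order; return the first non-empty BibTeX string."""
    for fetcher in fetchers:
        try:
            bibtex = fetcher.get_bibtex(query)
            if bibtex:
                return bibtex
        except Exception:
            # Network error or API limit on this source: fall through to the next one
            continue
    return None
```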

Use cases#

Writing a paper#

While writing a paper, you can fetch citations directly by DOI or title:

from get_bibtex import CrossRefBibTeX

fetcher = CrossRefBibTeX()
citations = [
    "Machine learning",
    "Deep learning",
    "10.1038/nature14539"
]

for citation in citations:
    bibtex = fetcher.get_bibtex(citation)
    print(bibtex)

Literature reviews#

Batch-process a large number of citations:

from get_bibtex import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX())
workflow.add_fetcher(DBLPBibTeX())

# Read the list of papers from a file
workflow.process_file("papers.txt", "bibliography.bib")

Fetching citations for attention-mechanism papers#

Suppose we need citations for the following papers:

  1. FedMSA: a model selection and adaptation system for federated learning
  2. Attention Is All You Need: the seminal Transformer paper
  3. Non-Local Neural Networks
  4. ECA-Net: efficient channel attention
  5. CBAM: the convolutional block attention module

Fetching with CrossRef (by DOI)#

from apiModels import CrossRefBibTeX

fetcher = CrossRefBibTeX(email="[email protected]")

# FedMSA
bibtex = fetcher.get_bibtex("10.3390/s22197244")
print(bibtex)

# ECA-Net
bibtex = fetcher.get_bibtex("10.1109/cvpr42600.2020.01155")
print(bibtex)

Sample output:

@article{Sun_2022,
  title={FedMSA: A Model Selection and Adaptation System for Federated Learning},
  volume={22},
  ISSN={1424-8220},
  url={http://dx.doi.org/10.3390/s22197244},
  DOI={10.3390/s22197244},
  number={19},
  journal={Sensors},
  publisher={MDPI AG},
  author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv},
  year={2022},
  month=sep,
  pages={7244}
}

Fetching with DBLP (by title)#

from apiModels import DBLPBibTeX

fetcher = DBLPBibTeX()

# CBAM
bibtex = fetcher.get_bibtex("CBAM: Convolutional Block Attention Module")
print(bibtex)

Sample output. Note that DBLP's title search returns its best match, which here is a related 2024 paper rather than the original CBAM paper, so the result is worth double-checking:

@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and Zexi Zhang and Haobang Hu and Bin Li and Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959}
}

Fetching multiple citations with a workflow#

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

# Create a workflow
workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Prepare the list of queries
queries = [
    "FedMSA: A Model Selection and Adaptation System for Federated Learning",
    "Attention Is All You Need",
    "Non-Local Neural Networks",
    "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks",
    "CBAM: Convolutional Block Attention Module"
]

# Fetch all citations
results = workflow.get_multiple_bibtex(queries)

# Print the results
for query, bibtex in results.items():
    print(f"\nQuery: {query}")
    print(f"Citation:\n{bibtex if bibtex else 'Not found'}")

Batch file processing#

You can create a workflow to process a file containing multiple queries. First, create the workflow and add the data sources:

from apiModels import WorkflowBuilder, CrossRefBibTeX, DBLPBibTeX

workflow = WorkflowBuilder()
workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
workflow.add_fetcher(DBLPBibTeX())

# Process the file
workflow.process_file("papers.txt", "references.bib")

Sample input:
papers.txt

FedMSA: A Model Selection and Adaptation System for Federated Learning
Attention Is All You Need
Non-Local Neural Networks
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
CBAM: Convolutional Block Attention Module

Sample output:
references.bib


% Query: FedMSA: A Model Selection and Adaptation System for Federated Learning
% Source: CrossRefBibTeX
@article{Sun_2022, title={FedMSA: A Model Selection and Adaptation System for Federated Learning}, volume={22}, ISSN={1424-8220}, url={http://dx.doi.org/10.3390/s22197244}, DOI={10.3390/s22197244}, number={19}, journal={Sensors}, publisher={MDPI AG}, author={Sun, Rui and Li, Yinhao and Shah, Tejal and Sham, Ringo W. H. and Szydlo, Tomasz and Qian, Bin and Thakker, Dhaval and Ranjan, Rajiv}, year={2022}, month=sep, pages={7244} }

% Query: Attention Is All You Need
% Source: DBLPBibTeX
@inproceedings{DBLP:conf/dac/ZhangYY21,
  author       = {Xiaopeng Zhang and
                  Haoyu Yang and
                  Evangeline F. Y. Young},
  title        = {Attentional Transfer is All You Need: Technology-aware Layout Pattern
                  Generation},
  booktitle    = {58th {ACM/IEEE} Design Automation Conference, {DAC} 2021, San Francisco,
                  CA, USA, December 5-9, 2021},
  pages        = {169--174},
  publisher    = {{IEEE}},
  year         = {2021},
  url          = {https://doi.org/10.1109/DAC18074.2021.9586227},
  doi          = {10.1109/DAC18074.2021.9586227},
  timestamp    = {Wed, 03 May 2023 17:06:11 +0200},
  biburl       = {https://dblp.org/rec/conf/dac/ZhangYY21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

% Query: Non-Local Neural Networks
% Source: CrossRefBibTeX
@article{Xu_2024, title={Adaptive selection of local and non-local attention mechanisms for speech enhancement}, volume={174}, ISSN={0893-6080}, url={http://dx.doi.org/10.1016/j.neunet.2024.106236}, DOI={10.1016/j.neunet.2024.106236}, journal={Neural Networks}, publisher={Elsevier BV}, author={Xu, Xinmeng and Tu, Weiping and Yang, Yuhong}, year={2024}, month=jun, pages={106236} }

% Query: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
% Source: CrossRefBibTeX
@inproceedings{Wang_2020, title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks}, url={http://dx.doi.org/10.1109/cvpr42600.2020.01155}, DOI={10.1109/cvpr42600.2020.01155}, booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, publisher={IEEE}, author={Wang, Qilong and Wu, Banggu and Zhu, Pengfei and Li, Peihua and Zuo, Wangmeng and Hu, Qinghua}, year={2020}, month=jun, pages={11531–11539} }

% Query: CBAM: Convolutional Block Attention Module
% Source: DBLPBibTeX
@article{DBLP:journals/access/WangZHLL24,
  author       = {Niannian Wang and
                  Zexi Zhang and
                  Haobang Hu and
                  Bin Li and
                  Jianwei Lei},
  title        = {Underground Defects Detection Based on {GPR} by Fusing Simple Linear
                  Iterative Clustering Phash (SLIC-Phash) and Convolutional Block Attention
                  Module (CBAM)-YOLOv8},
  journal      = {{IEEE} Access},
  volume       = {12},
  pages        = {25888--25905},
  year         = {2024},
  url          = {https://doi.org/10.1109/ACCESS.2024.3365959},
  doi          = {10.1109/ACCESS.2024.3365959},
  timestamp    = {Sat, 16 Mar 2024 15:09:59 +0100},
  biburl       = {https://dblp.org/rec/journals/access/WangZHLL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
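After a batch run it is worth sanity-checking the generated .bib file, since title searches can return unexpected matches. A quick, library-independent check is to count the entries (this assumes each entry starts at the beginning of a line, as in the output above):

```python
import re

def count_bib_entries(text: str) -> int:
    # A BibTeX entry starts at column 0 with '@type{', e.g. @article{ or @inproceedings{
    return len(re.findall(r"^@\w+\s*\{", text, flags=re.MULTILINE))
```

Running `count_bib_entries(open("references.bib").read())` on the file above should report 5 entries, one per query.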


Installation#

With pip:

pip install get-bibtex

With Poetry:

poetry add get-bibtex

Best practices#

  1. Provide your email

    fetcher = CrossRefBibTeX(email="[email protected]")
    

    Supplying an email identifies you to CrossRef's "polite" pool, which gets better API service

  2. Use workflows sensibly

    workflow = WorkflowBuilder()
    workflow.add_fetcher(CrossRefBibTeX())  # primary source
    workflow.add_fetcher(DBLPBibTeX())      # fallback source
    

    Add data sources in order of reliability

  3. Add delays for batch jobs
    When processing many citations, use the built-in delay mechanism to avoid triggering API rate limits

  4. Getting a SerpAPI key

    To use the Google Scholar features, you need a SerpAPI key. Here is how to get one:

    1. Sign up for a SerpAPI account

      • Visit the SerpAPI website
      • Click the "Sign Up" button in the top-right corner
      • Fill in the registration details (email, password, etc.)
    2. Choose a suitable plan

      • Free plan: 100 searches per month
      • Paid plans: pick a tier based on your needs
      • For testing and personal use, the free plan is usually enough
    3. Get your API key

      • Log in and open the Dashboard
      • Find your key in the "API Key" section
      • Copy it for use in your code
    4. Usage example

      from apiModels import GoogleScholarBibTeX
      
      # Initialize the Google Scholar fetcher
      fetcher = GoogleScholarBibTeX(api_key="your-serpapi-key")
      
      # Fetch a citation
      bibtex = fetcher.get_bibtex("Deep learning with differential privacy")
      print(bibtex)
      
    5. Caveats

      • Keep your API key private; never share it publicly
      • Monitor your usage to avoid exceeding the quota
      • Space out requests sensibly (at least 1 second is recommended)
      • In production, store the API key in an environment variable
      import os
      
      api_key = os.getenv("SERPAPI_KEY")
      fetcher = GoogleScholarBibTeX(api_key=api_key)
      
    6. Recommendations

      • Prefer CrossRef and DBLP
      • Fall back to Google Scholar only when no result is found
      • Mind the API limits when batch processing
      # Recommended workflow order
      workflow = WorkflowBuilder()
      workflow.add_fetcher(CrossRefBibTeX(email="[email protected]"))
      workflow.add_fetcher(DBLPBibTeX())
      workflow.add_fetcher(GoogleScholarBibTeX(api_key="your-serpapi-key"))
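The delay advice in best practice 3 can be sketched as a small wrapper that pauses between calls (a hypothetical helper for illustration, not part of get-bibtex's API; with get-bibtex you would pass something like `fetcher.get_bibtex` as `fetch`):

```python
import time
from typing import Callable, Dict, Iterable, Optional

def fetch_all_politely(fetch: Callable[[str], Optional[str]],
                       queries: Iterable[str],
                       delay_seconds: float = 1.0) -> Dict[str, Optional[str]]:
    """Call fetch(query) for each query, sleeping between calls to respect rate limits."""
    results: Dict[str, Optional[str]] = {}
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        results[query] = fetch(query)
    return results
```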
      

Looking ahead#

  1. Support more data sources
  2. Add citation format conversion
  3. Provide a graphical user interface
  4. Support more customization options

Closing thoughts#

get-bibtex aims to simplify reference management in academic writing. Whether you are working on a single paper or a full literature review, it helps you fetch and manage citations efficiently. Contributions, suggestions, and bug reports are welcome on GitHub.

Related links#

This post was synced to xLog by Mix Space.
The original link is https://liuyaowen.cn/posts/person/20241231

