Model Performance Evaluation - Data Classification - sacreBLEU

07.AI 2024. 5. 14. 00:53

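(Concept) sacreBLEU is a reference implementation of BLEU, a metric originally used to evaluate machine translation. It is now also used, together with other metrics such as TER, ChrF, and BERTScore, for the quantitative evaluation of LLM responses.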
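
To make the idea concrete, here is a minimal sketch of a BLEU-style score in plain Python. It is not sacreBLEU's implementation. It assumes whitespace tokenization, a single reference, and no smoothing, and the function names `ngrams` and `simple_bleu` are made up for this illustration. It only shows the two ingredients BLEU combines: clipped n-gram precision and a brevity penalty.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU in [0, 1] against one reference.

    Assumptions (illustration only): whitespace tokenization,
    uniform weights over 1..max_n grams, no smoothing.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or matches == 0:
            # Without smoothing, one empty precision zeroes the score.
            return 0.0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

An exact match scores 1.0, and a hypothesis that shares no 4-gram with the reference scores 0.0 because this sketch does not smooth. sacreBLEU itself reports scores on a 0 to 100 scale and applies its own standardized tokenization, so its numbers will not match this sketch exactly.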

GitHub - mjpost/sacrebleu: Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons (github.com)

