Chatbot Evaluation
Hints
Headache remedy
Fever guidance
Dizziness tips
Fatigue steps
Question
Model
DistilBERT QA
RoBERTa QA
ALBERT QA
openai/gpt-5
openai/gpt-4o
openai/gpt-4o-mini
meta-llama/llama-3.1-70b-instruct
meta-llama/llama-3.1-8b-instruct
google/gemini-pro-1.5
google/gemini-flash-1.5
google/gemini-2.5-flash
google/gemini-2.5-pro
microsoft/phi-3-medium-128k-instruct
Phi-1.5
DialoGPT Small
BitNet b1.58 2B-4T
DialoGPT Medium
GPT-2
GPT-2 Medium
DistilGPT2
TinyLlama 1.1B Chat
Qwen2.5 0.5B Instruct
Advanced (QA Hyperparameters)
Top-K Answers (1-5)
Max Answer Length (5-200)
Optional Metrics (may download large models)
BERTScore
BLEURT
COMET
Toxicity
METEOR
Evaluate
Results
Model Response
Metrics
Metric
Score
Hyperparameters Used
Model Info
Raw Provider Response
Request Echo