Dan's Brain

Parliamo di Large Language Models con Enkk (Let's talk about Large Language Models with Enkk)

Notes

  • data leakage from test sets into LLM training data
    • this inflates benchmark scores, because the model may have already seen the test questions
  • Gemini
    • benchmarks w/ MMLU
      • multiple-choice tests are very different from human interaction and from how these models are actually used
      • uncertainty-routed chain-of-thought
      • CoT@32
        • the question is asked N times (here N = 32) and the majority answer wins
    • chain-of-thought
      • ask the model to generate more text: the intermediate reasoning steps
      • with the steps written out, the final output gets more accurate
      • can be N-shot or zero-shot
  • leaderboard benchmark
    • users vote for the winner in an anonymous match between two models on the same prompt
    • Elo rating system
  • what are these systems for:
    • supervised data transformation
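
Two mechanisms from the notes above, majority voting over sampled answers (CoT@32) and Elo updates from pairwise votes, can be sketched roughly as follows. This is a minimal illustration, not the method used by any specific model or leaderboard: the function names, the 0.6 agreement threshold, and the K-factor of 32 are all assumptions made for the example. The uncertainty-routed variant is written as I understand it: take the majority answer when agreement is high enough, otherwise fall back to a single greedy answer.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and its share of all samples."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def uncertainty_routed(answers, greedy_answer, threshold=0.6):
    """Use the majority answer only if agreement reaches the threshold.

    Otherwise fall back to the greedy (single, non-sampled) answer.
    The threshold is a hypothetical placeholder value.
    """
    answer, share = majority_vote(answers)
    return answer if share >= threshold else greedy_answer


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Update two Elo ratings after one match.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k (assumed 32 here) controls how far one result moves the ratings.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

In use, `answers` would hold the final answers extracted from N sampled chain-of-thought completions, and `elo_update` would be called once per user vote, with the two anonymous models as A and B.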