Language Model Evaluator - Fully Remote | Up to $20/hr Part-time

United Kingdom | Remote | Contract

<h3>About the job</h3><p><strong>Mercor</strong> connects elite creative and technical talent with leading AI research labs. We are headquartered in San Francisco, and our investors include <strong>Benchmark</strong>, <strong>General Catalyst</strong>, <strong>Peter Thiel</strong>, <strong>Adam D'Angelo</strong>, <strong>Larry Summers</strong>, and <strong>Jack Dorsey</strong>.</p><p><strong>Position:</strong> Generalist - English &amp; Bengali<br><strong>Type:</strong> <strong>Contract</strong><br><strong>Compensation:</strong> <strong>$15–$20/hour</strong><br><strong>Location:</strong> <strong>Remote</strong></p><h3>Role Responsibilities</h3><ul><li>Conduct fact-checking using trusted public sources and external <strong>tools</strong>.</li><li>Generate high-quality human evaluation data by identifying response strengths, areas for improvement, and factual inaccuracies.</li><li>Assess reasoning quality, clarity, tone, and completeness of responses.</li><li>Ensure model responses align with expected conversational behavior and system guidelines.</li><li>Work <strong>independently and asynchronously</strong> to meet deadlines while improving <strong>AI model performance</strong>.</li></ul><h3>Qualifications</h3><p><strong>Must-Have</strong></p><ul><li><strong>Bachelor's degree</strong></li><li><strong>Native speaker of Bengali</strong></li><li><strong>Significant experience using large language models (LLMs)</strong></li><li><strong>Excellent writing skills in English</strong></li><li><strong>Strong attention to detail</strong></li><li><strong>Background or experience in domains requiring structured analytical thinking (e.g., research, policy, analytics, linguistics, engineering)</strong></li></ul><h3><strong>Preferred</strong></h3><ul><li><strong>Prior experience with RLHF, model evaluation, or data annotation work</strong></li><li><strong>Experience writing or editing high-quality written content</strong></li><li><strong>Experience comparing multiple outputs and making fine-grained qualitative judgments</strong></li></ul><h3><strong>Application Process (takes 20–30 minutes to complete)</strong></h3><ul><li><strong>Upload resume</strong></li><li><strong>AI interview based on your resume</strong></li><li><strong>Submit form</strong></li></ul><h3><strong>Resources &amp; Support</strong></h3><ul><li><strong>For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome</strong></li><li><strong>For any help or support, reach out to: support@mercor.com</strong></li></ul><p><strong><em>PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.</em></strong></p><p>Originally posted on <a href="https://himalayas.app">Himalayas</a></p>
