Giggso

Published on 06/03/2026
Ludhiana (041)
To be defined

Description:

Data Scientist

Design and implement end-to-end evaluation frameworks to assess performance, reliability, and safety of multi-agent AI systems
Lead experimentation and A/B testing efforts to systematically test hypotheses, validate model improvements, and track performance across agent iterations
Curate and maintain high-quality ground truth datasets to enable accurate, reproducible evaluation of multi-agent outputs
Identify and address reliability and accuracy gaps in agent workflows, including failure modes and edge cases, in production-like environments
Stay current on emerging research in agentic AI, LLM evaluation, and multi-agent coordination to continuously improve framework design

Technical Skills

Proficiency in Python and ML frameworks
Hands-on experience with LLM APIs and agentic frameworks (LangChain, LlamaIndex, Semantic Kernel)
Familiarity with evaluation tooling (Ragas, DeepEval, LangSmith, or similar)
Experience with data pipelines, experiment tracking (MLflow, W&B), and CI/CD for ML workflows
Strong foundation in statistics, NLP, prompt engineering, experimental design, and A/B testing methodology
Proficiency in Azure ML, Azure OpenAI Service, and Azure AI Foundry for model deployment, evaluation, and orchestration
Familiarity with Azure Monitor and Application Insights for tracking reliability and performance of deployed agent systems

The processing of personal data received will be carried out in accordance with applicable laws, including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018.