publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
2024
- [arXiv] Increased LLM Vulnerabilities from Fine-tuning and Quantization. arXiv, Apr 2024
- [NeurIPS Workshop] Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs. In NeurIPS Safe Generative AI Workshop 2024, Oct 2024
- [NeurIPS Workshop] SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming. In Red Teaming GenAI: What Can We Learn from Adversaries?, Oct 2024
- [NeurIPS Workshop] Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study. In Pluralistic Alignment Workshop at NeurIPS 2024, Oct 2024
- [arXiv] VERA: Validation and Enhancement for Retrieval Augmented systems. arXiv, Sep 2024