CS 690: Trustworthy and Responsible AI

Instructor: Eugene Bagdasaryan
TA: June Jeong
Time: MoWe 2:30PM - 3:45PM
Location: Computer Science Bldg rm 142
Office hours: Eugene: Wed 12–1 PM (by appointment), CS 304 | June: Fri 2–4 PM (by appointment), CS 207

In the era of intelligent assistants, autonomous agents, and self-driving cars, we expect AI systems not to cause harm and to withstand adversarial attacks. In this course you will learn advanced methods for building AI models and systems that mitigate privacy, security, societal, and environmental risks. We will go deep into attack vectors and into what types of guarantees current research can and cannot provide for modern generative models. The course features extensive hands-on experience with model training and regular discussion of key research papers. Students are required to have taken NLP, general ML, and security courses before taking this course.

Expectations

  • Required reading, attendance, and participation
  • Each group: weekly presentation + code for assignments
  • Group research project

Grading Breakdown

  • Attendance (10%): Allowed to miss any 4 classes.
  • Assignments, slides + report + code (40%): 2 total (20% each); 1 late day allowed per assignment.
  • Final Project (50%):
    • 1-page Proposal (10%)
    • Mid-Semester Presentation (10%)
    • Final Report (20%)
    • Final Presentation (10%)
  • (Optional) Bonus (up to 5%): Active participation, excellent code implementation, slide efforts.

Syllabus: Weekly Schedule

Fall 2025 Class Schedule
Week 1
  • Class 1 (Wed, Sep 3): Intro + Project group formations. First Day of Classes. Bonus Assignment: Startup ideas.
    No reading

Week 2
  • Class 2 (Mon, Sep 8): Overview: Privacy and Security. Slides. Assignment 1 (due 9/19).
    📖 Paper 1: Towards the Science of Security and Privacy in Machine Learning
  • Class 3 (Wed, Sep 10): Privacy. Membership Inference Attacks. Slides.
    📖 Paper 1: Membership Inference Attacks From First Principles
    📖 Paper 2: Membership Inference Attacks against Machine Learning Models

Week 3
  • Class 4 (Mon, Sep 15): Privacy. Training Data Attacks. Last day to add/drop (Grad).
    📖 Paper 1: Extracting Training Data from Large Language Models
    📖 Paper 2: Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
    📖 Paper 3: Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
    📖 Paper 4: Imitation Attacks and Defenses for Black-box Machine Translation Systems
  • Class 5 (Wed, Sep 17): Privacy. Federated Learning.
    📖 Paper 1: Communication-Efficient Learning of Deep Networks from Decentralized Data
    📖 Paper 2: Advances and Open Problems in Federated Learning
  • Fri, Sep 19: Assignment 1 due: Synthetic data + reconstruction.

Week 4
  • Class 6 (Mon, Sep 22): Privacy. Differential Privacy, Part 1: Basics.
    📖 Paper 1: Deep Learning with Differential Privacy
  • Class 7 (Wed, Sep 24): Privacy. Differential Privacy, Part 2: In-Context Learning, Private Evolution.
    📖 Paper 1: Differentially Private Synthetic Data via Foundation Model APIs 1: Images
  • Fri, Sep 26: Assignment 2 due: Federated Learning.

Week 5
  • Class 8 (Mon, Sep 29): Privacy. Data Analytics, PII Filtering with LLMs.
    📖 Paper 1: Beyond Memorization: Violating Privacy Via Inference with Large Language Models
  • Class 9 (Wed, Oct 1): Privacy. Contextual Integrity.
    📖 Paper 1: AirGapAgent: Protecting Privacy-Conscious Conversational Agents
  • Fri, Oct 3: Assignment 3 due: Differential Privacy.

Week 6
  • Class 10 (Mon, Oct 6): Privacy. Student Panel + Future Directions Discussions.
    No paper reading
  • Class 11 (Wed, Oct 8): Project Presentations, Part 1.
    No paper reading
  • Fri, Oct 10: Assignment 4 due: PII Extraction.

Week 7
  • Class 12 (Mon, Oct 13): Holiday: Indigenous Peoples' Day (No Class).
  • Class 13 (Wed, Oct 15): No class. Project work.
    No paper reading
  • Fri, Oct 17: Assignment 5 due: Contextual Integrity.

Week 8
  • Class 14 (Mon, Oct 20): Security. Jailbreaks.
    📖 Paper 1: Universal and Transferable Adversarial Attacks on Aligned Language Models
  • Class 15 (Wed, Oct 22): Security. Prompt Injections. Assignment 6: Jailbreaking and prompt injections.
    📖 Paper 1: Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Week 9
  • Class 16 (Mon, Oct 27): Security. Adversarial Examples in Multi-modal Systems. Assignment 7: Multi-modal attacks.
    📖 Paper 1: Are aligned neural networks adversarially aligned?
    📖 Paper 2: Self-interpreting Adversarial Images
  • Class 17 (Wed, Oct 29): Security. Poisoning and Backdoors. Assignment 8: Backdoors.
    📖 Paper 1: How To Backdoor Federated Learning

Week 10
  • Class 18 (Mon, Nov 3): Security. Watermarks.
    📖 Paper 1: SoK: Watermarking for AI-Generated Content
  • Class 19 (Wed, Nov 5): Security. Alignment Attacks. Assignment 9: Alignment attacks + RLHF.
    📖 Paper 1: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Week 11
  • Class 20 (Mon, Nov 10): Security. Principled Defenses.
    📖 Paper 1: Defeating Prompt Injections by Design
    📖 Paper 2: Contextual Agent Security
  • Class 21 (Wed, Nov 12): Security. Student Panel + Future Directions.
    No paper reading

Week 12
  • Class 22 (Mon, Nov 17): Societal. Model Fairness and Biases.
    📖 Paper 1: Differential Privacy Has Disparate Impact on Model Accuracy
  • Class 23 (Wed, Nov 19): Societal. Propaganda, Misinformation, and Deception.
    📖 Paper 1: Propaganda-as-a-service

Week 13
  • Class 24 (Mon, Nov 24): No class. Project work.
    No paper reading
  • Class 25 (Tue, Nov 25): Thanksgiving recess begins after last class.

Week 14
  • Class 26 (Mon, Dec 1): Environmental. Resource Overhead Attacks. Classes resume. Assignment 10: Resource overhead attacks.
    📖 Paper 1: OverThink: Slowdown Attacks on Reasoning LLMs
  • Class 27 (Wed, Dec 3): Final Project Presentations, Part 1.
    No paper reading

Week 15
  • Class 28 (Mon, Dec 8): Final Project Presentations, Part 2.
    No paper reading

Group Project

Instructions for Your Project

You will design an AI Startup and try to defend it against attacks.

Pre-requisites:

  • Your product must operate on customer data (e.g., from individual users or companies).
  • Your product should use LLMs.
  • Your product must interact with external parties (e.g., customers, vendors).

Example projects:

  • Customer support bot
  • AI Tutor
  • Business assistant
  • ...

Throughout the semester you will add privacy and security features, building a comprehensive analysis of your project.

Additionally, you will pick a research topic that interests you and write an extensive research report on it.

Make sure to track your own contributions through Git commits (both code and reports).

Assignments

All Assignments Overview

  1. Build synthetic data + show attacks → train + reconstruction (detailed below)
  2. Implement Federated Learning
  3. Implement Differential Privacy and Private Evolution
  4. PII filtering + CI → airgap, context hijacking
  5. Jailbreaking and prompt injections
  6. Multi-modal attacks
  7. Backdoors and watermarking
  8. Alignment attacks + RLHF
  9. Resource overhead attacks

Assignment Process

  1. Create your repository: Each team must create a repository on GitHub.
  2. Share access: Add the teaching staff as collaborators.
  3. Roles:
    • One lead author writes the report and conducts the main experiments.
    • Other team members advise and consult.
    • The lead author receives 80% of the grade; the other members receive 20%.

Structure of Each Assignment

  • Reading Report: Summarize, critique, and connect the assigned papers to your project.
    Include key discussion points from class as well.
  • Code: Implement the required attacks/defenses, include documentation and results.
  • Presentation: Prepare a short slide deck summarizing your findings.

Deadlines & Presentations

  • Due: Slide deck, reading report, and code are due Friday of that week, 11:59 PM EST.
  • Presentation: Happens at the beginning of the following week's class.

(Bonus) Week 2 – Startup Ideas & Group Formation

Not graded · Bonus points for best ideas

Timeline

  • Week 2 class: Present your startup idea in-class (~5 minutes).
  • During class: Form project groups.

Content

  • Present your ideas for a startup for 5 minutes.
  • Show how the startup touches on user data and opens up privacy/security challenges.
  • Form groups.
  • Don't forget about Slack! Join here

This assignment is not graded, but the best ideas will earn bonus points.

Week 3 – Build a Synthetic Dataset & MIA/Extraction Attack

Due Fri 9/19 · 11:59 PM EST

At a Glance

What to submit: slide deck + reading report + code.

Timeline

  • Friday 11:59 PM EST: slides + reading report + code due
  • Next class: short in-class presentation

Reading Report

Summarize & critique: 2–3 sentence summary + 1 strength + 1 weakness.

Connect

  • How do MIAs exploit overfitting? How does this connect to your MIA implementation?
  • How does data extraction differ from MIAs, and why are LLMs vulnerable?
  • Is your dataset vulnerable to MIA? Why?
  • What is the difference between MIA and reconstruction attacks?

Discussion

  • What properties of training data make extraction more dangerous?

Coding Assignment 1

Goal: (1) Build a synthetic dataset and train a model. (2) Run a membership inference attack (image) or a training-data extraction demo (text). Bonus: do both.

Task 1. Build a Synthetic Dataset – Requirements
  • Create a realistic synthetic dataset. Programmatic or manual; small but coherent. (A minimal generation sketch follows this list.)
  • Any data type: tabular, text, image, audio, or video (the product should still use an LLM).
  • Size & labels:
    • ≥ 100 labeled samples.
    • Labels suitable for training/testing (class/category).
    • Manual sanity-check for coherence and correct labels.
  • Use-case representation: realistic privacy-sensitive scenario. Examples:
    • Tabular: customer transactions, demographics, sensors.
    • Text: user queries, chatbot logs, search queries.
    • Image: simple shapes, icons, handwritten digits.
    • Audio/Video: short labeled clips (optional).
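
For concreteness, here is a minimal sketch of one way to satisfy Task 1 for a tabular use case. The customer-transactions schema, the toy labeling rule, and names such as make_transactions and transactions.csv are illustrative assumptions, not requirements; adapt them to your own startup's scenario.

```python
# Minimal sketch: a synthetic customer-transactions dataset with binary
# "fraud" labels. The schema, labeling rule, and file name are
# illustrative assumptions -- adapt them to your own use case.
import csv
import numpy as np

rng = np.random.default_rng(seed=0)

def make_transactions(n=200):
    """Generate n labeled rows: (amount, hour, n_prior_purchases, label)."""
    amount = rng.lognormal(mean=3.5, sigma=1.0, size=n)  # dollars
    hour = rng.integers(0, 24, size=n)                   # time of day
    prior = rng.poisson(lam=5, size=n)                   # purchase history
    # Toy labeling rule: large late-night purchases from new accounts
    # are flagged as fraud, plus a little label noise via XOR.
    fraud = (amount > 80) & ((hour < 6) | (hour > 22)) & (prior < 3)
    fraud = fraud ^ (rng.random(n) < 0.05)
    return amount, hour, prior, fraud.astype(int)

amount, hour, prior, label = make_transactions()
with open("transactions.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["amount", "hour", "n_prior_purchases", "is_fraud"])
    for row in zip(amount.round(2), hour, prior, label):
        w.writerow(row)

print("label distribution:", np.bincount(label))  # quick coherence check
```

The labeling rule exists only so that labels correlate with features; you should still manually sanity-check a sample of rows, as required above.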
Task 2. Attack – Requirements
  • Train a baseline model suitable for your data type.
  • Implement MIA or training-data extraction and report AUC/attack accuracy or a justified qualitative score (see the attack sketch after this list).
  • Analyze why attack strength matches (or not) reconstruction quality.
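
To make "report AUC/attack accuracy" concrete, below is a minimal loss-threshold membership inference sketch using scikit-learn. It assumes the transactions.csv produced by the Task 1 sketch above and implements the simple "members tend to have lower loss" baseline, not the shadow-model/LiRA attacks from the readings.

```python
# Minimal loss-threshold membership inference attack. Assumes the
# transactions.csv produced by the Task 1 sketch above; this is the
# "members tend to have lower loss" baseline, not the full
# shadow-model / LiRA attack from the readings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = np.genfromtxt("transactions.csv", delimiter=",", skip_header=1)
X, y = data[:, :3], data[:, 3].astype(int)

# Members = the target model's training split; non-members = held out.
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
target = LogisticRegression(max_iter=1000).fit(X_in, y_in)

def per_example_loss(model, X, y):
    """Per-example cross-entropy of the target model's predictions."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_in = per_example_loss(target, X_in, y_in)     # members
loss_out = per_example_loss(target, X_out, y_out)  # non-members

# Attack score: lower loss => more likely the example was a member.
scores = np.concatenate([-loss_in, -loss_out])
is_member = np.concatenate([np.ones(len(loss_in)), np.zeros(len(loss_out))])
print(f"MIA AUC: {roc_auc_score(is_member, scores):.3f}")  # 0.5 = chance
```

An AUC near 0.5 means the attack performs at chance; the further it rises above 0.5, the more the model leaks membership, and explaining that gap (e.g., via overfitting) is exactly what the analysis bullet above asks for.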
Deliverables (GitHub)
  • Dataset file: CSV (tabular), JSON/TXT (text), or folder (images/audio)
  • Attack code
  • README with:
    • Dataset (use case, #samples & label distribution, generation method, examples)
    • Attack details (design choices, implementation, metrics & results, vulnerability analysis, implications)

Presentation

  • Upload: Add your slides to the shared class slide deck.
  • Summary: Summarize the assigned paper(s): key contributions, methods, findings.
  • Connection: Relate to your startup/project (threat models, risks, defenses).
  • Dataset details: Briefly explain your synthetic dataset and attack setup.
  • Demo: Show code results on your dataset (success rate, reconstruction quality, and why).

Course Policies

Course Grade Scale (100–499)

A: 95–100
A-: 90–94
B+: 87–89
B: 83–86
B-: 80–82
C+: 77–79
C: 73–76
C-: 70–72
D+: 67–69
D: 63–66
F: 0–62

Note: If your course uses a total-points basis (e.g., 499 pts), the letter-grade cutoffs are applied to the percentage earned.

Notes on AI Use

You may use AI tools to help with reading or drafting, but you must fully understand the material and be able to explain it clearly to your teammates. The goal is to enrich group learning and class discussion, not just to generate text. You must include a "How I used AI" section in your report.

Late Policy

Each assignment includes one late day (24 hours) that may be used without penalty. Late days do not accumulate across assignments; unused late days expire. Assignments submitted beyond the allowed late day will not be accepted unless prior arrangements are made due to documented emergencies.

Nondiscrimination Policy

This course is committed to fostering an inclusive and respectful learning environment. All students are welcome, regardless of age, background, citizenship, disability, education, ethnicity, family status, gender, gender identity or expression, national origin, language, military experience, political views, race, religion, sexual orientation, socioeconomic status, or work experience. Our collective learning benefits from the diversity of perspectives and experiences that students bring. Any language or behavior that demeans, excludes, or discriminates against members of any group is inconsistent with the mission of this course and will not be tolerated.

Students are encouraged to discuss this policy with the instructor or TAs, and anyone with concerns should feel comfortable reaching out.

Academic Integrity

All work in this course is designated as group work, with shared responsibility among members. While assignments will be submitted jointly and receive a group grade, each member is expected to contribute meaningfully and to track individual contributions within the group.

Collaboration within your group is encouraged and expected. You may discuss ideas, approaches, and strategies with others, but all written material, whether natural language or code, must be the original work of your group. Copying text or code from external sources without proper attribution is a violation of academic integrity.

This course follows the UMass Academic Honesty Policy and Procedures. Acts of academic dishonesty, including plagiarism, unauthorized use of external work, or misrepresentation of contributions, will not be tolerated and may result in serious sanctions.

If you are ever uncertain about what constitutes appropriate collaboration or attribution, please ask the instructor or TAs before submitting your work.

Accommodation Statement

The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements. For further information, please visit Disability Services.

Title IX Statement (Non-Mandated Reporter Version)

In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery.

There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following link: https://www.umass.edu/titleix/resources. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24/7/365 at the SASA Hotline 413-545-0800.