HOMEWORK 4
MULTI-MODAL FOUNDATION MODELS*
10-423/10-623 GENERATIVE AI
http://423.mlcourse.org
OUT: Mar. 13, 2024
DUE: Mar. 22, 2024
TAs: Asmita, Haoyang, Tiancheng
Instructions
• Collaboration Policy: Please read the collaboration policy in the syllabus.
• Late Submission Policy: See the late submission policy in the syllabus.
• Submitting your work: You will use Gradescope to submit answers to all questions and code.
  – Written: You will submit your completed homework as a PDF to Gradescope. Please use the provided template. Submissions can be handwritten, but must be clearly legible; otherwise, you will not be awarded marks. Alternatively, submissions can be written in LaTeX. Each answer should be within the box provided. If you do not follow the template, your assignment may not be graded correctly by our AI assisted grader and there will be a 2% penalty (e.g., if the homework is out of 100 points, 2 points will be deducted from your final score).
  – Programming: You will submit your code for programming questions to Gradescope. There is no autograder. We will examine your code by hand and may award marks for its submission.
• Materials: The data that you will need in order to complete this assignment is posted along with the writeup and template on the course website.
Question                          Points
Instruction Fine-Tuning & RLHF    9
Latent Diffusion Model (LDM)      6
Programming: Prompt2Prompt        31
Code Upload                       0
Collaboration Questions           2
Total:                            48
*Compiled on Wednesday 13th March, 2024 at 17:17
Homework 4: Multi-Modal Foundation Models 10-423/10-623
1 Instruction Fine-Tuning & RLHF (9 points)
1.1. (6 points) Short answer: Highlight the differences between in-context learning, unsupervised pre-training, supervised fine-tuning, and instruction fine-tuning by defining each one. Assume we are interested specifically in autoregressive large language models (LLMs) over text. Each definition must mention properties of the training examples and how they are used, and how learning affects the parameters of the model.
Definition: in-context learning
Definition: unsupervised pre-training
Definition: supervised fine-tuning
Definition: instruction fine-tuning
1.2. (3 points) Ordering: Consider a correctly defined reinforcement learning with human feedback (RLHF) pipeline. Select the correct ordering of the items below to define such a pipeline by numbering them from 1 to N. If two items can occur simultaneously, number them identically. To exclude an item from the ordering, number it as 0.
• Repeat the previous step many times.
• Repeat the following steps many times.
• From human labelers, collect rankings of samples from the language model.
• Collect instruction fine-tuning training examples from human labelers.
• Take a (stochastic) gradient step for a reinforcement learning objective.
• Sample a prompt/response pair from the language model.
• Collect prompt/response/reward tuples from human labelers.
• Perform supervised fine-tuning of the language model.
• Query the regression model for its score of an input.
• Perform supervised training of the regression model.
• Pre-train the language model.
2 Latent Diffusion Model (LDM) (6 points)
2.1. (2 points) Short answer: Why does a latent diffusion model run diffusion in a latent space instead of pixel space?
2.2. Short answer: Standard cross-attention for a diffusion-based text-to-image model defines the queries $Q$ as a function of the pixels (or latent space) $Y \in \mathbb{R}^{m \times d_y}$, and the keys $K$ and values $V$ as a function of the text encoder output $X \in \mathbb{R}^{n \times d_x}$:
\[ Q = Y W_q, \quad K = X W_k, \quad V = X W_v \]
(where $W_q \in \mathbb{R}^{d_y \times d}$ and $W_k, W_v \in \mathbb{R}^{d_x \times d}$) and then applies standard attention:
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(Q K^T / \sqrt{d}\right) V \]
Now, suppose you instead defined a new formulation where the values are a function of the pixels (or latent space):
\[ V = Y W_v \]
where $W_v \in \mathbb{R}^{d_y \times d}$.
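For reference, the standard formulation (with $V = X W_v$) can be sketched in plain Python to make the matrix shapes concrete. This is a minimal illustration only: the helper names (`matmul`, `softmax_rows`, `cross_attention`) and the toy sizes and weights are assumptions made for this example and do not come from the starter code.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(scores):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(Y, X, W_q, W_k, W_v):
    """Return (output, attention map) with queries from Y and keys/values from X."""
    Q = matmul(Y, W_q)                      # m x d
    K = matmul(X, W_k)                      # n x d
    V = matmul(X, W_v)                      # n x d
    d = len(W_q[0])
    K_T = [list(col) for col in zip(*K)]    # d x n
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, K_T)]  # m x n
    A = softmax_rows(scores)                # each row weights the n text tokens
    return matmul(A, V), A                  # output is m x d


# Toy sizes (assumed): m = 3 pixels, n = 2 text tokens, d_y = 2, d_x = 3, d = 2.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
W_v = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]

out, A = cross_attention(Y, X, W_q, W_k, W_v)
print(len(out), len(out[0]))   # rows and columns of the output
print(len(A), len(A[0]))       # rows and columns of the attention map
```

Each row of `A` is a probability distribution over the text tokens for one pixel (or latent position), and each output row is the corresponding weighted average of the rows of `V`.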
2.2.a. (2 points) What goes wrong mathematically in the new formulation?
2.2.b. (2 points) Intuitively, why doesn’t the new formulation make sense? Briefly begin with an
explanation of what the original formulation of cross-attention is trying to accomplish for a
single query vector, and why this new formulation fails to accomplish that.
3 Programming: Prompt2Prompt (31 points)
Introduction
In this section, we explore an innovative approach to image editing. Editing techniques aim to retain the
majority of the original image’s content while making certain changes. However, current text-to-image
models often produce completely different images when only a minor change to the prompt is made.
State-of-the-art methods typically require a spatial mask to indicate the modification area, which ignores
the original image’s structure and content in that region, resulting in significant information loss.
In contrast, the Prompt2Prompt framework by Hertz et al. (2022) facilitates edits using only text,
striving to preserve original image elements while allowing for changes in specific areas.
Cross-attention maps, which are high-dimensional tensors binding pixels with prompt text tokens, hold
rich semantic relationships crucial to image generation. The key idea is to edit the image by injecting
these maps into the diffusion process. This method controls which pixels relate to which particular
prompt text tokens throughout the diffusion steps, allowing for targeted image modifications.
You’ll explore modifying token values to change scene elements (e.g. a “dog” riding a bicycle → a
“cat” riding a bicycle) while maintaining the original cross-attention maps to keep the scene’s layout
intact.
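The idea of keeping the attention maps while changing the token values can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the attention map, the value vectors, and the helper `apply_attention` are hypothetical numbers and names chosen for illustration, not the method of Hertz et al. in full and not the starter code's `AttentionReplace` class.

```python
def apply_attention(attn_map, values):
    """Weighted sum of value rows: (m x n) attention map times (n x d) values."""
    width = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        for weights in attn_map
    ]


# Hypothetical attention map recorded while generating from the source prompt.
# Rows are 2 pixels; columns are 3 prompt tokens, e.g. "a", "dog", "riding".
source_map = [[0.1, 0.8, 0.1],
              [0.2, 0.1, 0.7]]

# Hypothetical per-token value vectors; only the middle token differs
# between the source prompt ("dog") and the edited prompt ("cat").
source_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited_values = [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]]

# Reuse the source map for both prompts, so the layout (which pixel
# attends to which token) stays fixed while the content changes.
source_out = apply_attention(source_map, source_values)
edited_out = apply_attention(source_map, edited_values)

# Per-pixel size of the change caused by swapping the token.
change = [
    sum(abs(e - s) for e, s in zip(edited_row, source_row))
    for edited_row, source_row in zip(edited_out, source_out)
]
print(change)
```

In this toy setup the first pixel, which attends mostly to the swapped token, changes far more than the second pixel, which attends mostly to an unchanged token.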
HuggingFace Diffusers
In this assignment, we will be using HuggingFace’s diffusers, a library created for easily using well-known state-of-the-art diffusion models, including creating the model classes, loading pre-trained weights, and calling specific parts of the models for inference. Specifically, we will be using the API for the class DiffusionPipeline and methods from its subclass StableDiffusionPipeline for loading the pre-trained LDM model.
You are required to read the API for StableDiffusionPipeline:
https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img
You will be implementing the model loading and calling individual components of StableDiffusionPipeline in this assignment.
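As a rough orientation, loading a pipeline and reaching its individual components with diffusers might look like the sketch below. The function name `load_ldm_components` is hypothetical, and the checkpoint ID `CompVis/stable-diffusion-v1-4` is an assumption made for illustration; the checkpoint that the starter code expects may be different.

```python
def load_ldm_components(model_id="CompVis/stable-diffusion-v1-4"):
    """Load a Stable Diffusion pipeline and return its main components.

    The default model_id is an assumed example checkpoint, not necessarily
    the one used in this homework.
    """
    # Imported lazily because diffusers is a heavy dependency and calling
    # from_pretrained downloads the weights on first use.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    return {
        "vae": pipe.vae,                    # encodes/decodes between pixels and latents
        "tokenizer": pipe.tokenizer,        # turns the prompt into token IDs
        "text_encoder": pipe.text_encoder,  # turns token IDs into text embeddings
        "unet": pipe.unet,                  # denoiser containing the cross-attention layers
        "scheduler": pipe.scheduler,        # defines the diffusion timesteps and update rule
    }
```

Calling `load_ldm_components()` requires the `diffusers` package and network access on first use, so it is best run in the Colab environment described in this handout.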
Starter Code
The files are organized as follows:
hw4/
    run_in_colab.ipynb
    prompt2prompt.py
    ptp_utils.py
    seq_aligner.py
    requirements.txt
Here is what you will find in each file:
1. run_in_colab.ipynb: This is where you can run inference and see the visualization of your implemented methods.
2. prompt2prompt.py: Contains the text2image_ldm(...) method that generates images from text prompts by controlling the diffusion process with attention mechanisms in HuggingFace’s latent diffusion model, and contains the AttentionReplace class. The class contains the forward process and methods to replace attention. You will implement all these. (Note: Locations in the code where changes ought to be made are marked with a TODO.)
3. ptp_utils.py: Contains a set of helper functions that will be useful to you for filling in the text2image_ldm(...) method. Carefully read through the file to understand what these functions are.
4. seq_aligner.py: Contains a set of helper functions that are used to initialize AttentionReplace’s class variables. You will need to implement get_replacement_mapper_(...) (Note: Locations in the code where changes ought to be made are marked with a TODO.)
5. requirements.txt: A list of packages that need to be installed for this homework.
Command Line
We recommend conducting your final experiments for this homework on Colab. Colab provides a free
T4 GPU for code execution.
(Run the run_in_colab.ipynb for visualization.)
You may find it easier to implement/debug locally. We have also included a very simple example of
visualization that you can run on the command line:
python prompt2prompt.py
Prompt2Prompt
In this problem, you will implement Prompt2Prompt in the file prompt2prompt.py.
Figure 1: Visual and textual embedding are fused using cross-attention layers that produce attention maps for each textual token. Figure source: Hertz et al. (2022)
Latent Diffusion Model:
You will implement the text2image_ldm method. In that method, we provided some suggested structure by giving you the left-hand side of the initializations.
Implementing this method requires you to have already read the HuggingFace Diffusers API. See above. You will be working with the DiffusionPipeline type, but the line