DLHW2_Q4.pdf
School: New York University
Course: 7123
Subject: Electrical Engineering
Date: Apr 3, 2024
Pages: 6
Uploaded by SuperHumanSteel12063
25/03/2024, 18:44 FashionMnist ViT.ipynb - Colaboratory

Transformers in Computer Vision

Transformer architectures owe their origins to natural language processing (NLP), and indeed form the core of the current state-of-the-art models for most NLP applications.

We will now see how to develop transformers for processing image data (and in fact, this line of deep learning research has been gaining a lot of attention in 2021). The Vision Transformer (ViT) introduced in this paper shows how standard transformer architectures can perform very well on images. The high-level idea is to extract patches from images, treat them as tokens, and pass them through a sequence of transformer blocks before adding a couple of dense classification layers at the very end.

Some caveats to keep in mind:

* ViT models are very cumbersome to train (since they involve a large number of parameters), so budget accordingly.
* ViT models are a bit hard to interpret (even more so than regular convnets).
* Finally, while in this notebook we will train a transformer from scratch, ViT models in practice are almost always pre-trained on some large dataset (such as ImageNet) before being transferred onto specific training datasets.

Setup

As usual, we start with basic data loading and preprocessing.
!pip install einops

Requirement already satisfied: einops in /opt/conda/lib/python3.10/site-packages (0.7.0)

import torch
from torch import nn, einsum
import torch.nn.functional as F
from torch import optim

from einops import rearrange, repeat
from einops.layers.torch import Rearrange

import numpy as np
import torchvision
import time

torch.manual_seed(42)

DOWNLOAD_PATH = '/data/fashionmnist'
BATCH_SIZE_TRAIN = 100
BATCH_SIZE_TEST = 1000

# ToTensor scales pixels to [0, 1]; Normalize then maps them to [-1, 1].
transform_fashionmnist = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5,), (0.5,)),
])

train_set = torchvision.datasets.FashionMNIST(DOWNLOAD_PATH, train=True, download=True,
                                              transform=transform_fashionmnist)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE_TRAIN, shuffle=True)

test_set = torchvision.datasets.FashionMNIST(DOWNLOAD_PATH, train=False, download=True,
                                             transform=transform_fashionmnist)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=BATCH_SIZE_TEST, shuffle=True)

The ViT Model

We will now set up the ViT model. There will be 3 parts to this model:

* A "patch embedding" layer that takes an image and tokenizes it. There is some amount of tensor algebra involved here (since we have to slice and dice the input appropriately), and the einops package is helpful. We will also add learnable positional encodings as parameters.
* A sequence of transformer blocks. This will be a smaller-scale replica of the original proposed ViT, except that we will only use 6 blocks in our model (instead of the 32 in the largest original ViT variant).
* A (dense) classification layer at the end.

Further, each transformer block consists of the following components:

* A self-attention layer with H heads.
* A one-hidden-layer (dense) network to collapse the various heads. For the hidden neurons, the original ViT used something called a GeLU activation function, which is a smooth approximation to the ReLU. For our example, regular ReLUs seem to work just fine, although the code below keeps GELU (nn.GELU). The original ViT also used Dropout; the code below keeps a small dropout rate of 0.1.
* Layer normalization preceding each of the above operations.

Some care needs to be taken in making sure the various dimensions of the tensors are matched.

https://colab.research.google.com/drive/126Qz6pSq6YfOUDfVuH8JnQrp100HdS1B#printMode=true
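The tensor sizes implied by the description above can be tallied with plain integer arithmetic. The sketch below is an illustration only and is not part of the original notebook: the helper name `vit_token_shapes` is hypothetical, and the numbers assume 28x28 single-channel Fashion-MNIST images, 4x4 patches, 4 attention heads, and a per-head width of 64.

```python
def vit_token_shapes(image_size, patch_size, channels, heads, dim_head):
    """Return the main tensor sizes for a ViT-style tokenizer and attention layer.

    Hypothetical helper for illustration; it only does integer arithmetic.
    """
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")

    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2              # tokens extracted from one image
    patch_dim = channels * patch_size * patch_size   # raw values per token
    seq_len = num_patches + 1                        # one extra slot for the class token
    inner_dim = heads * dim_head                     # width of the concatenated heads

    return {
        "num_patches": num_patches,
        "patch_dim": patch_dim,
        "seq_len": seq_len,
        "inner_dim": inner_dim,
        "qkv_out_features": 3 * inner_dim,           # queries, keys, values in one projection
        "attention_map": (heads, seq_len, seq_len),  # per-image attention weights
    }


# Assumed settings: Fashion-MNIST images, 4x4 patches, 4 heads of width 64.
shapes = vit_token_shapes(image_size=28, patch_size=4, channels=1, heads=4, dim_head=64)
for name, value in shapes.items():
    print(f"{name}: {value}")
```

Under these assumed settings each image becomes 49 patch tokens of 16 pixel values each, and the sequence grows to 50 tokens once the class token is prepended.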
def pair(t):
    return t if isinstance(t, tuple) else (t, t)

# classes

class PreNorm(nn.Module):
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn

    def forward(self, x, **kwargs):
        # Normalize first, then apply the wrapped sub-layer (pre-norm).
        return self.fn(self.norm(x), **kwargs)

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout = 0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout)
        )

    def forward(self, x):
        return self.net(x)

class Attention(nn.Module):
    def __init__(self, dim, heads = 4, dim_head = 64, dropout = 0.1):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim = -1)
        # One projection produces queries, keys, and values for all heads.
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        b, n, _, h = *x.shape, self.heads
        qkv = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = h), qkv)

        # Scaled dot-product scores between every pair of tokens, per head.
        dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale
        attn = self.attend(dots)

        out = einsum('b h i j, b h j d -> b h i d', attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

class Transformer(nn.Module):
    def __init__(self, dim, depth, heads, dim_head, mlp_dim, dropout = 0.):
        super().__init__()
        self.layers = nn.ModuleList([])
        for _ in range(depth):
            self.layers.append(nn.ModuleList([
                PreNorm(dim, Attention(dim, heads = heads, dim_head = dim_head, dropout = dropout)),
                PreNorm(dim, FeedForward(dim, mlp_dim, dropout = dropout))
            ]))

    def forward(self, x):
        for attn, ff in self.layers:
            # Residual connections around attention and feed-forward.
            x = attn(x) + x
            x = ff(x) + x
        return x

class ViT(nn.Module):
    def __init__(self, *, image_size, patch_size, num_classes, dim, depth, heads, mlp_dim,
                 pool = 'cls', channels = 3, dim_head = 64, dropout = 0.1, emb_dropout = 0.1):
        super().__init__()
        image_height, image_width = pair(image_size)
        patch_height, patch_width = pair(patch_size)

        assert image_height % patch_height == 0 and image_width % patch_width == 0, 'Image dimensions must be divisible by the patch size.'

        num_patches = (image_height // patch_height) * (image_width // patch_width)
        patch_dim = channels * patch_height * patch_width
        assert pool in {'cls', 'mean'}, 'pool type must be either cls (cls token) or mean (mean pooling)'

        # Cut the image into patches, flatten each patch, and project it to `dim`.
        self.to_patch_embedding = nn.Sequential(
            Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = patch_height, p2 = patch_width),
            nn.Linear(patch_dim, dim),
        )

        # Learnable positional encodings: one per patch plus one for the class token.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.dropout = nn.Dropout(emb_dropout)

        self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, dropout)

        self.pool = pool
        self.to_latent = nn.Identity()

        self.mlp_head = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, num_classes)
        )

    def forward(self, img):
        x = self.to_patch_embedding(img)
        b, n, _ = x.shape

        cls_tokens = repeat(self.cls_token, '() n d -> b n d', b = b)
        x = torch.cat((cls_tokens, x), dim=1)
        x += self.pos_embedding[:, :(n + 1)]
        x = self.dropout(x)

        x = self.transformer(x)

        x = x.mean(dim = 1) if self.pool == 'mean' else x[:, 0]

        x = self.to_latent(x)
        return self.mlp_head(x)

model = ViT(image_size=28, patch_size=4, num_classes=10, channels=1,
            dim=64, depth=6, heads=4, mlp_dim=256)
optimizer = optim.Adam(model.parameters(), lr=0.002)

Let's see what the model looks like.

model

ViT(
  (to_patch_embedding): Sequential(
    (0): Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=4, p2=4)
    (1): Linear(in_features=16, out_features=64, bias=True)
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (transformer): Transformer(
    (layers): ModuleList(
      (0-5): 6 x ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (to_qkv): Linear(in_features=64, out_features=768, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=256, out_features=64, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=64, out_features=256, bias=True)
              (1): GELU(approximate='none')
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=256, out_features=64, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (to_latent): Identity()
  (mlp_head): Sequential(
    (0): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
    (1): Linear(in_features=64, out_features=10, bias=True)
  )
)

This is it: 6 transformer blocks, followed by a linear classification layer. Let us quickly see how many trainable parameters are present in this model.

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_parameters(model))
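As a cross-check on `count_parameters`, the total can also be tallied by hand from the layer shapes in the model printout. The sketch below is illustrative and not part of the original notebook: the helper names `linear_params` and `layer_norm_params` are hypothetical, and the tally assumes the configuration instantiated above (dim=64, depth=6, 4 heads of width 64, mlp_dim=256, 49 patches of 16 values, 10 classes) with a bias-free query/key/value projection.

```python
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a dense layer (hypothetical helper)."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm_params(dim):
    """Elementwise scale and shift of a layer normalization (hypothetical helper)."""
    return 2 * dim


# Assumed configuration, taken from the model instantiation and printout.
dim, depth, heads, dim_head, mlp_dim = 64, 6, 4, 64, 256
num_patches, patch_dim, num_classes = 49, 16, 10
inner_dim = heads * dim_head

embedding = (
    linear_params(patch_dim, dim)   # patch projection
    + (num_patches + 1) * dim       # positional embeddings, including the class slot
    + dim                           # class token
)
attention = (
    layer_norm_params(dim)
    + linear_params(dim, 3 * inner_dim, bias=False)  # joint q/k/v projection
    + linear_params(inner_dim, dim)                  # output projection
)
feed_forward = (
    layer_norm_params(dim)
    + linear_params(dim, mlp_dim)
    + linear_params(mlp_dim, dim)
)
head = layer_norm_params(dim) + linear_params(dim, num_classes)

total = embedding + depth * (attention + feed_forward) + head
print(f"per block: {attention + feed_forward}")
print(f"total:     {total}")
```

Under these assumptions the tally comes to 598,794 trainable parameters, with 98,944 in each of the six transformer blocks, so the blocks account for almost all of the model.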