Zhangzhe's Blog
The projection of my life.
Category: Transformer
2024
07-30 KV Cache Transformer
07-18 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
07-17 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
07-12 GQA: Grouped-Query Attention
07-11 MQA: Multi-Query Attention
2023
07-29 MOTR: End-to-End Multiple-Object Tracking with Transformer
07-29 Deformable DETR: Deformable Transformers for End-to-End Object Detection
2022
09-29 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
01-20 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2021
12-06 Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks