
4840-1054: Media Computing in Practice (Summer 2022)

News
  • July 6: Updated the invited talk.
  • April 21: Updated the Seminar for PR.
  • April 21: Updated the schedule and What should you do in each class.
  • April 14: Updated the schedule. Created the “What should you do in each class?” page.
  • April 14: If you have submitted the SageMaker Studio Lab form but have not yet been authorized, please email Takahiro Kubo (AWS) directly.
  • April 13: The Week 2 slide has been uploaded.
  • April 6: Students who wish to take the course must complete the survey on ITC-LMS by 23:59 on April 6. Since I have decided to give a lecture in Week 2, I have extended the survey deadline to 23:59 on April 12.
  • April 6: Created a Q&A page.

Overview
This course will be run in a seminar style: students will read, implement, and present a recent paper. The code must be made publicly available as a GitHub repository. In addition, students will update other students’ repositories via Pull Requests.

  • Instructor: Yusuke Matsui
  • Dates: Wednesday, 2nd period (10:25 - 12:10)
  • Location: Zoom (see ITC-LMS for the Zoom URL)

Objectives
  • Learn about recent research in the multimedia field (CV, NLP, ML, and related areas)
  • Learn how to publish your research through the release of source code
  • Learn modern software development through the experience of modifying other students’ source code using Pull Requests
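The branch-and-Pull-Request workflow behind the last objective can be sketched with plain git commands. This is only a local sketch: a bare repository (`upstream.git`) stands in for a classmate’s GitHub repository, and the branch, file, and commit names are invented for illustration. In practice you would fork the repository on GitHub, clone your fork, and open the Pull Request in the web UI (or with GitHub’s `gh` CLI).

```shell
# Sketch of the branch-and-PR workflow, using a local bare repository
# ("upstream.git") as a stand-in for a classmate's GitHub repository.
set -e
workdir=$(mktemp -d)
cd "$workdir"

git init --bare upstream.git          # the shared "GitHub" repository

# Clone it (in practice: clone your fork of the classmate's repo).
git clone upstream.git work
cd work
git config user.email "student@example.com"
git config user.name "Student"
git checkout -B main                  # fix the branch name regardless of git defaults

# Seed the repository with an initial commit (normally already present).
echo "# paper-reimplementation" > README.md
git add README.md
git commit -m "Initial commit"
git push origin main

# 1. Create a topic branch for your change.
git checkout -b fix-readme-typo

# 2. Commit the fix.
echo "Clarified the setup instructions." >> README.md
git add README.md
git commit -m "Clarify setup instructions in README"

# 3. Push the branch. On GitHub you would now open a Pull Request
#    from this branch, e.g. via the web UI or `gh pr create`.
git push origin fix-readme-typo

# The topic branch is now visible on the shared repository.
git ls-remote --heads origin
```

The key habit this illustrates: never commit to `main` directly; each change lives on its own topic branch, so the repository owner can review and merge it as a self-contained Pull Request.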

Prerequisites
  • GitHub account (mandatory)

Evaluation
  • Presentation, coding, and Pull Requests.
  • Depending on the number of participants, attendance may also be considered.

Schedule
Date (2022) Contents Presented by Resources
Week 1, Apr 6 Guidance Yusuke Matsui [slide], [repo]
Week 2, Apr 13 Paper selection + PR + GitHub Actions Yusuke Matsui [slide]
Week 3, Apr 20 Seminar Yusuke Matsui [slide]
  Kim+, “Dynamic Closest Color Warping to Sort and Compare Palettes”, SIGGRAPH 2021 吴 宇涵 (Yuhan Wu) [slide], [repo]
  Huang+, “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization”, ICCV 2017 YANG Chengkai [slide], [repo]
Week 4, Apr 27 Seminar    
  (1) Takahashi and Mitsufuji, “Densely connected multidilated convolutional networks for dense prediction tasks”, CVPR 2021 and (2) Défossez+, “Music Source Separation in the Waveform Domain”, arXiv 2021 naba89 [slide], [repo]
  Kipf+, “Neural Relational Inference for Interacting Systems”, ICML 2018 郑 书晗 (Zheng Shuhan) [slide], [repo]
Week 5, May 11 Seminar    
  Ronneberger+, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, MICCAI 2015 朱 国豪 (Guohao (Trevor) Zhu) [slide], [repo]
  Ishida+, “Do We Need Zero Training Loss After Achieving Zero Training Error?”, ICML 2020 アヌバワ アヌバワ [slide], [repo]
Week 6, May 18 Seminar    
  Sabour+, “Dynamic Routing Between Capsules”, NIPS 2017 尤 書恒 (You Shuheng) [slide], [repo]
  Choudhary+, “BerConvoNet: A deep learning framework for fake news classification”, Applied Soft Computing 陳 星星 (Chen Xingxing) [slide], [repo]
Week 7, May 25 Seminar    
  V. G. Goecks+, “Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft”, AAAI-MAKE 2022 ユンセィ セィリン [slide], [repo]
  M. Guo+, “PCT: Point cloud transformer”, Computational Visual Media 2021 Fu Lian [slide], [repo]
  X. Wang+, “KGAT: Knowledge Graph Attention Network”, KDD 2019 楊 博銘 (Yang Boming) [slide], [repo]
Week 8, June 8 Seminar    
  S. Lin+, “Real-Time High-Resolution Background Matting”, CVPR 2021 舘野 将寿 (Masatoshi Tateno) [slide], [repo]
Week 9, June 15 Coding day (no class)
Week 10, June 22 Coding day   [slide]
Week 11, June 29 Seminar for PR    
    吴 宇涵 (Yuhan Wu) [slide]
    YANG Chengkai [slide]
    naba89 [slide]
    郑 书晗 (Zheng Shuhan) [slide]
    アヌバワ アヌバワ [slide]
Week 12, July 6 Seminar for PR    
    尤 書恒 (You Shuheng) [slide]
    ユンセィ セィリン [slide]
    Fu Lian [slide]
    楊 博銘 (Yang Boming) [slide]
    舘野 将寿 (Masatoshi Tateno) [slide]
    朱 国豪 (Guohao (Trevor) Zhu) [slide]
Week 13, July 13 Invited talk by Dr. Han Xiao, CEO at Jina AI

When Neural Search Meets Generative AI

Abstract: Neural search uses deep learning to search unstructured data, and it has developed rapidly over the last two years. With open-source frameworks like Jina, searching cross-modal/multi-modal data via deep neural networks becomes extremely straightforward. DALL·E, a powerful text-to-image generator released by OpenAI in 2021, has further boosted the popularity of multimodal applications; we now see thousands of astonishing artworks made by DALL·E every day. How will all these new technologies change the way we comprehend data? Most importantly, how can developers easily build solutions and applications that leverage them? This tutorial will answer these two questions. The workshop has two parts. In part 1, Dr. Han Xiao will introduce recent advances in neural search and multi-modal intelligence and break down the design principles of the Jina ecosystem. In part 2 (bring your laptop), Han will walk step by step through using DiscoArt to create compelling Disco Diffusion artworks, and demonstrate how Jina unlocks multi-modal/cross-modal capability in your solutions and applications. This is a great chance to get your hands dirty with Jina and DocArray’s Pythonic API and to embrace the charm of generative art.

Short biography: Dr. Han Xiao is the Founder & CEO of Jina AI, a commercial open-source company based in Berlin. Since its founding in 2020, Jina AI has raised $38M from top investors, including GGV, Canaan, and YUNQI. Jina AI is one of the most promising AI startups globally according to CB Insights (2022, 2021) and Forbes DACH (2020). Before Jina AI, Han led a team on neural information retrieval at Tencent AI, laying down its next-generation search infrastructure. Han served as a board member of Linux Foundation AI in 2019, driving open-source innovation and international collaboration. From 2014 to 2018, Han worked at Zalando Research in Berlin as a Senior Research Scientist. Han received a Ph.D. (2014) and an M.Sc. (2011) in computer science from the Technical University of Munich, Germany.