4840-1054: Media Computing in Practice (Summer 2022)

News

July 6: Updated the invited talk.
April 21: Updated the Seminar for PR.
April 21: Updated the schedule and the “What should you do in each class?” page.
April 14: Updated the schedule. Created the “What should you do in each class?” page.
April 14: If you have submitted the SageMaker Studio Lab form but have not been authorized yet, please email Takahiro Kubo (AWS) directly.
April 13: The Week 2 slides have been uploaded.
April 6: Students who wish to take the course must complete the survey on the ITC-LMS ~~by 23:59 on April 6.~~ As I have decided to give a lecture in Week 2, I have extended the survey deadline to 23:59 on April 12.
April 6: Created a Q&A page.

Overview

This course is given in seminar style. Each student reads, implements, and presents a recent paper. The code must be publicly available as a GitHub repository. In addition, students will update other students’ repositories using Pull Requests.

Instructor: Yusuke Matsui.
Dates: Wednesday, 2nd period (10:25 - 12:10)
Location: Zoom (see ITC-LMS for the Zoom URL)

Goal

Learn about recent research in the multimedia field (including but not limited to CV, NLP, and ML)
Learn how to publish your research by releasing source code
Learn modern software development by modifying other students’ source code using Pull Requests

Prerequisites

GitHub account (mandatory)

Assessment

Presentation, coding, and Pull Requests.
Depending on the number of participants, attendance may also be considered.

Schedule

Date (2022)	Contents	Presented by	Resources
Week 1, Apr 6	Guidance	Yusuke Matsui	[slide], [repo]
Week 2, Apr 13	Paper selection + PR + GitHub Actions	Yusuke Matsui	[slide]
Week 3, Apr 20	Seminar	Yusuke Matsui	[slide]
	Kim+, “Dynamic Closest Color Warping to Sort and Compare Palettes”, SIGGRAPH 2021	吴宇涵 (Yuhan Wu)	[slide], [repo]
	Huang+, “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization”, ICCV 2017	YANG Chengkai	[slide], [repo]
Week 4, Apr 27	Seminar
	(1) Takahashi and Mitsufuji, “Densely connected multidilated convolutional networks for dense prediction tasks”, CVPR 2021 and (2) Défossez+, “Music Source Separation in the Waveform Domain”, arXiv 2021	naba89	[slide], [repo]
	Kipf+, “Neural Relational Inference for Interacting Systems”, ICML 2018	郑书晗 (Zheng Shuhan)	[slide], [repo]
Week 5, May 11	Seminar
	Ronneberger+, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, MICCAI 2015	朱国豪 (Guohao (Trevor) Zhu)	[slide], [repo]
	Ishida+, “Do We Need Zero Training Loss After Achieving Zero Training Error?”, ICML 2020	アヌバワアヌバワ	[slide], [repo]
Week 6, May 18	Seminar
	Sabour+, “Dynamic Routing Between Capsules”, NIPS 2017	尤書恒 (You Shuheng)	[slide], [repo]
	Choudhary+, “BerConvoNet: A deep learning framework for fake news classification”, Applied Soft Computing	陳星星 (Chen Xingxing)	[slide], [repo]
Week 7, May 25	Seminar
	V. G. Goecks+, “Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft”, AAAI-MAKE 2022	ユンセィセィリン	[slide], [repo]
	M. Guo+, “PCT: Point cloud transformer”, Computational Visual Media 2021	Fu Lian	[slide], [repo]
	X. Wang+, “KGAT: Knowledge Graph Attention Network”, KDD 2019	楊博銘 (Yang Boming)	[slide], [repo]
Week 8, June 8	Seminar
	S. Lin+, “Real-Time High-Resolution Background Matting”, CVPR 2021	舘野将寿 (Masatoshi Tateno)	[slide], [repo]
Week 9, June 15	~~Coding day~~ No class
Week 10, June 22	Coding day		[slide]
Week 11, June 29	Seminar for PR
		吴宇涵 (Yuhan Wu)	[slide]
		YANG Chengkai	[slide]
		naba89	[slide]
		郑书晗 (Zheng Shuhan)	[slide]
		アヌバワアヌバワ	[slide]
Week 12, July 6	Seminar for PR
		尤書恒 (You Shuheng)	[slide]
		ユンセィセィリン	[slide]
		Fu Lian	[slide]
		楊博銘 (Yang Boming)	[slide]
		舘野将寿 (Masatoshi Tateno)	[slide]
		朱国豪 (Guohao (Trevor) Zhu)	[slide]
Week 13, July 13	Invited talk by Dr. Han Xiao, CEO at Jina AI: “When Neural Search Meets Generative AI”

Abstract: Neural search, the use of deep learning to search unstructured data, has developed rapidly over the last two years. With open-source frameworks such as Jina (https://github.com/jina-ai/jina), searching cross-modal and multi-modal data with deep neural networks becomes straightforward. DALL·E, a powerful text-to-image generator released by OpenAI in 2021, has further boosted the popularity of multimodal applications. We now see thousands of astonishing artworks made by DALL·E every day. How will all these new technologies change the way we comprehend data? Most importantly, how can developers easily build solutions and applications that leverage those technologies? This tutorial will answer these two questions.

The workshop has two parts. In part 1, Dr. Han Xiao will introduce recent advances in neural search and multi-modal intelligence and break down the design principles of the Jina ecosystem. In part 2 (bring your laptop), Han will guide participants step by step through using DiscoArt (https://github.com/jina-ai/discoart) to create compelling Disco Diffusion artworks. He will demonstrate how Jina unlocks multi-modal and cross-modal capabilities in your solutions and applications. This is a great chance to get your hands dirty with the Pythonic API of Jina and DocArray and to embrace the charm of generative art.

Short biography: Dr. Han Xiao is the Founder & CEO of Jina AI, a commercial open-source company based in Berlin. Since its founding in 2020, Jina AI has raised $38M from top investors, including GGV, Canaan, YUNQI, and SAP.io. Jina AI is one of the most promising AI startups globally according to CB Insights (2022, 2021) and Forbes DACH (2020). Before Jina AI, Han led a team on neural information retrieval at Tencent AI, laying down the next-generation search infrastructure. Han served as a board member at Linux Foundation AI in 2019, driving open-source innovation and international collaboration. From 2014 to 2018, Han worked at Zalando Research in Berlin as a Senior Research Scientist. Han received a Ph.D. (2014) and an M.Sc. (2011) in computer science from the Technical University of Munich in Germany.