MindSpore for Vision Transformer
Transformer-like neural nets are so hot right now. I could not resist the urge to get some hands-on experience with them. I think the best way is to read the original paper and start the implementation from scratch. That is how this project was born.
The core of the ViT architecture is multi-head attention, which is readily available in MindSpore (and PyTorch too). Still, it is not that hard to implement yourself, and you are guaranteed to come away with a much better understanding.
[Code available here]
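
For reference, here is a minimal NumPy sketch of the multi-head scaled dot-product attention math (just the idea; the actual repo implements it as a MindSpore Cell, and the toy sizes below are illustrative):

# Minimal multi-head scaled dot-product self-attention, NumPy only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); w_*: (d_model, d_model) projection matrices."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project into queries, keys and values, then split into heads.
    def split_heads(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Scaled dot-product attention, computed per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                         # (heads, seq, d_head)

    # Merge the heads back together and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

# Toy usage: 16 "patch tokens" of dimension 64, split over 8 heads.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 64))
w = [rng.normal(size=(64, 64)) * 0.1 for _ in range(4)]
print(multi_head_self_attention(x, *w, num_heads=8).shape)  # (16, 64)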
CANN Hardware Video Decoding for LibValkka
LibValkka is a Python media-streaming framework with CPU and GPU video-decoding backends. This work adds Huawei NPU (CANN) accelerated decoding to libValkka as a separate module. With it, you can decode 16 channels of 1080p video at 30 FPS without burdening your CPU.
[Code available here]
Python ACL Samples
ACL (Ascend Computing Language) provides a collection of C++/Python APIs for developing neural-network inference applications on the Huawei NPU platform. These APIs cover image preprocessing, media data processing, and the execution of neural networks and custom operators.
This repository provides Python samples that include YOLOv3, YOLOv5, OpenPose, and more. They are meant not only for demonstration but also to help developers quickly adapt to the Huawei AI stack. [Code available here]
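
To give a feel of what working with pyACL looks like, here is a rough skeleton of the usual resource lifecycle (init, claim device, load model, execute, release); the input/output dataset handling is elided, and exact calls and return conventions should be checked against the samples and your CANN version:

# Rough pyACL skeleton of the typical inference lifecycle (assumption: call
# names as recalled from the samples; verify against your CANN release).
import acl

DEVICE_ID = 0
MODEL_PATH = "model.om"   # hypothetical path to an offline model converted with ATC

ret = acl.init()                                   # initialize the ACL runtime
ret = acl.rt.set_device(DEVICE_ID)                 # claim the NPU device
context, ret = acl.rt.create_context(DEVICE_ID)    # create an execution context

model_id, ret = acl.mdl.load_from_file(MODEL_PATH) # load the .om model

# ... allocate device buffers, build the input/output datasets, then:
# ret = acl.mdl.execute(model_id, input_dataset, output_dataset)

# Release everything in reverse order.
ret = acl.mdl.unload(model_id)
ret = acl.rt.destroy_context(context)
ret = acl.rt.reset_device(DEVICE_ID)
ret = acl.finalize()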
Route Planning with Neural Networks
Without navigation software, how does a human find an efficient path from A to B with little knowledge of the road network?
A quick answer: a human gathers information from road signs (directions and speed limits) along the way. Can we teach a neural network agent to navigate like a human?
In my thesis work, I took a staged approach, first training a simple NN to behave greedily. I then generated thousands of graphs with NetworkX, assigning random edge attributes to simulate road traffic conditions. Paths produced by A* and Dijkstra's algorithm were fed to a neural network agent so it could learn to generate similar paths. The trained agent showed some very interesting characteristics.
[Thesis available here]
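
To give an idea of the data-generation side, here is a small sketch (not the thesis code; graph sizes, weight ranges, and the heuristic are made up for illustration) of how NetworkX can produce random weighted graphs and the Dijkstra/A* paths used as training targets:

# Generate a random weighted graph and compute reference paths with NetworkX.
import random
import networkx as nx

def make_sample(n_nodes=50, radius=0.25, seed=None):
    # A random geometric graph roughly mimics a road network's local connectivity.
    g = nx.random_geometric_graph(n_nodes, radius, seed=seed)
    pos = nx.get_node_attributes(g, "pos")
    rng = random.Random(seed)
    for u, v in g.edges:
        # Random weight stands in for traffic conditions on that road segment.
        g.edges[u, v]["weight"] = rng.uniform(1.0, 10.0)

    src, dst = rng.sample(list(g.nodes), 2)
    if not nx.has_path(g, src, dst):
        return None

    # Euclidean distance as a simple A* heuristic (illustrative only; it is not
    # guaranteed admissible with these random weights).
    def h(a, b):
        (xa, ya), (xb, yb) = pos[a], pos[b]
        return ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5

    dijkstra = nx.dijkstra_path(g, src, dst, weight="weight")
    astar = nx.astar_path(g, src, dst, heuristic=h, weight="weight")
    return g, src, dst, dijkstra, astar

sample = make_sample(seed=42)
if sample is not None:
    _, src, dst, dijkstra, astar = sample
    print(f"{src} -> {dst}")
    print("Dijkstra:", dijkstra)
    print("A*      :", astar)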
Multimodal Fusion for Human-Robot Collaboration
I had the honor of working with Dr. Hongyi Liu as a research assistant while pursuing my master's degree at KTH Royal Institute of Technology. In this work, a neural network architecture combining three modalities: speech, hand motion (Leap Motion), and body motion (video) is proposed.
Three unimodal models were first trained to extract features, which were then fused for representation sharing. Experiments showed that the proposed multimodal fusion model outperformed the three unimodal models. [Paper available here]
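
As an illustration of the fusion idea, here is a generic PyTorch-style sketch (not the paper's exact architecture; all layer sizes and class counts are placeholders) where three unimodal encoders feed a shared head over their concatenated features:

# Generic feature-level fusion of three modalities (illustrative dimensions).
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, speech_dim=40, hand_dim=63, body_dim=512,
                 embed_dim=128, num_classes=8):
        super().__init__()
        # One encoder per modality; in the actual work these were the
        # pretrained unimodal models whose feature layers are reused.
        self.speech_enc = nn.Sequential(nn.Linear(speech_dim, embed_dim), nn.ReLU())
        self.hand_enc = nn.Sequential(nn.Linear(hand_dim, embed_dim), nn.ReLU())
        self.body_enc = nn.Sequential(nn.Linear(body_dim, embed_dim), nn.ReLU())
        # Shared head operating on the fused (concatenated) representation.
        self.head = nn.Sequential(
            nn.Linear(3 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, speech, hand, body):
        fused = torch.cat(
            [self.speech_enc(speech), self.hand_enc(hand), self.body_enc(body)],
            dim=-1,
        )
        return self.head(fused)

# Toy forward pass with a batch of 4 samples.
model = FusionModel()
logits = model(torch.randn(4, 40), torch.randn(4, 63), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 8])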