I'm glad you posted these questions, lumpyme. I read more than the average person about this AI processor stuff, so I'll try to answer for the sake of discussion and to build up my own knowledge. Please correct me if I'm wrong.
Important: I'm just speculating and discussing. I don't have any advance information beyond what's published at this time.
Optimizations determine what kind of processor it is
What you say about CPUs being optimized for integer math and GPUs for floating point is correct but not complete. GPUs are made for graphics rendering and hence need a whole lot of optimizations and differences from a CPU, including a very high core count and much higher efficiency at their specific workload. The difference between a CPU and a GPU, then, is their optimizations. NPUs are different for the same reason: their own set of optimizations.
One of the key optimizations of the TPU/NPU is reduced precision: calculations are cut down to maybe 8 bits in the case of Google's TPU, to be even more power efficient, because inference simply doesn't need high precision for its purpose.
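To make that concrete, here's a toy Python sketch of what 8-bit quantization means (all numbers and the scale/zero-point scheme are just my illustration, not Google's actual TPU format): float32 values get mapped onto 8-bit integers with a scale and zero point, and you accept a small rounding error in exchange for 4x smaller data and cheaper arithmetic.

```python
import numpy as np

# Illustrative 8-bit affine quantization: map float32 values onto uint8
# with a scale and zero point, then map back and see the small error.

weights = np.array([0.91, -0.42, 0.07, -1.30], dtype=np.float32)

# Quantization parameters derived from the observed value range.
scale = (weights.max() - weights.min()) / 255.0
zero_point = int(round(-weights.min() / scale))

# float32 -> uint8 (the stored/computed form), then back (approximate).
q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dq = (q.astype(np.float32) - zero_point) * scale

print(q)   # e.g. [255 102 158   0]
print(dq)  # close to the original weights, within one quantization step
```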
Training vs Inference
In the machine learning world, CPUs and GPUs working closely together are used for TRAINING. Say you've got ten thousand images of cats and dogs and you want to train your AI to recognize what's a cat and what's a dog: you use CPUs and GPUs in combination for that.
Once you ALREADY have a trained model out of all those CPUs and GPUs, that model is the set of machine-learned weights you can use to decide whether a NEW image of an animal is a dog or a cat. This process is called INFERENCE, and it is best performed on an AI chip: a TPU or Neural Processing Unit (NPU).
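Here's a toy Python/TensorFlow sketch of that two-phase split. The 'cats and dogs' are faked with random pixels and labels, so don't expect accuracy; the point is just that fit() is the expensive training phase and predict() is the cheap per-image inference phase an NPU would accelerate.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: random 64x64 'images' and random cat/dog labels.
x_train = np.random.rand(100, 64, 64, 3).astype(np.float32)
y_train = np.random.randint(0, 2, size=(100,))  # 0 = cat, 1 = dog

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# TRAINING: the heavy CPU+GPU phase that produces the model.
model.fit(x_train, y_train, epochs=2, verbose=0)

# INFERENCE: the light per-image phase a TPU/NPU is built for.
new_image = np.random.rand(1, 64, 64, 3).astype(np.float32)
print("dog" if model.predict(new_image)[0, 0] > 0.5 else "cat")
```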
TPU specific optimizations
A TPU also strips out all the other functions that are unnecessary for inference. TPUs designed for INFERENCE only are basically integer-only devices, but some TPUs are also designed to do some TRAINING, and those have FP math optimizations built in too. Either way, the instructions for the specific functions machine learning needs are optimized, given more pipelines, and basically given more silicon. Everything is tuned for throughput and power consumption for the specific purpose of the NPU.
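As a rough sketch of what 'integer-only inference' looks like at the core (the exact requantization scheme varies per chip, and the fixed-point constants below are made up for illustration): multiply int8 activations by int8 weights, accumulate in a wide int32, then scale back down to int8 without ever touching floating point.

```python
import numpy as np

act = np.random.randint(-128, 128, size=(1, 4), dtype=np.int8)  # activations
wts = np.random.randint(-128, 128, size=(4, 3), dtype=np.int8)  # weights

# Wide accumulator so sums of int8 products cannot overflow.
acc = act.astype(np.int32) @ wts.astype(np.int32)

# Requantize: a fixed-point multiply + shift stands in for the hardware's
# per-layer output scale (constants here are purely illustrative).
multiplier, shift = 1_200_000_000, 38
out = np.clip((acc.astype(np.int64) * multiplier) >> shift, -128, 127).astype(np.int8)
print(out)  # int8 results, ready to feed the next layer
```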
Adapting inference to mobile
Google's TPU was made to be server-based. They perform inference with large, continually updated models on the server side, optimized for massive speed without much regard for power consumption.
When you switch to mobile inference, those models are too big and too power hungry for phones.
What the Google TensorFlow team did was shrink the model size using various techniques, mostly 'reduced precision' (quantization) and 'trimming unnecessary neurons' (pruning), down to a very manageable size (they keep quoting around 20MB).
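For reference, this shrinking step looks roughly like the sketch below using the current TensorFlow Lite converter API (these are today's TF 2.x names, not necessarily what the 2017-era tooling exposed; the toy model is my own stand-in). Post-training quantization handles the 'reduced precision' part; pruning would be a separate pass.

```python
import tensorflow as tf

# Toy model standing in for a real trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
])

# Convert to a compact .tflite flatbuffer with weight quantization enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"compressed model size: {len(tflite_model)} bytes")
```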
Certain apps, like Google Translate, already use the mobile CPU and GPU to power the inference engine, drawing on a compressed neural network model to perform realtime translation from the camera image (try it yourself). The performance is OK but still laggy, and the power consumption is quite high.
Android NN and TensorFlow Lite: API / instruction set / hardware abstraction layer
Google also foresaw that future mobile phones might have neural net accelerators and NPUs, so they made TensorFlow Lite and the Android NN API, with Android NN acting as a hardware abstraction layer underneath TF Lite. As long as the 'drivers' or 'mapping tables' for a given NPU/AI chip are properly written, a developer can just write for TF Lite and never give any consideration to 'which NPU is present on this device'. This means AI-accelerated apps will not need to be written for specific hardware. Google did the job for us.
You might ask: what happens when a phone doesn't have an NPU? No problem. The Android NN hardware abstraction layer will fall back to the CPU and GPU, so you can still run the AI-accelerated app, just not as fast and not as power efficiently as on a phone with an NPU.
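From the developer's side it looks something like the sketch below: you just load a .tflite file into the interpreter and run it (here using the Python interpreter and the model.tflite file from the earlier sketch; the shapes are my assumptions). On an actual Android device the same model goes through the NN API, which dispatches to an NPU if a driver exists and silently falls back to CPU/GPU if not; what runs below is effectively that CPU-fallback path.

```python
import numpy as np
import tensorflow as tf

# Load the converted model and run one inference on the CPU.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"], np.random.rand(1, 784).astype(np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))  # class probabilities
```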
Benefits of NPU on a mobile phone
With more inference-based AI apps on the way, it becomes quite obvious that some sort of hardware accelerator is very desirable on a mobile phone. Since 2015 (I read somewhere), Huawei has been looking for a partner and has worked with Cambricon Technologies to integrate the AI chip. Cambricon is one of the pioneers in mobile AI chip design and, according to some commentators, is probably going to be the 'Intel' of the mobile AI chip world.
Delving into Cambricon's chip, it has certain design features that set it apart from upcoming competitors, though the true significance of the design is beyond my understanding. Roughly, it keeps intermediate results in on-chip buffers instead of going out to DRAM all the time.
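My loose reading of that 'buffer instead of DRAM' point, sketched in Python below: process data in tiles small enough to fit an on-chip buffer, so each value is fetched from slow main memory once and then reused many times from the fast buffer. This is purely my illustration of the general idea (tiled matrix multiply); the real Cambricon design is far more involved.

```python
import numpy as np

a = np.random.rand(64, 64).astype(np.float32)
b = np.random.rand(64, 64).astype(np.float32)
c = np.zeros((64, 64), dtype=np.float32)
TILE = 16  # pretend a TILE x TILE block is what fits in the on-chip buffer

for i in range(0, 64, TILE):
    for j in range(0, 64, TILE):
        for k in range(0, 64, TILE):
            # one 'DRAM fetch' per tile; all reuse happens inside the buffer
            c[i:i+TILE, j:j+TILE] += a[i:i+TILE, k:k+TILE] @ b[k:k+TILE, j:j+TILE]

assert np.allclose(c, a @ b)  # same result as the untiled multiply
```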
There are many, many functions that become possible once you have an NPU on board. Google Translate's camera mode is one. Imagine a future app where you point your camera at a cucumber in the market and it tells you the cucumber's type, taste, and freshness. Realtime face mapping (which Apple demonstrated today with Animoji, mapping expressions onto a texture). Realtime photo retouching based on models learned from tens of thousands of raw and retouched pictures. Realtime voice enhancement and noise cancellation based on ML models instead of the DSPs used now. The possibilities are endless.
The more apps rely on AI, which is 'for sure' coming, the better NPU-equipped phones will perform, and the longer their batteries will last relative to phones without an NPU.
Sorry for the wall of text, but at least we can refer back to these speculations and see how good our hit rate is after the Oct 16 release of the Mate 10!
References:
Differences between GPU, CPU, and TPU
https://www.extremetech.com/computi...u-makes-hash-intel-nvidia-inference-workloads
Making TensorFlow 'mobile'
https://www.youtube.com/watch?v=EnFyneRScQ8
Huawei's (rumoured) Cambricon NPU: further optimizations and concept
http://ieeexplore.ieee.org/document/7783723/
Basic architecture of Android NN and TF Lite
https://www.youtube.com/watch?v=25ISTLhz0ys
https://www.youtube.com/watch?v=0r9w3V923rk
Huawei to use Cambricon Technologies' IP in their Kirin 970 NPU
http://www.anandtech.com/show/11804...-at-ifa-2017-dedicated-neural-processing-unit
Huawei Kirin 970 AI NPU to support TensorFlow
https://syncedreview.com/2017/09/02/huawei-announces-kirin-970-ai-in-your-phone/
Brief overview of the Huawei AI chip (by a non-native English speaker)
http://www.startlr.com/artificial-intelligence-on-smartphones-is-the-first/