This talk presents a methodology, illustrated with examples, for breaking applications down into low-level primitives and identifying optimizations, either on existing compute instances and platforms or by offloading specific portions of the application to accelerators or GPUs. As systems increasingly combine CPUs, GPUs, and accelerators/ASICs, this methodology could prove increasingly useful for evaluating what kind of compute to use and when.