Large Language Models (LLMs) are exceptionally demanding on processing power and memory, but Apple is reportedly experimenting with running this technology from flash storage, likely to make it practical across multiple devices. The technology giant wants LLMs to be ubiquitous across its iPhone and Mac lineups and is exploring ways to make that possible.
Under typical conditions, Large Language Models require AI accelerators and a large amount of DRAM to run. As reported by TechPowerUp, Apple is working to bring the same technology to devices with limited memory capacity. In a newly published paper, Apple describes how to bring LLMs to such hardware: since iPhones have limited memory, Apple researchers have developed a technique that uses flash chips to store the AI model's data.
Since flash memory is abundant on Apple's iPhones and Mac computers, this limitation can be sidestepped with a technique called windowing. In this method, the AI model reuses some of the data it has already processed, reducing the need for continuous memory fetching and making the entire process faster. The second technique is row-column bundling: data is grouped so the AI model can read larger contiguous chunks from flash memory, speeding up inference.
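The two ideas can be sketched in a few lines of code. This is a minimal, illustrative toy, not Apple's implementation: the class names, the window size, and the counter simulating flash reads are all assumptions made for the example. It shows how caching recently used data (windowing) cuts repeated reads, and how storing related pieces of a weight matrix together (row-column bundling) lets one read fetch everything a neuron needs.

```python
import numpy as np

# Illustrative sketch only (not Apple's code). FlashWeightStore simulates
# flash storage: row i of the up-projection and column i of the
# down-projection are bundled contiguously, so one read returns both.
class FlashWeightStore:
    def __init__(self, up_proj, down_proj):
        # Row-column bundling: neuron i's row and column live side by side.
        self.bundles = [np.concatenate([up_proj[i], down_proj[:, i]])
                        for i in range(up_proj.shape[0])]
        self.reads = 0  # counts simulated flash accesses

    def read_bundle(self, i):
        self.reads += 1
        return self.bundles[i]

# Windowing: keep the last `window` neurons' bundles in (simulated) DRAM
# so repeated use of the same neurons does not touch flash again.
class WindowedCache:
    def __init__(self, store, window=4):
        self.store, self.window = store, window
        self.cache = {}   # neuron id -> bundle held in memory
        self.order = []   # oldest-first, for simple eviction

    def get(self, i):
        if i not in self.cache:
            if len(self.order) >= self.window:
                self.cache.pop(self.order.pop(0))  # evict oldest neuron
            self.cache[i] = self.store.read_bundle(i)
            self.order.append(i)
        return self.cache[i]
```

With a window of two, requesting neurons 0, 1, 0, 1, 2, 1 in sequence triggers only three flash reads instead of six, which is the effect the paper exploits at much larger scale.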
Together with other methods, these techniques allow AI models up to twice the size of the iPhone's available RAM to run, delivering up to a 5x speed increase on standard processors and up to 25x on graphics processors. There is plenty of evidence suggesting that Apple is serious about AI, starting with its own chatbot, which is internally referred to as Apple GPT.
Read more on wccftech.com