Robert Luh, Sebastian Schrittwieser, Stefan Marschalek. “LLR-based sentiment analysis for kernel event sequences”. Proceedings of the 31st IEEE International Conference on Advanced Information Networking and Applications (AINA), 2017
Behavior-based analysis of dynamically executed binaries has become a widely used technique for identifying suspected malware. Most solutions rely on function call patterns to determine whether a sample exhibits malicious behavior. These system and API calls are usually considered in isolation, without regard for contextual information or process inter-dependencies. In addition, the patterns are often fixed and do not adapt to changes in the system environment.
To address these shortcomings, this paper proposes a sentiment extraction and scoring system capable of learning the maliciousness inherent in n-grams of kernel events captured by a real-time monitoring agent. The approach calculates the log likelihood ratio (LLR) of all identified n-grams, which both identifies events that frequently occur together as neighboring sequences and assesses whether certain event combinations lean toward benign or malicious behavior. The extraction component automatically compiles a WordNet-like sentiment dictionary of events, which is subsequently used to score unknown traces of either individual processes or an entire session.
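The extraction and scoring pipeline can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes Dunning's G² log-likelihood ratio computed on a 2x2 contingency table (occurrences of an n-gram versus all other n-grams, in malicious versus benign traces), signed according to the corpus in which the n-gram is relatively more frequent. The flat `dict` stands in for the WordNet-like dictionary, a trace's score is taken as the mean sentiment of its n-grams, and all function names (`ngrams`, `llr`, `build_dictionary`, `score_trace`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(events: Sequence[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of an event sequence."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def _cell(k: float, expected: float) -> float:
    # Contribution k * ln(k / E) of one table cell; empty cells contribute 0.
    return k * math.log(k / expected) if k > 0 else 0.0


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 log-likelihood ratio for a 2x2 contingency table (always >= 0)."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    return 2.0 * sum(_cell(k, rows[r] * cols[c] / n) for k, r, c in cells)


def build_dictionary(malicious: List[Sequence[str]],
                     benign: List[Sequence[str]],
                     n: int = 3) -> Dict[NGram, float]:
    """Map each n-gram to a signed LLR: positive leans malicious, negative leans benign."""
    mal = Counter(g for trace in malicious for g in ngrams(trace, n))
    ben = Counter(g for trace in benign for g in ngrams(trace, n))
    mal_total, ben_total = sum(mal.values()), sum(ben.values())
    if mal_total == 0 or ben_total == 0:
        raise ValueError("both corpora must contain at least one n-gram")
    sentiment: Dict[NGram, float] = {}
    for g in set(mal) | set(ben):
        k11, k12 = mal[g], ben[g]
        score = llr(k11, k12, mal_total - k11, ben_total - k12)
        # Sign by relative frequency: over-represented in malicious traces -> positive.
        leans_malicious = k11 / mal_total > k12 / ben_total
        sentiment[g] = score if leans_malicious else -score
    return sentiment


def score_trace(events: Sequence[str], sentiment: Dict[NGram, float], n: int = 3) -> float:
    """Mean sentiment of a trace's n-grams; unseen n-grams count as neutral (0.0)."""
    grams = ngrams(events, n)
    if not grams:
        return 0.0
    return sum(sentiment.get(g, 0.0) for g in grams) / len(grams)


if __name__ == "__main__":
    # Toy event traces (made up for illustration, not data from the paper).
    mal = [["open", "write", "connect", "send"], ["open", "connect", "send", "close"]]
    ben = [["open", "read", "close"], ["open", "read", "write", "close"]]
    d = build_dictionary(mal, ben, n=2)
    print(score_trace(["open", "connect", "send"], d, n=2))  # positive: leans malicious
    print(score_trace(["open", "read", "close"], d, n=2))    # negative: leans benign
```

The same `score_trace` call can be applied either to the events of a single process or to a whole session's events concatenated in time order, mirroring the two scoring modes described above; treating unseen n-grams as neutral is a design choice of this sketch.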
The system was evaluated on a large set of real-world event traces collected on live corporate workstations, as well as on raw API call traces created in a dedicated malware analysis environment. While applicable to both scenarios, the solution performed best on our abstracted kernel events, both generating new insight into malware-system interaction and assisting with the scoring of previously unknown application behavior.