Did you know that in the first half of 2019, data breaches exposed 4.1 billion records to the public [1]? The problem is not going away: according to Gartner, worldwide security spending is expected to exceed $133 billion in 2022 [2]. With a severe shortage of cybersecurity talent, how are we going to meet this demand [3]? Recent technological advances may help close the gap.
Computing power has grown steadily for decades, much as Moore’s Law predicted [4]. This growth opens up many opportunities for technological exploration [5]. One avenue that is revolutionizing industries worldwide is Machine Learning.
Machine Learning can be defined as follows:
Using data to solve problems [6].
Without data, a machine cannot learn. Humans are no different: we need to experience the world to learn how to live in it. For example, we learn to walk by trying to walk. We try one way; if it does not work, we try another. When a way works, we remember it and repeat it. Machines, albeit not conscious, learn in a similar way.
Machine Learning encompasses three main steps:
1. Input
2. Analysis & Computation
3. Output
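To make the three steps concrete, here is a minimal Ruby sketch. It assumes a toy problem (fitting a straight line to a few made-up points with ordinary least squares), which is far simpler than a real machine learning workload; the method names `load_input`, `fit_line`, and `predict` are hypothetical and chosen only to mirror the three steps.

```ruby
# A toy three-step pipeline: input -> analysis & computation -> output.
# The data below is made up for illustration; it is not from any real system.

# 1. Input: pairs of [x, y] observations.
def load_input
  [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0]]
end

# 2. Analysis & computation: fit y = slope * x + intercept by ordinary least squares.
def fit_line(points)
  n = points.size.to_f
  mean_x = points.sum { |x, _| x } / n
  mean_y = points.sum { |_, y| y } / n
  sxy = points.sum { |x, y| (x - mean_x) * (y - mean_y) } # covariance term
  sxx = points.sum { |x, _| (x - mean_x)**2 }             # variance term
  slope = sxy / sxx
  intercept = mean_y - slope * mean_x
  [slope, intercept]
end

# 3. Output: use the learned parameters to predict y for a new x.
def predict(model, x)
  slope, intercept = model
  slope * x + intercept
end

model = fit_line(load_input)
puts "slope=#{model[0]}, intercept=#{model[1]}"
puts "prediction for x=5: #{predict(model, 5.0)}"
```

Running it prints `slope=2.0, intercept=1.0` and a prediction of `11.0` for x = 5, because the toy points lie exactly on the line y = 2x + 1.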
Machines need data in order to learn, and they need a lot of it. Depending on the task, even millions of data points may not be enough. All of this data must be fed into the system for analysis and computation.
A machine learning system first needs a goal. Suppose we want a machine to determine whether a patient has a detached retina. The builder of the system inputs retinal scans labeled as showing a detached retina, along with healthy scans for contrast. The machine learns the patterns that distinguish the two and “remembers” them. It then compares each new scan against what it has learned and draws a conclusion based on those inputs.
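One simple way a system could “remember” labeled examples and compare new cases against them is k-nearest neighbours. The Ruby sketch below rests on loud assumptions: each scan is reduced to two hypothetical numeric features, the training values are invented, and `classify`, `distance`, and `TRAINING` are names made up for this example. It is not how a clinical retinal-screening system is built.

```ruby
# k-nearest-neighbour classification on made-up feature vectors.
# Each "scan" is reduced to two hypothetical numeric features; real retinal
# image analysis would need far richer features (or the raw pixels).

LabeledExample = Struct.new(:features, :label)

TRAINING = [
  LabeledExample.new([0.9, 0.8], :detached),
  LabeledExample.new([0.8, 0.9], :detached),
  LabeledExample.new([0.85, 0.75], :detached),
  LabeledExample.new([0.1, 0.2], :healthy),
  LabeledExample.new([0.2, 0.1], :healthy),
  LabeledExample.new([0.15, 0.25], :healthy)
].freeze

# Euclidean distance between two equal-length feature vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Label a new scan by majority vote among its k closest training examples.
def classify(features, training, k: 3)
  nearest = training.min_by(k) { |ex| distance(features, ex.features) }
  votes = nearest.each_with_object(Hash.new(0)) { |ex, tally| tally[ex.label] += 1 }
  votes.max_by { |_label, count| count }.first
end

puts classify([0.82, 0.86], TRAINING)
puts classify([0.12, 0.18], TRAINING)
```

With `k: 3`, the first call prints `detached` and the second prints `healthy`. An odd `k` avoids tied votes when there are only two labels.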
The machine’s results may or may not be correct. When a result is incorrect, a feedback system adjusts the model before its next run. When it is correct, the model’s current behavior is reinforced for the next run.
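The feedback idea can be illustrated with a perceptron, one of the oldest learning algorithms. In this sketch (toy task assumed: learning logical AND from four examples), a wrong prediction produces an error signal that shifts the weights toward the right answer, while a correct prediction leaves the weights unchanged so the same behavior carries into the next pass. The names `train` and `perceptron_predict` are hypothetical.

```ruby
# A perceptron: a minimal learner with a feedback loop.
# Toy task (assumed for illustration): learn logical AND.

EXAMPLES = [
  [[0, 0], 0],
  [[0, 1], 0],
  [[1, 0], 0],
  [[1, 1], 1]
].freeze

# Output 1 if the weighted sum plus bias is non-negative, else 0.
def perceptron_predict(weights, bias, inputs)
  sum = weights.zip(inputs).sum { |w, x| w * x } + bias
  sum >= 0 ? 1 : 0
end

def train(examples, max_epochs: 100, learning_rate: 1)
  weights = Array.new(examples.first[0].size, 0)
  bias = 0
  max_epochs.times do
    mistakes = 0
    examples.each do |inputs, label|
      error = label - perceptron_predict(weights, bias, inputs) # feedback: -1, 0 or +1
      next if error.zero? # correct: keep the current weights for the next run
      # incorrect: nudge the weights and bias toward the right answer
      weights = weights.zip(inputs).map { |w, x| w + learning_rate * error * x }
      bias += learning_rate * error
      mistakes += 1
    end
    break if mistakes.zero? # a full pass with no mistakes: training has converged
  end
  [weights, bias]
end

weights, bias = train(EXAMPLES)
EXAMPLES.each do |inputs, label|
  puts "#{inputs.inspect} -> #{perceptron_predict(weights, bias, inputs)} (expected #{label})"
end
```

Each printed line shows the prediction matching the expected label. Because AND is linearly separable, the perceptron is guaranteed to stop making mistakes after a finite number of passes; on data that is not separable, `max_epochs` is what ends the loop.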
Machine Learning is a powerful concept, but implementing it requires a programming language.
Ruby is a programming language that is “dynamic [and] open source . . . with a focus on simplicity and productivity” [7]. It was developed by Yukihiro Matsumoto in the mid-1990s [8]. It has grown in popularity over the years and ranked 11th in the TIOBE Index in January 2020 [9]. Its focus on simplicity and productivity has made Ruby a good choice for machine learning projects.
There are many resources available for machine learning in Ruby [10]. In this series, however, I will focus on machine learning for cybersecurity analytics.
Enhancing the Cybersecurity Industry
There is an increasing demand for cybersecurity professionals [11]. As in other industries [12], the cybersecurity field is looking for ways to address this shortage. Companies like Microsoft (Azure Machine Learning) [13] and Palo Alto Networks (Next-Generation Firewall) [14] already use machine learning to improve their detection of phishing emails and malware, respectively. How else can we use machine learning in this sphere?
In this series, I propose that machine learning can sift through security and audit logs to identify suspicious user activity. That activity can then be passed to a security professional for further analysis. This way, security professionals spend less time on irrelevant log checks.
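As a first taste of the idea, the sketch below flags users whose failed-login count is unusually high compared with everyone else’s, using a z-score. This is a simple statistical baseline rather than a trained model, and everything in it is assumed: the log format, the `LOGIN_FAILED`/`LOGIN_SUCCESS` event names, the sample entries, the method names, and the `threshold` of 2.0.

```ruby
# Flag users whose number of failed logins is unusually high compared with
# everyone else, using a z-score. Log format, event names, sample entries and
# threshold are all hypothetical; this is a statistical baseline, not a trained model.

LOG = <<~TEXT
  2020-01-25T02:14:01 mallory LOGIN_FAILED
  2020-01-25T02:14:05 mallory LOGIN_FAILED
  2020-01-25T02:14:09 mallory LOGIN_FAILED
  2020-01-25T02:14:13 mallory LOGIN_FAILED
  2020-01-25T02:14:17 mallory LOGIN_FAILED
  2020-01-25T02:14:21 mallory LOGIN_FAILED
  2020-01-25T08:01:12 alice LOGIN_FAILED
  2020-01-25T08:01:30 alice LOGIN_SUCCESS
  2020-01-25T08:15:02 bob LOGIN_SUCCESS
  2020-01-25T08:42:19 carol LOGIN_FAILED
  2020-01-25T08:42:51 carol LOGIN_SUCCESS
  2020-01-25T09:05:44 dave LOGIN_SUCCESS
  2020-01-25T09:30:10 erin LOGIN_FAILED
TEXT

# Count failed logins per user; users with no failures still get a 0 entry.
def failed_logins_per_user(log)
  counts = Hash.new(0)
  log.each_line do |line|
    _timestamp, user, event = line.split
    next if user.nil? # skip blank lines
    counts[user] += (event == "LOGIN_FAILED" ? 1 : 0)
  end
  counts
end

# Users whose failure count is more than `threshold` standard deviations above the mean.
def suspicious_users(counts, threshold: 2.0)
  values = counts.values
  mean = values.sum.to_f / values.size
  std_dev = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
  return [] if std_dev.zero? # everyone behaves identically: nothing stands out
  counts.select { |_, v| (v - mean) / std_dev > threshold }.keys
end

counts = failed_logins_per_user(LOG)
p counts
p suspicious_users(counts)
```

Here `mallory`, with six failures, is the only user flagged. One design caveat: with n users, the largest z-score any single user can reach is √(n − 1), so on small user populations the threshold must be set low enough to be reachable at all.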
[1] “2019 MidYear QuickView Data Breach Report”. [Online]. Available: https://pages.riskbasedsecurity.com/2019-midyear-data-breach-quickview-report. [Accessed: 25-Jan.-2020].
[2] “Gartner Forecasts Worldwide Information Security Spending to …”. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2018-08-15-gartner-forecasts-worldwide-information-security-spending-to-exceed-124-billion-in-2019. [Accessed: 25-Jan.-2020].
[3] “110 Must-Know Cybersecurity Statistics for 2020 | Varonis”. [Online]. Available: https://www.varonis.com/blog/cybersecurity-statistics/. [Accessed: 25-Jan.-2020].
[4] “Moore’s law – Wikipedia”. [Online]. Available: https://en.wikipedia.org/wiki/Moore%27s_law. [Accessed: 25-Jan.-2020].
[5] “Technology exploration – Clever Franke”. [Online]. Available: https://www.cleverfranke.com/technology-exploration. [Accessed: 25-Jan.-2020].
[6] “What is Machine Learning? A definition – Expert System”. [Online]. Available: https://expertsystem.com/machine-learning-definition/. [Accessed: 25-Jan.-2020].
[7] “Ruby Programming Language”. [Online]. Available: https://www.ruby-lang.org/en/. [Accessed: 25-Jan.-2020].
[8] “An introduction to Ruby Programming: the history of Ruby”. [Online]. Available: https://launchschool.com/books/ruby/read/introduction. [Accessed: 25-Jan.-2020].
[9] “index | TIOBE – The Software Quality Company”. [Online]. Available: https://www.tiobe.com/tiobe-index/. [Accessed: 25-Jan.-2020].
[10] “Resources for Machine Learning in Ruby – gists · GitHub”. [Online]. Available: https://gist.github.com/gbuesing/865b814d312f46775cda. [Accessed: 25-Jan.-2020].
[11] “Top 8 in-demand cybersecurity jobs in 2020 | EC-Council Official Blog”. [Online]. Available: https://blog.eccouncil.org/top-8-in-demand-cybersecurity-jobs-in-2020/. [Accessed: 25-Jan.-2020].
[12] “Will a robot take my job? | The Age of A.I. – YouTube”. [Online]. Available: https://www.youtube.com/watch?v=f2aocKWrPG8. [Accessed: 25-Jan.-2020].
[13] “Microsoft Azure: Cloud Computing Services”. [Online]. Available: https://azure.microsoft.com/en-us/. [Accessed: 25-Jan.-2020].
[14] “PA-220 – Next-Gen Firewall – Palo Alto Networks”. [Online]. Available: https://www.paloaltonetworks.com/network-security/next-generation-firewall/pa-220. [Accessed: 25-Jan.-2020].