Machine Learning and Ruby: Enhancing the Cybersecurity Industry Part III

This is the last part of this three-part series. So far, we have covered the basics of Ruby and machine learning. In this last part, I will demonstrate how machine learning can enhance the cybersecurity industry.

Demo Presentation

To start, I want to share a link to the presentation I did on this topic as well as a link to the code:

Although I covered the demonstration in detail, I will elaborate by sharing two real-world use cases.

Use Case 1: Microsoft Azure

Microsoft Azure is a “cloud computing service created … for building, testing, deploying, and managing applications and services through Microsoft-managed data-centers” [1]. It is a one-stop shop for Microsoft services.

One of Azure’s capabilities is machine learning. It can use machine learning to analyze network activity and flag any suspicious activity for further investigation [2]. Because Microsoft products are widely adopted, they generate massive amounts of data. This data can then be stored, parsed, and used for analysis.

Access to massive amounts of data is one prerequisite for machine learning. For a machine to learn, it requires data to learn from. Bayes’ Law does a good job of capturing this idea [3]:

Bayes’ Law:

Initial Beliefs + Recent Objective Data = A New and Improved Belief.

According to the equation above, if we add more data to our initial beliefs, the result is a new and improved belief. This is how the machine learns.
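To make the idea concrete, here is a minimal Ruby sketch of a single Bayesian update, written as P(H | E) = P(E | H) * P(H) / P(E). The scenario and every number in it are assumptions chosen for illustration (a 1% prior that a login is malicious, and made-up rates for how often a login comes from an unfamiliar country); none of it comes from Azure or from my demo.

```ruby
# Bayes' rule for a yes/no hypothesis H given a piece of evidence E.
# prior               - P(H), belief before seeing the evidence
# likelihood_if_true  - P(E | H)
# likelihood_if_false - P(E | not H)
def bayes_update(prior, likelihood_if_true, likelihood_if_false)
  evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
  (likelihood_if_true * prior) / evidence
end

# Hypothetical numbers, for illustration only.
prior = 0.01                 # 1% of logins are assumed malicious
p_foreign_if_malicious = 0.8 # assumed rate for malicious logins
p_foreign_if_benign = 0.05   # assumed rate for legitimate logins

# Initial belief + one observation = improved belief.
posterior = bayes_update(prior, p_foreign_if_malicious, p_foreign_if_benign)
puts format("After one signal:  %.3f", posterior) # prints 0.139

# Feeding the new belief back in as the prior shows how more data
# keeps refining the estimate.
second_posterior = bayes_update(posterior, p_foreign_if_malicious, p_foreign_if_benign)
puts format("After two signals: %.3f", second_posterior) # prints 0.721
```

Each observation shifts the estimate, and the output of one update becomes the starting belief for the next, which is the “new and improved belief” in the equation above.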

Azure runs the data it collects through machine learning algorithms. In general, the more representative data the algorithms analyze, the more accurate the results. For example, Azure learns to determine whether logins are suspicious and whether emails are malicious. It can do this because the dataset contains many examples of these cases, so the models learn to recognize them over time.

Use Case 2: Palo Alto Networks WildFire

Palo Alto Networks is an “American multinational cybersecurity company [who provide] … firewalls and cloud-based offerings that extend those firewalls” [4]. Their next-generation firewalls and endpoint protection use machine learning to identify threats.

Palo Alto Networks Traps and Next-Generation Firewall use another of the company’s products, WildFire, to identify threats. Palo Alto Networks WildFire is “a cloud-based threat-analysis service which uses … [machine learning] and bare-metal analysis to discover and prevent unknown threats” [5]. This cloud-based service analyzes data in real time to determine whether it contains malicious payloads.

Network traffic can be run through WildFire via Traps or Next-Generation Firewall to determine whether a payload is malicious. If it is, the product can take immediate action to quarantine it. Since WildFire uses machine learning, it can then add that malicious payload to its dataset so that the payload’s behaviour can be recognized in other situations.

My Demonstration

The demonstration I created for this series focused more on detecting suspicious activity (as Azure does) than on malicious payloads (as WildFire does). Although simple, the demonstration can be scaled to include millions of detailed log entries. The demo is a starting point and can be extended with many more features, such as sending emails and gathering logs from multiple sources. The basis of my demonstration came from the following GitHub repository:

https://github.com/igrigorik/decisiontree

There are many machine learning libraries in Ruby, each offering different algorithms. The one I chose implements a decision tree. There are more sophisticated algorithms than this one; they can be found here:

https://rubygems.org/search?utf8=%E2%9C%93&query=machine+learning
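To give a feel for what a decision tree does with log data, here is a small plain-Ruby sketch of the step at the heart of ID3-style trees: choosing which attribute to split on by information gain. This is not the code from my demo and it does not use the decisiontree gem; the `LOGINS` records, their attribute names, and their labels are all hypothetical.

```ruby
# Hypothetical login records, for illustration only.
LOGINS = [
  { time: "night", location: "foreign", label: "suspicious" },
  { time: "night", location: "foreign", label: "suspicious" },
  { time: "day",   location: "foreign", label: "suspicious" },
  { time: "night", location: "local",   label: "normal" },
  { time: "day",   location: "local",   label: "normal" },
  { time: "day",   location: "local",   label: "normal" }
].freeze

# Shannon entropy (in bits) of the labels in a set of rows.
# 0.0 means every row has the same label; 1.0 is an even two-way mix.
def entropy(rows)
  total = rows.size.to_f
  counts = rows.group_by { |row| row[:label] }.values.map(&:size)
  counts.sum do |count|
    p = count / total
    -p * Math.log2(p)
  end
end

# How much splitting on `attribute` reduces label entropy.
def information_gain(rows, attribute)
  partitions = rows.group_by { |row| row[attribute] }.values
  remainder = partitions.sum do |part|
    (part.size.to_f / rows.size) * entropy(part)
  end
  entropy(rows) - remainder
end

attributes = %i[time location]
attributes.each do |attribute|
  puts format("%-8s gain: %.3f", attribute, information_gain(LOGINS, attribute))
end

# The tree would split first on the attribute with the highest gain.
best_attribute = attributes.max_by { |attribute| information_gain(LOGINS, attribute) }
puts "Best first split: #{best_attribute}"
```

In this toy dataset the location attribute separates the two labels perfectly, so it is chosen as the first split. A full tree would repeat the same choice on each resulting subset until the labels in every branch agree.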

Conclusion

Machine learning is a very powerful technology and can be implemented relatively easily with a language like Ruby. As software developers, we must take full advantage of the technology we have today. Yes, there are steep learning curves to overcome. However, cybersecurity is a growing need in our society, and we can use technology like machine learning to meet it.

In conclusion, I hope that this series demonstrated how machine learning can benefit the cybersecurity industry.

References

[1] “Microsoft Azure,” Wikipedia, 29-Mar-2020. [Online]. Available: https://en.wikipedia.org/wiki/Microsoft_Azure. [Accessed: 29-Mar-2020].

[2] R. Ronen, “Machine Learning in Azure Security Center,” Azure Blog and Updates | Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-us/blog/machine-learning-in-azure-security-center/. [Accessed: 29-Mar-2020].

[3] S. B. McGrayne, “Why Bayes Rules: The History of a Formula That Drives Modern Life,” Scientific American, 01-May-2011. [Online]. Available: https://www.scientificamerican.com/article/why-bayes-rules/. [Accessed: 29-Mar-2020].

[4] “Palo Alto Networks,” Wikipedia, 01-Mar-2020. [Online]. Available: https://en.wikipedia.org/wiki/Palo_Alto_Networks. [Accessed: 29-Mar-2020].

[5] “WildFire Datasheet,” Palo Alto Networks, 19-Dec-2018. [Online]. Available: https://www.paloaltonetworks.com/resources/datasheets/wildfire. [Accessed: 29-Mar-2020].