The Hacker's Manifesto Revisited

By Thibault Reuille
Posted on August 28, 2014
Updated on October 15, 2020


Another one got caught today, it’s all over the papers. “Teenager Arrested in Computer Crime Scandal”, “Hacker Arrested after Bank Tampering”… Damn kids. They’re all alike.

You may have recognized the opening lines of this now-legendary text. The Hacker’s Manifesto, first published in Phrack #7 in 1986, was written by “The Mentor” shortly after his arrest. It has become part of hacker lore and remains a monument of cyber culture. Today, we would like to give it a new lease on life using OpenGraphiti, our data visualization engine.

In this article, we will show you a way to do some text analysis by combining OpenGraphiti with NLTK, the Natural Language Toolkit.

First, let’s take a quick look at what NLTK is and does:

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
(Source: http://www.nltk.org)

In other words, NLTK is a text processor for human languages such as English, Spanish, French, and Chinese. We will use it to parse the Hacker’s Manifesto and analyze the output with OpenGraphiti. This sheds new light, in a visual way, on the structure of the text and on how NLTK parses words and sentences. The same technique applies to any other text; our use case here only serves as an example.

Parsing the text with NLTK

We will assume that you have NLTK installed on your machine and a file called “manifesto.txt” containing the text to process. The relevant Python code to parse the text looks like this:

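    import nltk

    # Requires the NLTK tokenizer and tagger models (installed via nltk.download()).
    data = []
    # The "rU" mode was removed in Python 3.11; plain "r" already handles universal newlines.
    with open("manifesto.txt", "r") as infile:
        text = infile.read()

    print("Parsing sentences and tagging words with NLTK ...")
    for sentence in nltk.sent_tokenize(text):
        tokens = nltk.word_tokenize(sentence)  # split the sentence into words
        tagged = nltk.pos_tag(tokens)          # list of (word, tag) pairs
        data.append(tagged)
        print(tagged)                          # print every sentence, not only the last one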

Fairly straightforward indeed:

  • Open the file and read it.
  • Cut the text into sentences.
  • For each sentence, cut it into words.
  • Tag the words with an NLTK type.
  • Print the result.

To illustrate NLTK’s mechanism, let’s focus on these two sentences:
“Have you ever looked behind the eyes of the hacker? Did you ever wonder what made him tick, what forces shaped him, what may have molded him?”
For these sentences, here is what NLTK gives us:

    [('Have', 'NNP'), ('you', 'PRP'), ('ever', 'RB'), ('looked', 'VBN'), ('behind', 'IN'), ('the', 'DT'), ('eyes', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('hacker', 'NN'), ('?', '.')]
    [('Did', 'NNP'), ('you', 'PRP'), ('ever', 'RB'), ('wonder', 'JJR'), ('what', 'WP'), ('made', 'VBN'), ('him', 'PRP'), ('tick', 'VBP'), (',', ','), ('what', 'WP'), ('forces', 'NNS'), ('shaped', 'VBD'), ('him', 'PRP'), (',', ','), ('what', 'WP'), ('may', 'MD'), ('have', 'VB'), ('molded', 'VBN'), ('him', 'PRP'), ('?', '.')]

As you can see, the text has been cut into sentences. Each sentence is represented as a list of words, and each word as a pair: the word itself, followed by its NLTK part-of-speech tag. How do we know what the tags mean? NLTK provides a very simple way to read the documentation for them. For example, let’s look up VBN, the tag associated with the word “looked” in the first sentence:

    $ python
    >>> import nltk
    >>> nltk.help.upenn_tagset('VBN')
    VBN: verb, past participle
        multihulled dilapidated aerosolized chaired languished panelized used
        experimented flourished imitated reunifed factored condensed sheared
        unsettled primed dubbed desired ...

Fair enough! NLTK gives us a code that describes the type or function of each word in a sentence. Now all we have to do is use SemanticNet to transform our tagged tokens into a nice graph.
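For quick reference while reading tagged output, the tags that appear in this article can be collected in a small lookup table. This is only a convenience sketch: the names PENN_TAGS and describe are hypothetical and not part of NLTK, and the descriptions follow the Penn Treebank tagset that nltk.help.upenn_tagset documents.

```python
# Penn Treebank tag descriptions, limited to the tags seen in this article.
# PENN_TAGS and describe() are illustrative names, not NLTK APIs.
PENN_TAGS = {
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "MD": "modal",
    "NN": "noun, singular or mass",
    "NNP": "proper noun, singular",
    "NNS": "noun, plural",
    "PRP": "personal pronoun",
    "RB": "adverb",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
    "WP": "wh-pronoun",
}


def describe(tagged_sentence):
    """Return (word, tag, description) triples.

    Tags missing from the table (punctuation, for instance) are
    returned unchanged as their own description.
    """
    return [(word, tag, PENN_TAGS.get(tag, tag)) for word, tag in tagged_sentence]


sample = [("Have", "NNP"), ("you", "PRP"), ("ever", "RB"), ("looked", "VBN")]
for word, tag, description in describe(sample):
    print(f"{word:8} {tag:4} {description}")
```

Reading the sample this way also makes it easy to spot questionable tags: the tagger is statistical, and labeling “Have” as a proper noun is clearly a miss.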

Build the graph

The process is fairly simple: we walk through every sentence and create a connected path through its successive words. If a word or edge has already been created, we don’t recreate it. Applied to the whole text, this gives us a graph in which words are connected when they appear next to each other in the text. Finally, we can type each node with its NLTK tag.

Example:
“NLTK is amazing. OpenGraphiti is great too.”
This will be parsed and tagged like this:

    [(u'NLTK', 'NN'), (u'is', 'VBZ'), (u'amazing', 'VBG'), (u'.', '.')]
    [(u'OpenGraphiti', 'NNP'), (u'is', 'VBZ'), (u'great', 'JJ'), (u'too', 'RB'), (u'.', '.')]

We can then create a graph defined as follows:

    Nodes : NLTK, is, amazing, ., OpenGraphiti, great, too
    Edges :
        NLTK --> is,
        is --> amazing,
        amazing --> .,
        OpenGraphiti --> is,
        is --> great,
        great --> too,
        too --> .
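The node and edge lists above can be reproduced with a few lines of plain Python, which may help to check the logic before bringing in a graph library. This is a minimal sketch, not the SemanticNet implementation: build_word_graph is a hypothetical helper, and it keeps the original casing so that the result matches the listing above.

```python
def build_word_graph(tagged_sentences):
    """Build unique nodes and edges from lists of (word, tag) pairs.

    Returns a dict mapping each word to the first tag seen for it,
    and a list of unique (source, destination) edges in creation order.
    """
    nodes = {}
    edges = []
    seen_edges = set()
    for sentence in tagged_sentences:
        previous = None  # a path never crosses a sentence boundary
        for word, tag in sentence:
            nodes.setdefault(word, tag)  # keep whichever tag appears first
            if previous is not None and (previous, word) not in seen_edges:
                seen_edges.add((previous, word))
                edges.append((previous, word))
            previous = word
    return nodes, edges


example = [
    [("NLTK", "NN"), ("is", "VBZ"), ("amazing", "VBG"), (".", ".")],
    [("OpenGraphiti", "NNP"), ("is", "VBZ"), ("great", "JJ"), ("too", "RB"), (".", ".")],
]
nodes, edges = build_word_graph(example)
print("Nodes :", ", ".join(nodes))
for source, destination in edges:
    print(f"    {source} --> {destination}")
```

Note that “is” and “.” each appear in both sentences but are created only once, which is what joins the two sentence paths into a single graph.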

Here is our graph creation algorithm, using SemanticNet to perform that task:

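    import semanticnet as sn

    graph = sn.DiGraph()
    for sentence in data:
        previous = None  # start a new path for each sentence
        for token in sentence:
            word = token[0].lower()
            tag = token[1]  # renamed from "type" to avoid shadowing the builtin
            # Node ids are the lowercased words, so test the lowercased form
            # (testing token[0] would miss words containing capital letters).
            if not graph.has_node(word):
                current = graph.add_node({"label": word, "type": tag}, word)
            else:
                current = word
            if previous is not None:
                # Only create the edge if it does not exist yet.
                edges = graph.get_edges_between(previous, current)
                if not edges:
                    graph.add_edge(previous, current)
            previous = current
    graph.save_json("manifesto.json")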

Notes

  • We use the lower() method to convert every word to lowercase, so “OpenGraphiti” and “opengraphiti” are treated as the same node.
  • We don’t distinguish between different tags on the same word. We keep whichever tag appears first.

Going further

We could spice things up a little. For instance, we could count the number of occurrences of each edge. Normalizing those counts per source word would give us a Markov chain of all the word transitions, which can also be visualized.
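The counting idea can be sketched in plain Python as follows. This is an illustration under assumptions rather than part of the original pipeline: transition_counts and transition_probabilities are hypothetical helpers, and the input is the same list of tagged sentences produced by the parsing step.

```python
from collections import Counter


def transition_counts(tagged_sentences):
    """Count how often each (word, next_word) pair occurs, per sentence."""
    counts = Counter()
    for sentence in tagged_sentences:
        words = [word.lower() for word, _tag in sentence]
        counts.update(zip(words, words[1:]))  # consecutive word pairs
    return counts


def transition_probabilities(counts):
    """Normalize edge counts by the total outgoing count of each source word."""
    totals = Counter()
    for (source, _destination), n in counts.items():
        totals[source] += n
    return {
        (source, destination): n / totals[source]
        for (source, destination), n in counts.items()
    }


tagged = [
    [("NLTK", "NN"), ("is", "VBZ"), ("amazing", "VBG"), (".", ".")],
    [("OpenGraphiti", "NNP"), ("is", "VBZ"), ("great", "JJ"), ("too", "RB"), (".", ".")],
]
counts = transition_counts(tagged)
probabilities = transition_probabilities(counts)
for (source, destination), p in sorted(probabilities.items()):
    print(f"{source} --> {destination}: {p:.2f}")
```

The counts or probabilities could then be stored as an edge attribute, so that the visualization can weight frequent transitions more heavily.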

Another idea is to create a timeline that plays the text over time, as if it were spoken in real time!
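One possible way to prepare such a timeline is to give every word transition a timestamp based on its position in the text. The sketch below only builds a list of timed events. The timeline helper and the seconds_per_word pacing value are assumptions made for illustration, and the format OpenGraphiti actually expects for playback is not covered here.

```python
def timeline(tagged_sentences, seconds_per_word=0.4):
    """Return (time_in_seconds, source, destination) events in reading order.

    seconds_per_word is an arbitrary pacing value chosen for illustration.
    """
    events = []
    position = 0  # index of the current word in the whole text
    for sentence in tagged_sentences:
        previous = None
        for word, _tag in sentence:
            word = word.lower()
            if previous is not None:
                events.append((round(position * seconds_per_word, 3), previous, word))
            previous = word
            position += 1
    return events


tagged = [
    [("NLTK", "NN"), ("is", "VBZ"), ("amazing", "VBG"), (".", ".")],
    [("OpenGraphiti", "NNP"), ("is", "VBZ"), ("great", "JJ"), ("too", "RB"), (".", ".")],
]
events = timeline(tagged)
for time, source, destination in events:
    print(f"{time:5.1f}s  {source} --> {destination}")
```

Unlike the graph itself, the timeline keeps repeated transitions, since a word pair that occurs several times should light up each time it is "spoken".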
 

Visualization

Once our SemanticNet graph has been created, we can visualize it with the OpenGraphiti engine:

    $ ./graphiti demo manifesto.json

We can now contemplate the result in three dimensions! We captured a video of the engine in action and we are happy to share it with you. This video first shows you the resulting graph created with the technique described above, and then plays the text over time.