This Week in Data #2019-3

This week we will be focusing on data science tools and articles. Here is an overview:

  • Data Drifter - searching for open Google Cloud buckets
  • Using Twitter for Threat Intelligence
  • Scraping YouTube for various activity
  • Checking information leakage with Machine Learning
  • I’m a data scientist who is skeptical about data
  • Forty percent of ‘AI startups’ in Europe don’t actually use AI, claims report

Tool: Data Drifter

Our friends Jackie and Jason at @SpyglassSec have created this cool tool for searching open Google Cloud buckets. It reminds me of the first generation of Amazon S3 bucket search engines. Since GCP sits in third place behind Amazon and Microsoft in the cloud market, you probably won't find as much exposed content as on Amazon. However, because fewer people are aware of GCP misconfigurations, you might find juicier data in open buckets. I look forward to playing with this further during the week. Perhaps it will merit its own dedicated blog post.

Research: Using Twitter for Actionable Threat Intelligence

Trend Micro's research department did a deep dive into using Twitter data for actionable threat intelligence. They covered link analysis, sentiment analysis, and big data techniques for sorting through tweets and finding conversations around specific topics. Honestly, this reminded me of my own research at Cyxtera. You can find all sorts of gems hidden in Twitter, from hacker chatter and fake news mills to left-wing, right-wing, and Islamist extremists talking among themselves. You can read about the work I did at Cyxtera at the link below.

Link: https://www.cyxtera.com/blog/brainspace/how-bots-shape-our-politics-brexit-analysis
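For readers curious what the link-analysis side can look like in practice, here is a minimal Python sketch (not Trend Micro's method) that builds a mention graph from tweets and ranks accounts by how often other accounts mention them. It assumes tweets have already been collected as `(author, text)` pairs; the helper names `mention_graph` and `most_mentioned` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Twitter handles are 1-15 word characters after "@".
MENTION = re.compile(r"@(\w{1,15})")


def mention_graph(tweets):
    """Build a directed graph: author -> Counter of mentioned accounts.

    `tweets` is assumed to be an iterable of (author, text) pairs.
    Self-mentions are skipped so they don't inflate an account's importance.
    """
    edges = defaultdict(Counter)
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():
                edges[author.lower()][target.lower()] += 1
    return edges


def most_mentioned(edges, top_n=5):
    """Rank accounts by weighted in-degree (total mentions received)."""
    indegree = Counter()
    for targets in edges.values():
        indegree.update(targets)
    return indegree.most_common(top_n)


if __name__ == "__main__":
    sample = [
        ("alice", "hi @bob and @carol"),
        ("bob", "@carol agreed"),
        ("carol", "talking to myself @carol"),
    ]
    print(most_mentioned(mention_graph(sample)))
```

Weighted in-degree is the simplest centrality measure; a fuller analysis might run PageRank or community detection on the same graph to find clusters of accounts that talk among themselves.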

Tool: Scraping YouTube for various activity

YouTube is one of my favorite sites for learning new topics and following the discourse around trending news (and drama). Last week I linked a tool that visualizes YouTube recommendations as linked arrows. This week's tool lets you scrape various kinds of activity on the site, such as keyword/topic frequency, video comments, and a user's comment history across the site. This has several research applications, such as sentiment analysis or looking at the top comments on videos as you go down the "YouTube extremist rabbit hole". I look forward to using this tool; in the future it may become the basis for an entire project focused on sentiment analysis.
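As a rough idea of what keyword frequency over scraped comments involves, the sketch below counts the most common words across a list of comments. It assumes the comments have already been scraped into plain strings; `keyword_frequency` and the tiny `STOPWORDS` set are hypothetical and not part of the tool's API.

```python
import re
from collections import Counter

# A deliberately small, illustrative stopword list; real projects use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of",
             "in", "this", "that", "it", "i", "you"}


def keyword_frequency(comments, top_n=10):
    """Return the top_n most frequent non-stopword tokens across comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", comment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    print(keyword_frequency(["Great video, great edit", "The edit is great"]))
```

The same counts, computed per video or per user, are a natural starting point before moving on to proper sentiment analysis.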

Paper: Using ML to measure the security/information leakage of systems efficiently

Congrats to Giovanni at the University of London for completing his PhD! His final dissertation is on "Measuring Black-box Information Leakage via Machine Learning." With encrypted communications becoming common among 'normal' people, this is a relevant topic. Giovanni also released a tool on GitHub that complements the research.

Link: https://github.com/gchers/fbleau
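To give a feel for the idea behind black-box leakage estimation, here is a simplified Python sketch, assuming a toy system whose observable output is a single number. A classifier is trained to guess the secret from the output; a classifier's expected error can never be lower than the Bayes risk, so its error on held-out samples gives an upper-bound estimate of that risk, and an error far below blind guessing signals leakage. This uses plain leave-one-out 1-nearest-neighbour and is not the fbleau implementation; `nn_error`, `baseline_error`, and the noisy toy system are hypothetical.

```python
import random
from collections import Counter


def nn_error(observations, secrets):
    """Leave-one-out 1-nearest-neighbour error rate (needs >= 2 samples).

    For each sample, guess the secret of the closest *other* observation
    and count how often that guess is wrong.
    """
    n = len(observations)
    mistakes = 0
    for i in range(n):
        # Nearest neighbour among all other samples (ties: lowest index).
        j = min((k for k in range(n) if k != i),
                key=lambda k: abs(observations[k] - observations[i]))
        if secrets[j] != secrets[i]:
            mistakes += 1
    return mistakes / n


def baseline_error(secrets):
    """Error of the best blind guess: always pick the most common secret."""
    most_common_count = Counter(secrets).most_common(1)[0][1]
    return 1 - most_common_count / len(secrets)


if __name__ == "__main__":
    rng = random.Random(0)
    secrets = [rng.randrange(4) for _ in range(400)]
    # Hypothetical leaky system: output = secret + Gaussian noise.
    outputs = [s + rng.gauss(0, 0.3) for s in secrets]
    print("blind guessing error:", baseline_error(secrets))
    print("1-NN error on outputs:", nn_error(outputs, secrets))
```

The appeal of the black-box approach is that it only needs input/output samples from the system, not knowledge of its internals; more noise in the output pushes the classifier's error back toward the blind-guessing error.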

Article: I’m a data scientist who is skeptical about data

Having done several internships in data science, I think this article hit the nail on the head. The tl;dr for those of you who skimmed it: data can be biased and inaccurate, and so can data scientists. At the same time, data helps us find the context behind complicated events, so we should invest in becoming better data scientists rather than losing faith.

Article: Forty percent of ‘AI startups’ in Europe don’t actually use AI, claims report

This one is self-explanatory. Too many startups claim to use AI when they don't use anything close to it; it's a marketing ploy to attract money and investment. Note: if any VC is reading this, I can vet your AI company prospects for a reasonable consultancy fee :)