We Demystify Machine Learning & Artificial Intelligence
Machine Learning and Artificial Intelligence can seem daunting, but they don’t have to be.
We can show you how to gain business insights and learn more about your customers using the data you already have today.
Skip the Hype and Avoid the Snake Oil
The recent boom in ML/AI interest is one of the best and worst things to come along in a while for people like us: seasoned practitioners who were doing this ‘before it was cool’, when ML/AI was largely relegated to the realms of megacorp R&D and science labs.
We love this change because people are finally excited about stuff we’ve been working on for quite a while. But it also means there are now a lot of misconceptions out there about these technologies.
Case Study #1: The Internal Documentation Makeover
The Challenge
A large company wanted to minimize the amount of manual review needed to reorganize their stale internal documentation, as teams were having a difficult time finding the information they needed and much of it was out of date.
Our Approach
User Behavior Analysis: We looked at access patterns of the existing documentation to compare actual human behavior with self-reported preferences. We identified patterns we could use to tailor the documentation to match the user behavior better.
Clustering: Clustering groups data points (including text documents) into distinct clusters based on similarity, typically assigning each document to a single cluster without considering multiple topics. This showed us some emergent possibilities that we had not considered for top-level topics in our information hierarchy.
Topic Modeling: This technique identifies and assigns multiple topics to documents based on their content, allowing for nuanced categorization of text. This helped us get on top of many of the larger, poorly formatted and unstructured documents.
Automated Updates for 3rd Party Packages: Because portions of the documentation referenced 3rd party tools and software used by the company, we wrote a script that would compare the version number of the software referenced internally with the latest version and flag articles for review so that information would not remain too out of date.
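A version check like the one described above could be sketched roughly as follows. This is an illustration, not the script from the engagement: the article records, package names, and version numbers are invented, the latest versions are hard-coded rather than fetched from a package registry, and it assumes plain dotted numeric versions such as 2.10.1.

```python
def parse_version(text):
    """Turn '2.10.1' into (2, 10, 1) so versions compare numerically, not alphabetically."""
    return tuple(int(part) for part in text.split("."))


def flag_stale_articles(articles, latest_versions):
    """Return titles of articles that reference an older version than the latest known one."""
    flagged = []
    for article in articles:
        latest = latest_versions.get(article["package"])
        if latest is None:
            continue  # unknown package: nothing to compare against
        if parse_version(article["version"]) < parse_version(latest):
            flagged.append(article["title"])
    return flagged


# Hypothetical data, for illustration only.
articles = [
    {"title": "Deploying with ExampleTool", "package": "exampletool", "version": "2.9.0"},
    {"title": "Logging setup", "package": "examplelog", "version": "1.4.2"},
]
latest_versions = {"exampletool": "2.10.1", "examplelog": "1.4.2"}

stale = flag_stale_articles(articles, latest_versions)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.9.0" would sort after "2.10.1" and the stale article would be missed.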
The Results
Not only did we save this company a ton of time and money using a combination of unsupervised learning and classic automation techniques, we also coached them on how to institute a guild model for shared governance across their teams to ensure everyone is invested in the long-term maintenance of their internal documentation.
Case Study #2: Better Fraud Detection & Consumer Insights
The Challenge
A rapidly growing start-up brand specializing in streetwear drops reached out to us because they wanted to better detect bot fraud and optimize their drop timings. They had accumulated a vast amount of unstructured data and weren’t sure what to do with it.
Our Approach
Data Lake Assessment & Rearchitecture: We assessed all sources and the quality of data in their data lake to refine their data pipeline going forward. This included renaming, recategorizing, and normalizing the datatypes in the data lake, as well as assessing their (lack of) indexing.
Collaborative Filtering for Better Consumer Behavior Insights: Collaborative filtering is a recommendations technique that predicts user interest by assuming that people who agreed in the past will agree in the future. We used this method to analyze behavior patterns, preferences, and interactions so we could provide personalized upsells and marketing strategies on a per-user basis.
Time-Series LSTM Neural Net for Forecasting Trends: An LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to recognize patterns in sequences of data, with the ability to remember long-term dependencies. LSTMs are great for forecasting consumer trends and identifying potential high-demand items for a retail company.
Advanced Fraud Detection Using Isolation Forests: An Isolation Forest is an anomaly detection technique that identifies outliers by isolating them through random partitioning, using fewer splits for anomalies than for normal points. This helped us identify bot behavior more easily.
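To make the "fewer splits for anomalies" idea concrete, here is a small from-scratch sketch of an Isolation Forest in plain Python. It is a teaching toy under stated assumptions, not the model from this engagement: the two session features (requests per minute, seconds from add-to-cart to checkout) and all data values are made up, and a production system would more likely use a library implementation such as scikit-learn's.

```python
import math
import random


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + _c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score each point in (0, 1); shorter average paths give scores closer to 1."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = _c(sample_size)
    return [
        2 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Hypothetical sessions: (requests per minute, seconds from add-to-cart to checkout).
data_rng = random.Random(42)
sessions = [(data_rng.uniform(2, 8), data_rng.uniform(20, 60)) for _ in range(60)]
sessions.append((300.0, 0.5))  # one bot-like session, appended last

scores = anomaly_scores(sessions)
most_suspicious = scores.index(max(scores))
```

The bot-like session sits far from everything else, so a single random split is often enough to isolate it, while ordinary sessions need many more. That difference in path length is what the score captures.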
The Results
By re-architecting this company’s data lake and implementing machine learning solutions that enabled this rapidly growing brand to gain valuable consumer insights and improve fraud detection, we helped them boost their sales enough to reach several internal milestones in the quarter following our engagement. We think that’s pretty sweet.
Case Study #3: Getting Big Results From Small Datasets
The Challenge
A music start-up that had recently raised its Series A came to us and asked what their options were for creating a better recommendation system for their digital products. They didn’t believe they had enough data to do this, but because they were dealing in music assets, we thought differently.
Our Approach
Repurpose Open Source Datasets: We built a composite dataset from songs in the GTZAN and FMA music datasets, choosing music that we knew matched the genres of their dataset, and partitioned all items in the new dataset into shorter samples for processing.
Transfer Learning: Because we used only a fraction of the relevant songs from the open source datasets, we first trained several neural network models on those sets, and then re-trained the last few layers on our revised, repurposed dataset, which included their original music.
Keras Optimizer: We used the Keras Optimizer package to tweak and refine the models we built, including optimizing the hyperparameters. We didn’t leave anything to guesswork, which we’ve unfortunately come across a lot in the projects we have inherited from other developers.
Euclidean Distance to Find Similar Sounds: We used the neural network to discriminate between genres, and then, within the genre a piece of music was most likely to belong to, we found similar songs using a classic machine learning technique based on Euclidean distance.
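The last step above can be sketched in a few lines. The catalog, track titles, and three-number feature vectors below are hypothetical stand-ins (real audio features would have many more dimensions), and the predicted genre is passed in directly rather than coming from a trained network.

```python
import math

# Hypothetical catalog entries: (title, genre, feature vector).
catalog = [
    ("Track A", "jazz", [0.10, 0.80, 0.30]),
    ("Track B", "jazz", [0.20, 0.70, 0.35]),
    ("Track C", "jazz", [0.90, 0.10, 0.60]),
    ("Track D", "rock", [0.12, 0.79, 0.31]),
]


def similar_songs(query_vec, predicted_genre, catalog, top_n=2):
    """Rank songs in the predicted genre by Euclidean distance to the query."""
    candidates = [
        (title, math.dist(query_vec, vec))
        for title, genre, vec in catalog
        if genre == predicted_genre
    ]
    # Smallest distance first: closer vectors are treated as more similar.
    return sorted(candidates, key=lambda pair: pair[1])[:top_n]


matches = similar_songs([0.11, 0.80, 0.30], "jazz", catalog)
```

Filtering by genre first keeps "Track D" out of the results even though its features are close to the query, which mirrors the two-stage approach: classify, then search within the class.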
The Results
This company told us that we had gotten them further than any of the folks they had hired before for this task, which was music to our ears. The company had plans to use the tool manually for curating new songs, and also to put it in their product in a mobile app that lands musicians sync placements and industry connections.
What Makes Us Different
Keras, TensorFlow, & more
Whether it’s TensorFlow and Keras, PyTorch, scikit-learn, or something else, we can stitch together the right stack of tools to get you great results.
We want to spend our time solving your problem, not sitting in meetings.
We Don’t Like Guessing
We use tools like the Keras Optimizer, stratified sampling, and k-fold cross-validation to make sure we are getting you the best results possible.
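For readers curious what stratified sampling and k-fold cross-validation look like together, here is a minimal plain-Python sketch. The labels are invented, and in practice a library routine (for example scikit-learn's StratifiedKFold) would usually do this job; the point is only to show that every fold keeps the same class mix as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical imbalanced labels: 10 "fraud" examples and 40 "ok" examples.
labels = ["fraud"] * 10 + ["ok"] * 40
splits = list(stratified_k_fold(labels, k=5))
```

With plain random folds, a rare class like "fraud" could end up missing from a test fold entirely; stratifying guarantees each of the five test folds here contains two "fraud" and eight "ok" examples.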
Machine Learning doesn’t have to feel like a mysterious black box.
ML/AI Explained Simply
In our milestone presentations, we break down the machine learning lifecycle so you understand what we are doing and what to expect.
We train your team so they feel confident in how to maintain your new AI investment.
Intro Call & Requirements
We run through our systematic checklist so that we get you exactly what you need for your specific business case.
We Build Your Solution
We use industry best practices to deliver the solution as specified, tailored to your needs, following the machine learning lifecycle.
We Train & Transfer
We teach you and your team how to take care of your new machine learning model and how to put it into production effectively.
Engram Software
© 2024 ENGRAM SOFTWARE - MADE WITH 💙 IN BROOKLYN, NY