How We Turned an Internal Tool Into a SaaS Platform With 75% Faster Query Response Times for Equinix
Turning an internal tool into a SaaS platform is about more than adding features. It’s about ensuring performance, reliability, security, and availability for all users. Here’s how NaNLABS and Secretly Nice helped Equinix tackle these challenges.
In 2021, we met up with Matt Wolfe to discuss a project for Equinix. We’ll properly introduce Matt later on, but he was an essential partner in this development.
At first, Equinix needed us to debug, refactor, and add new features to an internal sales enablement tool. But this platform became so useful among the team that Equinix decided to make it public, giving us the responsibility to make it SaaS-ready.
This, of course, came with challenges. Turning the tool into a SaaS solution meant migrating to a multi-tenancy architecture with minimal downtime and without hurting data processing times or reliability. Here’s the story of how we enabled Equinix to unlock a new revenue stream while cutting infrastructure costs by 40%.
Table of contents
Meet the client, a widely used digital infrastructure business
Challenges and solutions: Make it SaaS, fast-loading, and reliable at an affordable price
The results: 75% faster query response times after migration and 40% cost reduction
Need to reduce your data infrastructure costs?
Meet the client, a widely used digital infrastructure business
Equinix helps enterprises access infrastructure services, build networks, and track their global servers and data centers. With the largest footprint of data centers and thousands of partner networks, it rightly calls itself “the world’s digital infrastructure company”.
But Equinix isn’t the only character in this story. Have you ever heard an anecdote that begins something like “My friend’s cousin’s in-laws…”? Well, this is one of those stories.
We first heard of Equinix after the company hired Matt Wolfe, Partner at Secretly Nice, to work on the redesign of its sales enablement platform. Matt and his team developed a prototype with the help of freelance programmers and software engineers. They came up with a useful sales tool that solved an internal business need.
After a while, it became clear that Equinix wanted to expand the tool’s user base and futureproof it for scalability. Since we at NaNLABS specialize in solving complex data engineering challenges and building cloud-native solutions, Matt brought us on as the project’s cloud data engineering partners.
Enter NaNLABS, the cloud data engineering experts
As mentioned, Matt’s team had built a prototype of the sales enablement tool. They needed dedicated engineers to stabilize the platform and turn it into an enterprise-level SaaS solution for external customers.
Matt chose to partner with NaNLABS to support his team due to our experience and the positive rapport we built from the beginning. Matt says, “From the first conversation, NaNLABS was really thorough about looking at what the requirements were, who they had available, and how they could effectively support the project. NaNLABS did a great job at communicating their work style and their approach.”
The brief: Stabilize the platform, adopt a multi-tenancy architecture, and improve real-time data processing
At first, the brief was pretty clear. We needed to:
Continue to provide a great user experience (UX) by fixing bugs
Go from a working prototype to a fully fledged product, and then to an enterprise-ready SaaS platform
Overcome performance issues and improve data processing times
Optimize Kubernetes deployments for enterprise-grade scalability
Reduce infrastructure costs while maintaining high performance
Let’s explore each of these tasks in more detail.
Debug and refactor the code to offer a great UX
Equinix needed us to stabilize the tool to support internal and external customers. Along with Matt, we encouraged Equinix to migrate the platform to a different database system built on more standard, modern technologies (more on this later!). But before migrating to new tech and architecture, we needed to debug, strengthen, and refactor the codebase.
Migrating to a multi-tenancy architecture to support high-user concurrency
After stabilizing the platform and making it enterprise-level and scalable, the next step was turning it into a SaaS solution. We suggested a multi-tenancy architecture, as it allows for better resource usage and simplifies scalability and maintenance.
Improving data processing times
“The ability to process large amounts of real-time data was critical for Equinix’s operations,” shares Esteban D’Amico, Co-Founder and Data Solutions lead at NaNLABS. The team needed to move from the existing Neo4j data model to a more efficient data processing workflow that could handle millions of events per second.
Making it more scalable with Kubernetes
Since Equinix is an industry-leading company, we were expecting a big influx of customers as soon as this tool went public. We were tasked with identifying how to better support end users, starting with optimizing Kubernetes for enterprise-grade scalability and high availability across regions.
Turning into a cost-efficient solution
Equinix needed us to stabilize the platform and improve data processing times while keeping it cost-efficient. “We had to improve the system’s ability to handle large-scale customer demand while maintaining resilience and cost efficiency,” says Esteban.
Challenges and solutions: Make it SaaS, fast-loading, and reliable at an affordable price
Improving data processing times and scaling the platform to support concurrent users were some of this project’s main challenges. Here’s a quick overview of how we tackled each one.
| Challenge | Solution |
|---|---|
| Stabilizing the platform to reduce bugs | Debugged the platform after identifying the root causes of issues |
| Data processing limits that kept the tool from handling large data volumes quickly | Built a hybrid data streaming and processing model using Neo4j, MongoDB, Hazelcast, Kafka, and Google Cloud BigQuery |
| Keeping databases up to date and properly structured | Migrated to more common, modern technology that’s easier to maintain |
| Moving to a multi-tenancy architecture without compromising high availability or data security | Restructured the Kubernetes deployment into a multi-tenancy architecture that scales regionally while maintaining failover mechanisms |
| Improving system observability to reduce failures and minimize downtime | Integrated Prometheus, Grafana, and OpenTelemetry to improve observability and system monitoring |
Now, let’s explore these challenges and solutions in more detail.
1. Stabilizing the platform
The tool started as a prototype that kept accruing features, which made the platform unreliable and hard to scale. We needed to identify what was causing the bugs, fix them, refactor the code into a cleaner codebase, and set up test automation to improve code quality.
Solution
The first thing that we did was debug the platform. We ran a root cause analysis to determine what was causing the most critical bugs and fixed them.
With input from Secretly Nice and Equinix, our team took responsibility for stabilizing the app. To do so, we:
Implemented a flexible code freeze whenever critical bugs surfaced
Increased code coverage with a combination of unit and integration tests (see the example after this list)
Used Velero to improve backup and disaster recovery capabilities by having reliable backups and restores for Kubernetes clusters
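To give a flavor of that unit-plus-integration split, here’s a minimal pytest sketch. The function and both tests are hypothetical stand-ins, not the project’s actual suite:

```python
# Hypothetical pytest sketch: the function under test and both tests are
# illustrative stand-ins, not Equinix's actual code.
import pytest


def normalize_region(raw: str) -> str:
    """Pure logic worth covering with a fast unit test."""
    return raw.strip().lower().replace(" ", "-")


def test_normalize_region():
    # Unit test: no I/O, runs in microseconds
    assert normalize_region("  US East ") == "us-east"


@pytest.mark.integration  # custom marker, registered in pytest.ini
def test_region_roundtrip(tmp_path):
    # Integration-style test: exercises real filesystem I/O
    path = tmp_path / "region.txt"
    path.write_text(normalize_region("EU West"))
    assert path.read_text() == "eu-west"
```

Tagging the slower tests with a marker lets the fast unit suite run on every commit while the integration suite runs on a schedule or before releases.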
2. Data processing limitations
Originally, Equinix’s software was built on a stack that proved an awkward fit: Neo4j at the core, with services written in Java. Neo4j wasn’t the best choice for handling complex queries at scale, especially with low latency.
We needed to determine a new tech stack to migrate the platform that would efficiently process large volumes of real-time events without hurting performance.
Solution
To support millions of real-time events per second, we:
Optimized MongoDB aggregation pipelines for high-throughput analytical queries.
Integrated Hazelcast as a distributed caching layer to reduce redundant database queries and improve response times for frequently accessed data.
Leveraged Kafka consumer groups for asynchronous event processing and dynamic load balancing across multiple services, ensuring real-time data processing and preventing bottlenecks during peak times (see the sketch after this list).
Used Google Cloud BigQuery for fast, cost-effective business intelligence queries.
Optimized recursive Neo4j queries over the complex facility and connection graphs to speed up relationship-based lookups.
Integrated Cloud Composer (Apache Airflow) for orchestrating complex workflows, automating data pipelines, and improving operational efficiency.
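As a rough illustration of the consumer-group pattern (the broker address, topic, group, and handler here are hypothetical, not Equinix’s actual configuration), every instance of a service joins the same group and Kafka balances partitions across them:

```python
# Sketch of a consumer-group worker using confluent-kafka. Every instance of
# this service shares the same group.id, so Kafka spreads topic partitions
# across instances and rebalances automatically as instances come and go.
from confluent_kafka import Consumer

def process_event(payload: bytes) -> None:
    """Hypothetical stand-in for the real event-handling logic."""
    print(payload)

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker address
    "group.id": "event-processors",          # shared id = one consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,             # commit only after processing
})
consumer.subscribe(["facility-events"])      # hypothetical topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)     # fetch the next event, if any
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        process_event(msg.value())
        consumer.commit(asynchronous=False)  # at-least-once semantics
finally:
    consumer.close()
```

Because offsets are committed only after an event is processed, a crashed instance’s partitions get replayed by whichever instance picks them up, which keeps the pipeline from silently dropping events during peak load.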
“These changes led the platform to process millions of real-time events per second,” shares Esteban. This way, both internal users and external customers can rely on fast and accurate insights.
3. Maintaining databases
Since this tool is enterprise-level software, it manages large amounts of data. However, the combination of tools made the app complex to access, scale, and modify.
We needed to guarantee that the data users saw on the front end was always up to date and correctly correlated. Doing so, on top of the migration and debugging, was challenging due to the original technology choices.
Solution
To simplify database maintenance for the development team, together with Secretly Nice, we:
Audited the tool’s infrastructure
Integrated multiple Google Cloud services to optimize storage, performance, and scalability
Transitioned from a Neo4j-only database to a hybrid, polyglot database architecture that combined MongoDB and Neo4j:
Neo4j for graph-based relationships (e.g., facility and connection structures)
MongoDB for high-throughput operational data, ensuring better flexibility and performance
MongoDB is much easier to maintain, and the cutover was gradual. “We used feature flags, enabling us to gradually shift queries to the new data model without affecting users,” says Esteban. “We also automated data validation scripts to ensure the migrated data was consistent and accurate before finalizing the transition.”
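A minimal sketch of what that flag-gated routing can look like, assuming a simple in-process flag store and illustrative collection, label, and property names (not the production schema):

```python
# Flag-gated read path during a Neo4j-to-MongoDB migration: flip the flag for
# a slice of traffic, compare results, then widen the rollout. All names here
# are illustrative.
from pymongo import MongoClient
from neo4j import GraphDatabase

mongo_db = MongoClient("mongodb://localhost:27017")["platform"]
neo4j_driver = GraphDatabase.driver("bolt://localhost:7687",
                                    auth=("neo4j", "password"))

# Stand-in for a real feature-flag service
FLAGS = {"facilities_read_from_mongo": True}

def get_facilities(tenant_id: str) -> list[dict]:
    """Serve reads from MongoDB when the flag is on; else fall back to Neo4j."""
    if FLAGS["facilities_read_from_mongo"]:
        # New path: high-throughput operational reads served by MongoDB
        return list(mongo_db.facilities.find({"tenant_id": tenant_id},
                                             {"_id": 0}))
    # Old path: the same data fetched from the legacy Neo4j graph
    with neo4j_driver.session() as session:
        records = session.run(
            "MATCH (f:Facility {tenant_id: $tid}) "
            "RETURN f.name AS name, f.region AS region",
            tid=tenant_id,
        )
        return [dict(r) for r in records]
```

Keeping both paths behind one function means rolling back is a flag flip, not a redeploy.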
4. Adopting a multi-tenancy architecture
Although the platform was already deployed on Kubernetes, scaling it for enterprise SaaS customers presented different challenges:
Ensuring high availability across multiple regions while keeping operational costs under control
Implementing secure data isolation between customers
Solution
We implemented multi-tenancy support to guarantee secure data sharing and isolation between customers. We also optimized query performance with tenant-aware configurations.
To address scalability challenges, we:
Restructured the Kubernetes deployment into a multi-tenancy architecture to scale the platform regionally while maintaining failover mechanisms for high availability
Leveraged Helm charts for repeatable and consistent Kubernetes application deployments on Google Kubernetes Engine (GKE)
Used Velero for backup and disaster recovery, enhancing reliability
Integrated Cloud Functions for event-driven processing, optimizing resource usage by automatically scaling compute power based on demand
“We also implemented role-based access control (RBAC) and tenant-aware query optimization, ensuring that each customer’s data remained secure and isolated,” says Esteban.
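In practice, tenant isolation of this kind usually means every query is forced through a tenant filter plus a role check before it reaches the database. A minimal sketch, with hypothetical names rather than the platform’s real schema:

```python
# Every read goes through one choke point that checks the caller's role and
# injects a mandatory tenant_id filter, so one customer's query can never
# return another customer's documents. Names are illustrative.
from dataclasses import dataclass
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["platform"]

@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    roles: frozenset[str]

def find_scoped(ctx: TenantContext, collection: str, query: dict) -> list[dict]:
    if "reader" not in ctx.roles:                   # simple RBAC gate
        raise PermissionError("caller lacks the 'reader' role")
    scoped = {**query, "tenant_id": ctx.tenant_id}  # isolation is not optional
    return list(db[collection].find(scoped, {"_id": 0}))

# Even a malicious filter like {"tenant_id": "someone-else"} is overwritten
# by the caller's own tenant id before the query runs.
ctx = TenantContext(tenant_id="acme", roles=frozenset({"reader"}))
docs = find_scoped(ctx, "facilities", {"region": "us-east"})
```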
5. Minimizing downtime by increasing the platform’s observability and recovery response
The platform was packed with legacy code and built on Neo4j, which can be extremely powerful but wasn’t the right fit in this case. “A big part of the tool was built on that platform at the beginning,” says Matt. “As the product matured, it became clear that it was holding the product back, and we needed to move off it.”
We put our heads together and decided that the best option was to move some tasks from Neo4j to MongoDB. The challenge here was making a successful migration with minimal downtime.
Solution
Handling the migration with minimal downtime was a key challenge. To manage it, we needed visibility into the system and fast alerts in case anything went sideways. So, together with Matt’s team, we:
Integrated Prometheus, Grafana, and OpenTelemetry to improve observability and system monitoring, providing real-time insight into system health and performance metrics (see the sketch after this list)
Used Google Cloud Platform (GCP) alerts to proactively detect anomalies, allowing the team to respond quickly to potential issues and protect the users’ experience
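To give a flavor of the instrumentation involved (metric names and the port are illustrative, not the platform’s actual setup), a service can expose request counts and latencies for Prometheus to scrape and Grafana to chart:

```python
# Minimal Prometheus instrumentation with prometheus_client: a counter for
# request outcomes and a histogram for latency, exposed on /metrics.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled",
                   ["endpoint", "status"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds",
                    ["endpoint"])

def handle_request(endpoint: str) -> None:
    start = time.perf_counter()
    status = "ok"
    try:
        time.sleep(random.uniform(0.01, 0.1))   # stand-in for real work
    except Exception:
        status = "error"
        raise
    finally:
        REQUESTS.labels(endpoint=endpoint, status=status).inc()
        LATENCY.labels(endpoint=endpoint).observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(9100)   # Prometheus scrapes http://host:9100/metrics
    while True:
        handle_request("/facilities")
```

Alerting rules can then fire on error-rate or latency-percentile thresholds, which is what turns raw metrics into the fast anomaly detection the migration depended on.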
The results: 75% faster query response times after migration and 40% cost reduction
Equinix, Secretly Nice, and NaNLABS worked in sync to make this sales enablement tool part of Equinix’s core business and public offering.
This collaboration allowed Equinix to benefit from the following enhancements:
A stabilized, bug-free SaaS platform
Average time to bug resolution: 1 day
Increased test coverage from 20% to 80%
Improved sales team productivity
75% faster query response time after the MongoDB migration
3x increase in throughput, allowing the system to handle significantly more concurrent users
40% reduction in infrastructure costs, thanks to GCP optimizations
30% additional cost reduction by using GCP Reserved Instances
Minimal downtime, with 97.3% total uptime
But the most significant outcome was the creation of a new revenue stream for Equinix by enabling the platform to be monetized as a SaaS solution.
Need to reduce your data infrastructure costs?
Matt chose to partner with NaNLABS because of our expertise in optimizing cloud data solutions and real-time data processing.
Transforming Equinix’s internal tool into a scalable SaaS platform was a challenging task. From stabilizing the codebase to adopting a multi-tenancy architecture, optimizing data processing, and reducing infrastructure costs, we made the platform ready for enterprise-scale adoption.
The result? A high-performing, bug-free solution with 75% faster query response times and 40% savings on infrastructure costs.
If reducing your data infrastructure costs while improving performance and scalability is a priority, NaNLABS can help!
At NaNLABS, we specialize in cloud-native solutions, designing scalable infrastructures, and optimizing real-time data processing pipelines.
Let’s reduce your infrastructure costs while optimizing performance.
Editorial contributions: Camila Mirabal