Welcome!

Weblogic Authors: Yeshim Deniz, Elizabeth White, Michael Meiner, Michael Bushong, Avi Rosenthal

Blog Feed Post

Why Contextual Data Locality Matters

Big Data is quickly overtaking SDN as a key phrase in today’s networking lingo. And overused already as it may be, it actually has a lot more meaning and definition compared to SDN. Big Data solutions are designed to work on lots of data as the name suggests. Of course they have been around forever, talk to any large bank, credit card company, airline or logistics company and all of them have had applications running on extremely large databases and data sets forever. But this is the new Big Data, the one inspired by Hadoop, MapReduce and friends. High performance compute clusters specifically created to analyze large amounts of data and reduce it to a form and quantity that human brains can use in decision making.

What makes today’s Big Data solutions different than its more traditional large database based applications, beyond the sheer datasets being analyzed, is the distributed nature of the analysis. Big Data solutions are designed to run across 100s or even 1000s of servers, each with multiple CPU cores to chew on the data. Traditional large database applications tend to be more localized with fewer applications and servers accessing the data, allowing for more tightly custom integrated solutions, the likes of which Oracle and friends are experts at.

Big Data Flashback

In the late 80s I started my career working as a network engineer for a high energy physics research institute. Working closely with the folks at CERN in Geneva, these physicists were (at the time, and probably still) masters of creating very large datasets. Every time an experiment was run, Tbytes of data (probably Pbytes by now) were generated by thousands of sensors along the tunnel or ring particles were passed through to collide.

The Big Data solution at the time was primitive, but not all that much different than today. The large datasets were manually broken into manageable pieces, something that would fit on a tape or disk. These datasets were then hand copied onto a compute server or super computer and the analysis application would churn through it to find specific data, correlate events and simply reduce the data to something smaller and meaningful. This would then create a new dataset, which would be combined, chopped up again, and the process repeated itself until they arrived at data that was consumable for humans to create new theories from, or provide a piece of proof of an existing theory.

During that first job, the IT group spend an enormous amount of time moving data around. A lot of it manual: tapes and disks were constantly being copied onto the appropriate compute server. The data had to be local to have any chance of analyzing the data. Between tapes, local disks and the network, the local disks were the only storage with appropriate speed to have a hope of finalizing the data reductions. And even then it would not be unusual to have a rather powerful (for the time) Apollo workstation run for several weeks on a single data set.

Back to the here and now

Forward the clock to now. The above description is really not that different from how Hadoop MapReduce works. Start with a big data set, chop it into pieces, replicate the data, compute on the data close to physical locality of the data. Then send results to Reducers, combine the results, then perhaps repeat again to get to human interpretable results.

As fast as we believe the network is within 10GbE access ports, it is still commonly the most restrictive component in the compute, distributed storage and network trio. Compute power increments have far outpaced network speed increments and even memory speed increments. We have many more cycles available to compute, but have not been able to get the data into these CPUs with the same increments. As a result, storage solutions are becoming increasingly distributed, closer to the compute power that needs it.

It’s a natural thought to have the data close to where it needs to be processed, close enough that the effort of retrieving it does not impact the overall completion of the task that uses that data. If I am writing a research paper that takes several hours to complete, I do not mind having to wait a second here or there for the right web sites to load. I would mind if I had to get into my car and drive to the library to look something up, drive back home to work on my paper, and keep doing that. The relationship between time and effort to get data has to become negligible compared to the time and effort required to complete the task.

Locality and growth

This type of contextual locality is extremely hard to manage in a dynamic and growing environment. How do you make sure that the right data remains contextually close to where it is needed when servers and VMs may not be physically close? They may not be in the same rack for the same application or customer, they may not even be in the same pod or datacenter. Storage is relatively cheap, but replication for closeness can very quickly lead to a data distribution complexity that is unmanageable in environments where its not a single orchestrated big data solution.

To solve this problem you need help from your network. You need to be able to create locality on the fly. Things that are not physically close need to be made virtually close, but with the characteristics of physical locality. And in network terms these are of course measured in the usual staples of latency and bandwidth. This is when you want to articulate relationships between the data and the applications that need that data and create virtual closeness that resembles the physical. This may mean dedicated paths through multiple switches to avoid congestion that will dramatically impact latency. These same paths can provide direct physical connectivity through dynamically engineered optical paths between application and storage, or simply appropriate prioritization of traffic along these paths. Without having to worry explicitly where the application is or where the storage is.

Physics will always stand in the way of what we really want or need, but that does not mean we use that same physics with a bit of math to create solutions that manage the complexity of creating dynamic locality. Locality is important. More pronounced in Big Data solutions, but even at a smaller scale it is important within the context of the compute effort on that data.

[Today's fun fact: Lake Superior is the world's largest lake. With that kind of naming accuracy we would like to hire the person that named the lake as our VP of Naming and Terminology]

The post Why Contextual Data Locality Matters appeared first on Plexxi.

Read the original blog entry...

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsis and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lend new meaning to the marketing phrase "This isn't rocket science."

@ThingsExpo Stories
"We provide IoT solutions. We provide the most compatible solutions for many applications. Our solutions are industry agnostic and also protocol agnostic," explained Richard Han, Head of Sales and Marketing and Engineering at Systena America, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We've been engaging with a lot of customers including Panasonic, we've been involved with Cisco and now we're working with the U.S. government - the Department of Homeland Security," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Calligo, an innovative cloud service provider offering mid-sized companies the highest levels of data privacy and security, has been named "Bronze Sponsor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalised support service from its globally located cloud plat...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
"The Striim platform is a full end-to-end streaming integration and analytics platform that is middleware that covers a lot of different use cases," explained Steve Wilkes, Founder and CTO at Striim, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
SYS-CON Events announced today that DXWorldExpo has been named “Global Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Digital Transformation is the key issue driving the global enterprise IT business. Digital Transformation is most prominent among Global 2000 enterprises and government institutions.
"We are focused on SAP running in the clouds, to make this super easy because we believe in the tremendous value of those powerful worlds - SAP and the cloud," explained Frank Stienhans, CTO of Ocean9, Inc., in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
DX World EXPO, LLC., a Lighthouse Point, Florida-based startup trade show producer and the creator of "DXWorldEXPO® - Digital Transformation Conference & Expo" has announced its executive management team. The team is headed by Levent Selamoglu, who has been named CEO. "Now is the time for a truly global DX event, to bring together the leading minds from the technology world in a conversation about Digital Transformation," he said in making the announcement.
"MobiDev is a Ukraine-based software development company. We do mobile development, and we're specialists in that. But we do full stack software development for entrepreneurs, for emerging companies, and for enterprise ventures," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
While the focus and objectives of IoT initiatives are many and diverse, they all share a few common attributes, and one of those is the network. Commonly, that network includes the Internet, over which there isn't any real control for performance and availability. Or is there? The current state of the art for Big Data analytics, as applied to network telemetry, offers new opportunities for improving and assuring operational integrity. In his session at @ThingsExpo, Jim Frey, Vice President of S...
SYS-CON Events announced today that DXWorldExpo has been named “Global Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Digital Transformation is the key issue driving the global enterprise IT business. Digital Transformation is most prominent among Global 2000 enterprises and government institutions.
In his opening keynote at 20th Cloud Expo, Michael Maximilien, Research Scientist, Architect, and Engineer at IBM, discussed the full potential of the cloud and social data requires artificial intelligence. By mixing Cloud Foundry and the rich set of Watson services, IBM's Bluemix is the best cloud operating system for enterprises today, providing rapid development and deployment of applications that can take advantage of the rich catalog of Watson services to help drive insights from the vast t...
SYS-CON Events announced today that EnterpriseTech has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. EnterpriseTech is a professional resource for news and intelligence covering the migration of high-end technologies into the enterprise and business-IT industry, with a special focus on high-tech solutions in new product development, workload management, increased effic...
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
SYS-CON Events announced today that Cloud Academy named "Bronze Sponsor" of 21st International Cloud Expo which will take place October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara, CA. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud com...
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Cloudistics delivers a complete public cloud experience with composable on-premises infrastructures to medium and large enterprises. Its software-defined technology natively converges network, storage, compute, virtualization, and ...
SYS-CON Events announced today that CHEETAH Training & Innovation will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CHEETAH Training & Innovation is a cloud consulting and IT training firm specializing in improving clients cloud strategies and infrastructures for medium to large companies.
SYS-CON Events announced today that Datanami has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datanami is a communication channel dedicated to providing insight, analysis and up-to-the-minute information about emerging trends and solutions in Big Data. The publication sheds light on all cutting-edge technologies including networking, storage and applications, and thei...
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...