How ECommerce Sites Harvest Big Data Across Multiple Clouds By @Dana_Gardner | @BigDataExpo #BigData

Consultants Help Harness Power of Big Data

The next BriefingsDirect big data innovation thought leadership interview highlights how a consultant helps large ecommerce organizations better manage their big data architectures across cloud environments.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn more about how big data is best architected for the largest web applications, BriefingsDirect sat down with Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: How are large web applications deciding on the right big data architecture?

Mohsin: There's a lot of interest in trying to deal with large data volumes, and not only large volumes, but also data that changes rapidly. Many companies have very large datasets, some in terabytes, some in petabytes, and on top of that they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm interested particularly in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring data from more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it a question of IT architecture, enterprise architecture, or data architecture? Who's responsible for this, given that it's now a rather holistic problem?

Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well-defined data-architecture team. Some of them do. You'll find a lot of companies with an enterprise-architect role, and some with only a haphazardly defined architecture group.

Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, only one of whom was an architect. That virtual team was able to assemble and collate all the repositories, which spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that.

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.
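
To make that concrete, here is a minimal sketch, in Python against the vertica-python client, of the kind of 90-day preference lookup described above. The schema (a purchases table with customer_id, category, and purchased_at columns) and the connection details are illustrative assumptions; the interview does not describe the actual system.

```python
# Hypothetical sketch of a 90-day buying-preference lookup. The table and
# column names and connection details are assumptions, not the project's code.
import vertica_python

CONN_INFO = {"host": "vertica.example.com", "port": 5433,
             "user": "campaigns", "password": "...", "database": "warehouse"}

QUERY = """
    SELECT customer_id, category, COUNT(*) AS purchases_90d
    FROM purchases
    WHERE purchased_at >= CURRENT_DATE - 90
    GROUP BY customer_id, category
"""

def top_category_per_customer():
    """Return each customer's most-purchased category over the past 90 days,
    which a campaign engine could use to pick a tailored email template."""
    best = {}
    with vertica_python.connect(**CONN_INFO) as conn:
        cur = conn.cursor()
        cur.execute(QUERY)
        for customer_id, category, count in cur.fetchall():
            if count > best.get(customer_id, ("", 0))[1]:
                best[customer_id] = (category, count)
    return {cid: cat for cid, (cat, _) in best.items()}
```

At 150 to 160 million emails a day, the real pipeline would push far more of this logic into the database itself, but the shape of the lookup is the point.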

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's where the big-data world was really beneficial. We were able to address that problem in a roughly seven-month project.

Gardner: And this was using HP Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there are a whole bunch of big-data vendors. The issue is that many of them don't have a large organization behind them, and Vertica does. Company management felt that although this was a new big database, HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC), and we were able to demonstrate that Vertica could handle the reporting tied to the email campaign management. We ran a 90-day POC, and the results were so positive that there was interest in going live. We went live in about another 90 days.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How does big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.
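
As a rough, hypothetical illustration of that load-while-reporting pattern, the sketch below trickle-loads a feed in one thread while a report query runs in another, with no stop-the-world batch window. All table names, file paths, and connection details are invented for the sketch; only the half-hour cadence comes from the interview.

```python
# Toy illustration of concurrent trickle-loading and reporting; table names,
# paths, and connection details are invented, not the project's actual code.
import threading
import vertica_python

CONN_INFO = {"host": "vertica.example.com", "port": 5433,
             "user": "campaigns", "password": "...", "database": "warehouse"}

def feed_loader(stop: threading.Event) -> None:
    """Micro-batch the live transaction feed into the warehouse roughly
    every half hour, without ever pausing the reporting side."""
    with vertica_python.connect(**CONN_INFO) as conn:
        cur = conn.cursor()
        while not stop.is_set():
            with open("/feeds/latest.csv", "rb") as f:
                cur.copy("COPY transactions FROM STDIN DELIMITER ','", f)
            conn.commit()
            stop.wait(1800)  # the interview cites feeds about every half hour

def campaign_report(client_id: int):
    """Run a campaign report while loads are in flight; a columnar engine
    scans only the few columns the report actually touches."""
    with vertica_python.connect(**CONN_INFO) as conn:
        cur = conn.cursor()
        cur.execute("SELECT campaign_id, COUNT(*) FROM emails_sent "
                    "WHERE client_id = %s GROUP BY campaign_id", (client_id,))
        return cur.fetchall()

stop = threading.Event()
threading.Thread(target=feed_loader, args=(stop,), daemon=True).start()
print(campaign_report(42))  # reporting proceeds while the loader runs
stop.set()
```

Each side holds its own connection, which mirrors the basic design point: ingestion and reporting are separate workloads against the same warehouse.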

The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our back-end email campaign reports when new data was coming in.

One of the benefits Vertica has, due to its basic architecture and columnar design, is that it's better positioned to do that. This is what we were able to demonstrate in the live POC, and nobody was going to take our word for it.

The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then we extended it, and in 90 days we demonstrated the whole thing end to end. Following that was the go-live.

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather than having to stop, batch-process, analyze, and repeat, you've gained a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email and eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on, and perhaps turn these transactions toward a customer-inference benefit.

More than a database

Mohsin: One of the things I like to say internally is that Vertica isn't just a big database; it's more than a database. It's really a platform, because it comes with distributed capabilities and accompanying tools. When we adopted it and went live with this technology, we first solved the feeds-and-speeds problem, but now we're very much positioned to use some of the other capabilities that exist in Vertica.

Distributed R is one of them, and Inference Analysis is another, so that we can build intelligent reports. To date, we've been building those outside the RDBMS; the RDBMS has had no role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that will be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more of the constituents within the organization so that they could innovate and do experiments. That's very powerful stuff indeed.

Is there anything else you can tell other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you have been through this on this particular project? Are there any lessons learned for others?

If they're facing transactional issues and haven't thought about a big-data platform as part of the solution, what do you offer them, in terms of maybe lighting a light bulb in their minds, about looking for alternatives to traditional middleware?

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data at least today is that you can’t find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.
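
As a sketch of what such rudimentary tooling might look like, the snippet below streams a legacy table out through a generic ODBC connection into a delimited file, then bulk-loads the file with Vertica's COPY. The DSN, table name, and paths are illustrative assumptions, not the tooling the project actually wrote.

```python
# Hypothetical migration helper: legacy RDBMS -> flat file -> Vertica COPY.
# The DSN, table, and paths are invented for the sketch.
import csv
import pyodbc            # assumed generic ODBC access to the legacy database
import vertica_python

def export_legacy_table(dsn, table, path):
    """Dump a legacy table to CSV; getting the data *out* is the hard part."""
    conn = pyodbc.connect(dsn)
    cur = conn.cursor()
    cur.execute("SELECT * FROM " + table)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in cur:
            writer.writerow(row)
    conn.close()

def load_into_vertica(conn_info, table, path):
    """Once the data is in a flat file, Vertica's COPY ingests it quickly."""
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        with open(path, "rb") as f:
            cur.copy("COPY " + table + " FROM STDIN DELIMITER ','", f)
        conn.commit()

export_legacy_table("DSN=legacy_warehouse", "orders", "/tmp/orders.csv")
load_into_vertica({"host": "vertica.example.com", "port": 5433,
                   "user": "etl", "password": "...", "database": "warehouse"},
                  "orders", "/tmp/orders.csv")
```

In practice, each legacy source needs its own type mapping and escaping rules, which is presumably why off-the-shelf options were hard to find.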

We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.

Gardner: I've heard quite a few people say that, given the velocity with which they're moving to the cloud, getting at the data obviously isn't part of their problem: it's already in the cloud, in the standardized architecture the cloud is built around. If there is a platform-as-a-service (PaaS) capability, then getting at the data isn't so much of a problem. Or am I not reading that correctly?

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premises deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there is no way we could have had the same elasticity in an on-premises deployment that we can have in the cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it's on-premises.

Then again, simply moving all the assets into the cloud will, at least in some organizations, take several months, if not years.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP Enterprise.
