Welcome!

Weblogic Authors: Yeshim Deniz, Elizabeth White, Michael Meiner, Michael Bushong, Avi Rosenthal

Related Topics: @BigDataExpo, Agile Computing, @CloudExpo

@BigDataExpo: Blog Post

How ECommerce Sites Harvest Big Data Across Multiple Clouds By @Dana_Gardner | @BigDataExpo #BigData

Consultants Help Harness Power of Big Data

The next BriefingsDirect big data innovation thought leadership interview highlights how a consultant helps large ecommerce organizations better manage their big data architectures across cloud environments.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

 

To learn more about how big data is best architected for the largest web applications, BriefingsDirect sat down with Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: How are large web applications deciding on the right big data architecture?

Mohsin: There's a lot of interest in trying to deal with large data volumes, not only large data volumes, but also data that changes rapidly. Now, there are many companies that have very large datasets, some in terabytes, some in petabytes and then they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm interested particularly in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring data for more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it something that’s an IT architecture, or enterprise architecture, or data architecture? Who's responsible for this, given that it’s now a rather holistic problem?

Become a member of myVertica today
Register now
Gain access to the free HP Vertica Community Edition

Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well defined data-architecture team. Some of them do. You'll find a lot of companies with an enterprise-architect role and you'll have some companies with a haphazard definition of an architectural group.

Mohsin

Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, and only one was an architect. But we built a virtual team of four people who were able to assemble and collate all the repositories that spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that?

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's really where the big-data world was really beneficial. We were able to address that problem in about a seven-month project that we ran.

Gardner: And this was using HP Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there is a whole bunch of big-data vendors. The issue is that many of the vendors don't have any large organizations behind them, and Vertica does. The company management felt that this was a new big database, but HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC) and we were able to demonstrate that Vertica would be able to handle the reporting that is tied to the email campaign management. We ran a 90 day POC, and the results were so positive that there was an interest in going live. We went live in about another 90 days, following a 90-day POC.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How does big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.

Become a member of myVertica today
Register now
Gain access to the free HP Vertica Community Edition

The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our back-end email campaign reports when new data was coming in.

 

One of the benefits Vertica has, due to its basic architecture and its columnar design is that it's better positioned to do that. This is what we were able to demonstrate in the live POC, and nobody was going to take our word for it.

The end user said, "Take few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then, we extended it, and then in 90 days, we demonstrated the whole thing end to end. Following that was the go-live.

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather going to a stop, batch process, analyze, repeat, you've gained a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email, eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on and perhaps we find these transactions more toward a customer inference benefit.

More than a database

 

Mohsin: One of the things internally that I like to say is that Vertica isn't just a big database, it’s more than just a database. It's really a platform, because you have distributed all, you are publishing other tools. When we adopted it and went live with this technology, we first solved the feeds and speeds problem, but now we're very much positioned to use some of the capabilities that exist in Vertica.

We had Distributed R being one of them, Inference Analysis being another one, so that we can build intelligent reports. To date, we've been building those outside the RDBMS. RDBMS has no role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that would be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more of the constituents within this organization so that they could innovate and do experiments. That’s a very powerful stuff indeed.

Is there anything else you can tell us for other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you have been through this on this particular project. Are there any lessons learned for others.

One of the issues in big data at least today is that you can’t find a whole slew of clients who have already gone live and who are in production.

If you're facing transactional issues and you haven't thought about a big-data platform as part of that solution, what do you offer to them in terms of maybe lighting a light bulb in their mind about looking for alternatives to traditional middleware.

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data at least today is that you can’t find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.

Become a member of myVertica today
Register now
Gain access to the free HP Vertica Community Edition

We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.

Gardner: I've heard quite a few people say that, given the velocity with which they are seeing people move to the cloud, that obviously isn't part of their problem, as the data is already in the cloud. It's in the standardized architecture that that cloud is built around, if there is a platform-as-a-service (PaaS) capability, then getting at the data isn't so much of a problem, or am I not reading that correctly?

There is still a lingering fear of the cloud. People will tell you that the cloud is not secure.

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premise deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there is no way we could have the same elasticity in an on-prem is deployment that we can have in a cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it’s on-premise.

Again, a simple question of moving all the assets into the cloud, at least in some organizations, will take several months, if not years.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP Enterprise.

You may also be interested in:

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifiying readers/listeners in discreet market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderatiing, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.

@ThingsExpo Stories
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software. They hope to capture value from emerging technologies such as IoT, SDN, and AI. Ultimately, irrespective of the vertical, it is about deriving value from independent software applications participating in an ecosystem as one comprehensive solution. In his session at @ThingsExpo, Kausik Sridhar, founder and CTO of Pulzze Systems, will discuss how given the magnitude of today's applicati...
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
Digital transformation is changing the face of business. The IDC predicts that enterprises will commit to a massive new scale of digital transformation, to stake out leadership positions in the "digital transformation economy." Accordingly, attendees at the upcoming Cloud Expo | @ThingsExpo at the Santa Clara Convention Center in Santa Clara, CA, Oct 31-Nov 2, will find fresh new content in a new track called Enterprise Cloud & Digital Transformation.
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp emp...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
As popularity of the smart home is growing and continues to go mainstream, technological factors play a greater role. The IoT protocol houses the interoperability battery consumption, security, and configuration of a smart home device, and it can be difficult for companies to choose the right kind for their product. For both DIY and professionally installed smart homes, developers need to consider each of these elements for their product to be successful in the market and current smart homes.
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant th...
SYS-CON Events announced today that Avere Systems, a leading provider of hybrid cloud enablement solutions, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Avere Systems was created by file systems experts determined to reinvent storage by changing the way enterprises thought about and bought storage resources. With decades of experience behind the company’s founders, Avere got its ...
SYS-CON Events announced today that Golden Gate University will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Since 1901, non-profit Golden Gate University (GGU) has been helping adults achieve their professional goals by providing high quality, practice-based undergraduate and graduate educational programs in law, taxation, business and related professions. Many of its courses are taug...
SYS-CON Events announced today that SIGMA Corporation will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. uLaser flow inspection device from the Japanese top share to Global Standard! Then, make the best use of data to flip to next page. For more information, visit http://www.sigma-k.co.jp/en/.
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, will discuss how by using...
In this strange new world where more and more power is drawn from business technology, companies are effectively straddling two paths on the road to innovation and transformation into digital enterprises. The first path is the heritage trail – with “legacy” technology forming the background. Here, extant technologies are transformed by core IT teams to provide more API-driven approaches. Legacy systems can restrict companies that are transitioning into digital enterprises. To truly become a lead...
SYS-CON Events announced today that CAST Software will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CAST was founded more than 25 years ago to make the invisible visible. Built around the idea that even the best analytics on the market still leave blind spots for technical teams looking to deliver better software and prevent outages, CAST provides the software intelligence that matter ...
SYS-CON Events announced today that Daiya Industry will exhibit at the Japanese Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ruby Development Inc. builds new services in short period of time and provides a continuous support of those services based on Ruby on Rails. For more information, please visit https://github.com/RubyDevInc.
As businesses evolve, they need technology that is simple to help them succeed today and flexible enough to help them build for tomorrow. Chrome is fit for the workplace of the future — providing a secure, consistent user experience across a range of devices that can be used anywhere. In her session at 21st Cloud Expo, Vidya Nagarajan, a Senior Product Manager at Google, will take a look at various options as to how ChromeOS can be leveraged to interact with people on the devices, and formats th...
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
SYS-CON Events announced today that Nihon Micron will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Nihon Micron Co., Ltd. strives for technological innovation to establish high-density, high-precision processing technology for providing printed circuit board and metal mount RFID tags used for communication devices. For more inf...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...