Analytics in Decision-Making Workflow | @CloudExpo #BigData #Microservices

Big Data shouldn’t be restricted to data scientists

Putting Analytics into the Decision-Making Workflow with Apache Spark

Data-driven businesses use analytics to inform and support their decisions. In many companies, marketing, sales, finance, and operations departments tend to be the earliest adopters of data analytics, with the rest of the business lagging behind. The goal for many organizations now is to make analytics a natural part of the daily workflow of most, if not all, employees. Achieving that objective typically requires a shift in corporate culture and ready access to user-friendly data analytics tools.

Big Data Shouldn't Be Restricted to Data Scientists
Big Data experts, when discussing how to integrate data analysis into workflows across an enterprise, often talk blithely about how users can easily leverage their SQL skills to query data. The problem is that not everyone has SQL skills, or even knows what SQL is.

Companies that plan to transform themselves into lean, data-driven businesses should keep in mind that not every employee needs to be a data scientist. Focus the majority of training efforts (including how to run basic SQL queries, if necessary) on the employees whose jobs involve fact-based decision-making.

Making employees wait for IT to manage schemas and set up ETL tasks is counterproductive. In a busy company, by the time data is prepped for analysis, it may have lost some of its actionable relevance. Instead, provide robust self-service data analysis tools, such as Apache Drill, to enable users to extract the most value possible from data stored in Hadoop. This frees employees to work with data in its native formats (schema-less data, nested data, and data with rapidly evolving schemas) with little or no IT involvement.

Self-service data tools also enable exploratory queries. Users can explore the data directly and extend their analysis without waiting for IT to prep additional data sets. Analysis can then extend past known, structured data to semi-structured and unstructured data, such as call center logs, videos, spreadsheets, social media data, clickstream data, web log files, and external data (such as publicly available industry data). This allows a business to gain big-picture, actionable insights on the fly.
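To make the idea of querying data in its native format more concrete, here is a minimal sketch in plain Python. It is not Apache Drill and does not use Drill's API; the sample records, field names, and the helper functions `get_path` and `total_by` are all hypothetical, chosen only to illustrate how a query can discover structure at read time instead of relying on a schema prepared in advance.

```python
import json

# Hypothetical raw records as they might land in storage: no fixed schema,
# nested fields, and a newer record that adds a field ("channel").
RAW_LINES = [
    '{"customer": {"id": 1, "region": "EMEA"}, "amount": 120.0}',
    '{"customer": {"id": 2, "region": "APAC"}, "amount": 80.5, "channel": "web"}',
    '{"customer": {"id": 3}, "amount": 42.0}',
]


def get_path(record, path, default=None):
    """Follow a dotted path such as 'customer.region' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # field is absent in this record
        current = current[key]
    return current


def total_by(lines, group_path, value_path):
    """Sum value_path grouped by group_path, resolving fields at query time."""
    totals = {}
    for line in lines:
        record = json.loads(line)
        group = get_path(record, group_path, default="unknown")
        value = get_path(record, value_path, default=0.0)
        totals[group] = totals.get(group, 0.0) + value
    return totals


# Two different questions asked of the same raw data, with no prep step between.
totals_by_region = total_by(RAW_LINES, "customer.region", "amount")
totals_by_channel = total_by(RAW_LINES, "channel", "amount")
```

Records that lack a field simply fall into an "unknown" group, so a new field appearing in later records does not break earlier queries. A real self-service tool adds far more (SQL syntax, distributed execution, many file formats), but the read-time handling of structure is the part this sketch is meant to show.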

Apache Spark: Bringing New Efficiencies to Big Data Analysis
Agile companies that rely on data analysis performed in near real time and real time also need solutions that can rapidly process large data sets. Apache Spark, an in-memory data processing framework, is increasingly the solution of choice.

Spark is a framework providing parallel, distributed data processing. Spark can be deployed on Apache Hadoop via YARN, on Apache Mesos, or with its own standalone cluster manager. It can serve as a foundation for other data processing frameworks, and supports programming languages including Scala, Java, and Python. Data can be accessed in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.

Data sets can be pinned in memory with Spark, which noticeably boosts the performance of applications that read the same data repeatedly. Spark also provides speed improvements for applications running on disk, and it extends the MapReduce model to support interactive queries and stream processing far more efficiently.
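The benefit of pinning data in memory can be shown with a small plain-Python sketch. This is not Spark code: the `Dataset` class, `make_source`, and `three_passes` are hypothetical stand-ins, and the `cache` method only borrows its name from Spark's similarly named operation. The sketch counts how often an "expensive" source is read when a job makes several passes over the same data.

```python
def make_source(rows):
    """Return a loader function plus a counter of how often it was called."""
    calls = {"count": 0}

    def load():
        calls["count"] += 1  # stands in for an expensive disk or network read
        return list(rows)

    return load, calls


class Dataset:
    """Toy dataset: re-reads its source on every access unless pinned."""

    def __init__(self, load):
        self._load = load
        self._pinned = None
        self._pin_requested = False

    def cache(self):
        """Request that the data be kept in memory after the first read."""
        self._pin_requested = True
        return self

    def rows(self):
        if self._pinned is not None:
            return self._pinned  # served from memory, no source read
        data = self._load()
        if self._pin_requested:
            self._pinned = data
        return data


def three_passes(dataset):
    """An iterative-style job that touches the same data three times."""
    total = sum(dataset.rows())
    largest = max(dataset.rows())
    count = len(dataset.rows())
    return total, largest, count


load_a, calls_a = make_source([3, 1, 4, 1, 5])
result_unpinned = three_passes(Dataset(load_a))

load_b, calls_b = make_source([3, 1, 4, 1, 5])
result_pinned = three_passes(Dataset(load_b).cache())
```

Both runs return the same answer, but the unpinned run reads the source three times while the pinned run reads it once. The saving grows with the number of passes, which is why iterative algorithms and interactive queries tend to gain the most from in-memory data.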

Spark also eliminates the need for separate distributed systems to handle batch applications, interactive queries, iterative algorithms, and streaming. With Spark, all of these processing types are supported by the same engine, reducing management chores and making the processes easier to combine.

Businesses can count on Spark's benefits over the long term. Spark, initially conceived as a project at UC Berkeley in California, moved to the Apache Software Foundation in 2013 and became a top-level project in 2014. Top-level project status, which Hadoop, Spark, and httpd all hold, is a designation that indicates a project has strong community backing from developers and users, and has proved its worth. More than 50 companies currently list themselves on Spark's "Powered By" page.

Putting Data-Driven Intelligence to Work
Big Data encompasses multiple processes (collection, cleansing, integration, management, governance, security, analysis, and decision-making), all of which need to be in place before a company can consider itself data-driven. Oddly, the decision-making process itself tends to get the least attention.

Gaining real ROI from a Big Data project requires more than fast tools and a solid plan to enable users to incorporate analysis-driven decision-making into their workflow. Quick discovery of exciting new insights in data has no benefit if a company doesn't have a process that enables an equally speedy and effective response to that new intelligence. When devising (or revising) your Big Data project, ensure that you build in an implementation process that enables analysis to be transformed into action.

And finally, a word of warning about real-time analysis: It's easy to lose sight of long-range goals when you're immersed in the moment. Ensure that business goals are aligned with data analysis activities, and establish KPIs to monitor the success of data-driven initiatives. Big Data should provide a company with a sustainable competitive edge.

To explore more of what Spark has to offer, jump over to Getting Started with Apache Spark: From Inception to Production, a free interactive ebook by James A. Scott.

More Stories By Jim Scott

Jim has held positions running Operations, Engineering, Architecture and QA teams in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.
