<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" xml:base="https://opensource.com/">
  <channel>
    <title>Apache Spark</title>
    <link>https://opensource.com/tags/apache-spark</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>How to process real-time data with Apache tools</title>
  <link>https://opensource.com/article/20/2/real-time-data-processing</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;How to process real-time data with Apache tools&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/simon-crosby" class="username"&gt;Simon Crosby&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2020-02-28T03:03:00-05:00" title="Friday, February 28, 2020 - 03:03" class="datetime"&gt;Fri, 02/28/2020 - 03:03&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-field-article-subhead field--type-text-long field--label-hidden field__item"&gt;  &lt;p&gt;Open source is leading the way with a rich canvas of projects for processing real-time events.&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;In the "always-on" future with billions of connected devices, storing raw data for analysis later will not be an option because users want accurate responses in real time…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/article/20/2/real-time-data-processing" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/clocks_time.png?itok=A1s1KyEh" width="360" height="202" alt="Clocks" title="Clocks" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Fri, 28 Feb 2020 08:03:00 +0000</pubDate>
    <dc:creator>Simon Crosby</dc:creator>
    <guid isPermaLink="false">61056 at https://opensource.com</guid>
    </item>
<item>
  <title>How to analyze log data with Python and Apache Spark</title>
  <link>https://opensource.com/article/19/5/visualize-log-data-apache-spark</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;How to analyze log data with Python and Apache Spark&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/djsarkar" class="username"&gt;djsarkar&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2019-05-14T03:02:00-04:00" title="Tuesday, May 14, 2019 - 03:02" class="datetime"&gt;Tue, 05/14/2019 - 03:02&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-field-article-subhead field--type-text-long field--label-hidden field__item"&gt;  &lt;p&gt;Case study with NASA logs to show how Spark can be leveraged for analyzing data at scale.&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;In part one of this series, we began by using Python and Apache Spark to process and wrangle our example web logs into a format fit for analysis, a vital technique considering…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/article/19/5/visualize-log-data-apache-spark" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/data_metrics_analytics_desktop_laptop.png?itok=SumSqbuh" width="360" height="202" alt="Person standing in front of a giant computer screen with numbers, data" title="Person standing in front of a giant computer screen with numbers, data" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Tue, 14 May 2019 07:02:00 +0000</pubDate>
    <dc:creator>djsarkar</dc:creator>
    <guid isPermaLink="false">53841 at https://opensource.com</guid>
    </item>
<item>
  <title>How to wrangle log data with Python and Apache Spark</title>
  <link>https://opensource.com/article/19/5/log-data-apache-spark</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;How to wrangle log data with Python and Apache Spark&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/djsarkar" class="username"&gt;djsarkar&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2019-05-14T03:01:00-04:00" title="Tuesday, May 14, 2019 - 03:01" class="datetime"&gt;Tue, 05/14/2019 - 03:01&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-field-article-subhead field--type-text-long field--label-hidden field__item"&gt;  &lt;p&gt;Case study with NASA logs to show how Spark can be leveraged for analyzing data at scale.&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;One of the most popular and effective enterprise use-cases which leverage analytics today is log analytics. Nearly every organization today has multiple systems and…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/article/19/5/log-data-apache-spark" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/metrics_data_dashboard_system_computer_analytics.png?itok=E-_Aodhd" width="360" height="202" alt="metrics and data shown on a computer screen" title="metrics and data shown on a computer screen" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Tue, 14 May 2019 07:01:00 +0000</pubDate>
    <dc:creator>djsarkar</dc:creator>
    <guid isPermaLink="false">53836 at https://opensource.com</guid>
    </item>
<item>
  <title>20 innovative Apache projects</title>
  <link>https://opensource.com/article/19/3/apache-projects</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;20 innovative Apache projects&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/sallykhudairi" class="username"&gt;sallykhudairi&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2019-03-26T03:01:00-04:00" title="Tuesday, March 26, 2019 - 03:01" class="datetime"&gt;Tue, 03/26/2019 - 03:01&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-field-article-subhead field--type-text-long field--label-hidden field__item"&gt;  &lt;p&gt;As the Apache Software Foundation turns 20, let's celebrate by recognizing 20 influential and up-and-coming Apache projects.&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;As the world's largest and one of the most influential open source foundations, the Apache Software Foundation (ASF) is home to more than 350 community-led projects and…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/article/19/3/apache-projects" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/Collaboration%20for%20health%20innovation.png?itok=crcO-Ilx" width="360" height="202" alt="lightbulb drawing outline" title="lightbulb drawing outline" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      
          &lt;a title="View user profile." href="https://opensource.com/users/jimjag" class="username"&gt;jimjag&lt;/a&gt;
    </description>
  <pubDate>Tue, 26 Mar 2019 07:01:00 +0000</pubDate>
    <dc:creator>sallykhudairi</dc:creator>
    <guid isPermaLink="false">53111 at https://opensource.com</guid>
    </item>
<item>
  <title>An introduction to data processing with Cassandra and Spark</title>
  <link>https://opensource.com/life/16/5/basics-cassandra-and-spark-data-processing</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;An introduction to data processing with Cassandra and Spark&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/dtrapezoid" class="username"&gt;dtrapezoid&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2016-05-11T03:01:00-04:00" title="Wednesday, May 11, 2016 - 03:01" class="datetime"&gt;Wed, 05/11/2016 - 03:01&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;This article is co-authored by Jon Haddad. There's been a huge surge of interest around the Apache Cassandra database due to the increasing uptime and performance demands of…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/life/16/5/basics-cassandra-and-spark-data-processing" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/osdc_520x292_opendata_0613mm.png?itok=8ZjQN7ZD" width="360" height="202" alt="Open data brain" title="Open data brain" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Wed, 11 May 2016 07:01:00 +0000</pubDate>
    <dc:creator>dtrapezoid</dc:creator>
    <guid isPermaLink="false">28436 at https://opensource.com</guid>
    </item>
<item>
  <title>A guide to Apache's Spark Streaming</title>
  <link>https://opensource.com/business/15/4/guide-to-apache-spark-streaming</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;A guide to Apache's Spark Streaming&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/arush" class="username"&gt;arush&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2015-04-23T03:00:00-04:00" title="Thursday, April 23, 2015 - 03:00" class="datetime"&gt;Thu, 04/23/2015 - 03:00&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;Apache Spark is an open source cluster computing framework. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s in-memory primitives provide performance…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/business/15/4/guide-to-apache-spark-streaming" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/BUSINESS_lightbulbs.png?itok=iCBWi4_d" width="360" height="202" alt="One lightbulb lit out of several" title="One lightbulb lit out of several" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Thu, 23 Apr 2015 07:00:00 +0000</pubDate>
    <dc:creator>arush</dc:creator>
    <guid isPermaLink="false">19596 at https://opensource.com</guid>
    </item>
<item>
  <title>Spark ignites at ApacheCon</title>
  <link>https://opensource.com/business/15/4/interview-reynold-xin-apache</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;Spark ignites at ApacheCon&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/jen-wike" class="username"&gt;Jen Wike Huger&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2015-04-06T05:00:00-04:00" title="Monday, April 6, 2015 - 05:00" class="datetime"&gt;Mon, 04/06/2015 - 05:00&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;ApacheCon is coming up, and within that massive conference there will be a glimmering gem: a forum dedicated to Spark. Reynold Xin is organizing it, and he shared some…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/business/15/4/interview-reynold-xin-apache" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/osdc_eventscene_life.png?itok=r4m4MoAs" width="360" height="202" alt="On the scene" title="On the scene" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Mon, 06 Apr 2015 09:00:00 +0000</pubDate>
    <dc:creator>Jen Wike Huger</dc:creator>
    <guid isPermaLink="false">19273 at https://opensource.com</guid>
    </item>
<item>
  <title>Using Spark DataFrames for large scale data science</title>
  <link>https://opensource.com/business/15/3/using-spark-dataframes-large-scale-data-science</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;Using Spark DataFrames for large scale data science&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/rxin" class="username"&gt;rxin&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2015-03-26T07:00:00-04:00" title="Thursday, March 26, 2015 - 07:00" class="datetime"&gt;Thu, 03/26/2015 - 07:00&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/business/15/3/using-spark-dataframes-large-scale-data-science" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/BUSINESS_community_1.png?itok=R9LrfdMd" width="360" height="202" alt="Lots of people in a crowd." title="Lots of people in a crowd." class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Thu, 26 Mar 2015 11:00:00 +0000</pubDate>
    <dc:creator>rxin</dc:creator>
    <guid isPermaLink="false">19291 at https://opensource.com</guid>
    </item>
<item>
  <title>World record set for 100 TB sort by open source and public cloud team</title>
  <link>https://opensource.com/business/15/1/apache-spark-new-world-record</link>
  <description>&lt;span class="field field--name-title field--type-string field--label-hidden"&gt;World record set for 100 TB sort by open source and public cloud team&lt;/span&gt;
&lt;span class="field field--name-uid field--type-entity-reference field--label-hidden"&gt;&lt;a title="View user profile." href="https://opensource.com/users/rxin" class="username"&gt;rxin&lt;/a&gt;&lt;/span&gt;
&lt;span class="field field--name-created field--type-created field--label-hidden"&gt;&lt;time datetime="2015-01-15T05:00:00-05:00" title="Thursday, January 15, 2015 - 05:00" class="datetime"&gt;Thu, 01/15/2015 - 05:00&lt;/time&gt;
&lt;/span&gt;

            &lt;div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"&gt;  &lt;p&gt;In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used…&lt;/p&gt;


&lt;/div&gt;
      
            &lt;div class="field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item"&gt;       &lt;a href="https://opensource.com/business/15/1/apache-spark-new-world-record" hreflang="und"&gt;&lt;img loading="lazy" src="https://opensource.com/sites/default/files/styles/article_teaser/public/lead-images/BUSINESS_opennature_3.png?itok=U08Ipz3f" width="360" height="202" alt="Two different paths to different outcomes" title="Two different paths to different outcomes" class="image-style-article-teaser"&gt;

&lt;/a&gt;
   &lt;/div&gt;
      </description>
  <pubDate>Thu, 15 Jan 2015 10:00:00 +0000</pubDate>
    <dc:creator>rxin</dc:creator>
    <guid isPermaLink="false">19019 at https://opensource.com</guid>
    </item>

  </channel>
</rss>
