tag:blogger.com,1999:blog-46542829470031452602024-03-17T08:33:39.455+01:002pk03 over AI, ML, BigData and data processingHere I blog about my tech stuff, what I do and what my thoughts are. This blog is my private one, not related to my ventures. Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.comBlogger90125tag:blogger.com,1999:blog-4654282947003145260.post-39367244047260191422023-07-15T00:00:00.018+02:002023-07-15T19:33:24.946+02:00Life hacks for your startup with OpenAI and Bard prompts<div style="text-align: left;"><span style="font-weight: normal;"><a href="https://chat.openai.com/" target="_blank">OpenAI</a> and <a href="https://bard.google.com/" target="_blank">Bard</a> </span>are the most used GenAI tools today; the first one has a massive Microsoft investment, and the other one is an experiment from Google. But did you know that you can also use them to optimize and hack your startup? Even creating pitch scripts, sales emails, and elevator pitches with one (or both) of them helps you not only save time but also validate your marketing and wording. Curious? Here are a few prompt hacks for startups to create, improve, and validate buyer personas, your startup's mission and vision statements, and USP definitions.</div><div style="text-align: left;"><br /></div><h2 style="text-align: left;"><span style="font-weight: normal;">Introduce yourself and your startup</span></h2><div><div>Introduce yourself to the chatbot: your startup, your website, your idea, your position, and, in a few words, what you are doing:<br /><br /></div><div><b>Prompt</b>: <i>I'm NAME and our startup NAME, with website URL, is doing WHATEVER. With PRODUCT NAME, we aim to change or disrupt INDUSTRY.</i><br /><br />Bard is able to pull information from your website; I'm not sure if ChatGPT can do that. Nevertheless, you have now laid a great foundation for the model to understand you and what you are doing.</div><div><br /><h2 style="text-align: left;"><span style="font-weight: normal;">Mission statement</span></h2></div><div><b>Prompt</b>: <i>Our current mission/vision is: YOUR VISION / MISSION. Help me enhance it by making it clearer and more inspiring.</i><br /><br /></div><div>If you don't have a mission yet, use this <b>prompt</b>: </div><div><i>I'm thinking about a compelling mission and vision statement; help me draft some.</i></div><div><i><br /></i><h2 style="text-align: left;"><span style="font-weight: normal;">Buyer persona</span></h2></div><div><b>Prompt</b>: <i>Help me create a buyer persona for my product PRODUCT NAME that we should approach; this persona needs to have the power to buy products for $500k with one check.</i><br /><br /><b>Prompt:</b> <i>Help me create a buyer-supporting persona for PRODUCT NAME that we should approach; those who are primarily in INDUSTRY AREA and are familiar with WHAT YOU WANT TO DISRUPT and related activities.</i><br /><br /><h2 style="text-align: left;"><span style="font-weight: normal;">USP</span></h2></div><div><b>Prompt</b>: <i>We are developing a product called PRODUCT NAME. It is a PRODUCT IDEA that offers WHATEVER YOU PROVIDE. 
I need you to assist me in crafting a compelling and concise description that highlights its unique selling proposition.</i><br /></div><div><i><br /></i></div><h2 style="text-align: left;"><span style="font-weight: normal;">Elevator Pitch</span></h2><div><b>Prompt: </b><i>Help me draft a convincing elevator pitch for PRODUCT NAME with a professional (or convincing) tone.</i><br /><br /><h2 style="text-align: left;"><span style="font-weight: normal;">Market size, SOM and SAM</span></h2><div><b>Prompt</b>: <i>How big is the market size for PRODUCT DESCRIPTION / PRODUCT NAME, and what could be our serviceable and obtainable market size for the COUNTRY or region?</i></div><div><br /></div><div>This is quite an interesting prompt; the first answer will rarely fit, so you have to refine it by defining more parameters. As an example: <i>I wanted to know how big the market size for our product is; we target the US Educational market</i>.</div><div><br /></div><h2 style="text-align: left;"><span style="font-weight: normal;">To wrap it up</span></h2><div>Now, I think you've got it. Also note that when you start to chat with one of the tools, they mostly know nothing about you, the company, or anything else. That means you have to introduce yourself, explain, and iterate on the answers, like:</div></div><div><i><br /></i></div><div><i>That was not what I wanted to know, let me rephrase. </i></div><div><i><br /></i></div><div>Now rephrase your question: the extra context steers the model toward better answers, and it can often pull more and better information. Every rephrase and added piece of information helps nail down the best response for you. I hope this helps a bit in improving your sales and marketing efforts for your startup.</div></div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0Miami, FL, USA25.7616798 -80.1917902-2.5485540361788459 -115.3480402 54.071913636178849 -45.0355402tag:blogger.com,1999:blog-4654282947003145260.post-34417068038264024482023-06-14T15:47:00.008+02:002023-06-14T15:49:52.825+02:00Can Generative AI profit from using Federated Data Processing?<br />OpenAI and Bard are all over the place, and I asked Bard: <div><br /></div><div><blockquote>Can Generative AI profit from using Federated Data Processing?</blockquote></div><div><br /> Generative AI can profit from federated data processing by accessing and processing data without having to transfer it to a centralized location. This can be beneficial for a number of reasons, including: </div><div><ul style="text-align: left;"><li>Privacy: Federated data processing can help to protect user privacy by keeping data on the devices where it was generated. This can be important for users who are concerned about their privacy, or for applications that require sensitive data, such as medical records.</li><li>Security: Federated data processing can help to improve security by making it more difficult for attackers to access data. This is because the data is never stored in a centralized location, where it could be more vulnerable to attack.</li><li>Scalability: Federated data processing can be scaled more easily than traditional centralized data processing. 
This is because the data does not have to be transferred to a central server, which can be a bottleneck in traditional data processing systems.</li></ul></div><div><br />As a result of these benefits, federated data processing is becoming increasingly popular for a variety of applications, including:</div><div><ul style="text-align: left;"><li>Fraud detection: Federated data processing can be used to detect fraud by analyzing data from multiple sources without having to transfer the data to a centralized location. This can help to improve fraud detection accuracy and reduce the risk of false positives.</li><li>Medical research: Federated data processing can be used to conduct medical research by analyzing data from multiple sources, such as electronic health records and clinical trials. This can help to improve the speed and accuracy of medical research and make it easier to identify new treatments and cures for diseases.</li><li>Personalized recommendations: Federated data processing can be used to generate personalized recommendations for products, services, and content by analyzing data from multiple sources, such as purchase history, browsing behavior, and social media activity. This can help businesses to improve customer satisfaction and increase sales.</li></ul><br />Overall, federated <a href="https://www.databloom.ai/" target="_blank">data processing</a> can offer a number of benefits for generative AI, including privacy, security, scalability, and improved accuracy. As a result, federated data processing is becoming increasingly popular for a variety of applications, including fraud detection, medical research, and personalized recommendations.</div><div><br /></div><a href="https://www.databloom.ai/blossom-sky" target="_blank">Blossom Sky</a> can help generative AI profit by providing a secure, scalable, and efficient way to process data. This can help to improve the accuracy, cost-effectiveness, and privacy of generative AI models.<div><br /></div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-19124629646355721462023-03-12T10:40:00.002+01:002023-03-12T10:40:56.172+01:00How recasta.io is Disrupting the IoT Industry with its Cloud Native Platform recasta.cloud<p>The Internet of Things (IoT) is a term that refers to the network of physical devices, sensors, and machines that can communicate and exchange data over the internet. IoT has many applications in various domains such as smart homes, smart cities, healthcare, agriculture, manufacturing, and more.</p><p>However, building and managing IoT solutions is not an easy task. It requires connecting different types of hardware and software components, ensuring security and privacy of data and devices, collecting and analyzing large volumes of data, and integrating with existing business systems and web services.</p><p>This is where IoT platforms come in handy. An IoT platform is a software solution that helps developers and businesses to create, deploy, and manage IoT applications. 
An IoT platform typically provides features such as:</p><p></p><ul style="text-align: left;"><li>Device management: To register, monitor, control, update, and troubleshoot devices remotely</li><li>Connectivity: To support different communication protocols such as MQTT, CoAP, HTTP etc. (a short MQTT sketch follows below)</li><li>Security: To authenticate devices and users and encrypt data transmissions</li><li>Data management: To store, process, visualize, and analyze data from devices</li><li>Integration: To connect with other web services or business systems via APIs or SDKs</li></ul><p></p><p>There are many IoT platforms available in the market today. Some of them are offered by tech giants such as IBM Watson IoT, Microsoft Azure IoT, Google Cloud IoT, Amazon Web Services IoT etc. Others are open-source solutions such as ThingsBoard, Eclipse Kapua, Kaa etc. And there is one platform that stands out from the rest with its unique features and capabilities. That platform is <a href="https://concole.recasta.cloud">recasta.cloud</a>, built and offered by the Maltese startup <a href="https://recasta.io" target="_blank">recasta</a>.</p><p>Recasta is a cloud native IoT platform that aims to disrupt the IoT industry with its innovative approach. Recasta.io claims to be the first full cloud native stack for IoT applications. Here's a screenshot from the namespace I used to test the platform. All in all, it took me 5 minutes to get it up and running. </p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhR59nCdH0hEebI2aAb1F91fnZUvA-QLPgIWqt371TIWs1Qse4v0YWspsCPq35NPfVRYsHeA6_Ppx8QtZ80oEZ0FavzXLPJDAew5726woYqpK4msdagosozBubBLUWUaaVy2zsMjdbBegw3LWQ1YdEAlGVxknfVnxJmvXfrtnG9WKPShKHPlu43Lvn22w" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="821" data-original-width="1689" height="312" src="https://blogger.googleusercontent.com/img/a/AVvXsEhR59nCdH0hEebI2aAb1F91fnZUvA-QLPgIWqt371TIWs1Qse4v0YWspsCPq35NPfVRYsHeA6_Ppx8QtZ80oEZ0FavzXLPJDAew5726woYqpK4msdagosozBubBLUWUaaVy2zsMjdbBegw3LWQ1YdEAlGVxknfVnxJmvXfrtnG9WKPShKHPlu43Lvn22w=w640-h312" width="640" /></a></div><br /><br /><p></p><h2 style="text-align: left;"><b>What does cloud native mean?</b> </h2><p>Let's take a short dive into the term Cloud Native: <br />According to the Cloud Native Computing Foundation (<a href="https://www.cncf.io" target="_blank">CNCF</a>), cloud native technologies are those that empower organizations to build and run scalable applications in modern dynamic environments such as public cloud or private cloud.</p><p>Cloud native technologies typically include:</p><p></p><ul style="text-align: left;"><li>Containers: Lightweight packages of software that can run isolated on any operating system</li><li>Microservices: Small independent services that communicate via APIs</li><li>Orchestration: Tools that automate the deployment, scaling, and management of containers</li><li>Serverless: A model where code execution is triggered by events rather than servers</li></ul><p></p><p>And here recasta.io comes in again: the cloud they offer leverages these technologies and provides a fast, flexible, and scalable platform for building IoT applications. 
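<br /><br />To make the MQTT connectivity from the feature list above concrete: a device publishing telemetry to any MQTT-speaking platform looks roughly like this sketch (broker host, topic layout, and credentials are generic placeholders here, not recasta specifics):<br /><span style="font-family: courier;">import json<br />import paho.mqtt.client as mqtt  # pip install paho-mqtt<br /><br /># hypothetical broker and topic - check the recasta docs for the real endpoints<br />client = mqtt.Client(client_id="device-001")<br />client.username_pw_set("DEVICE_USER", "DEVICE_TOKEN")<br />client.connect("broker.example.cloud", 1883)<br />client.publish("devices/device-001/telemetry", json.dumps({"temperature": 21.4}))<br />client.disconnect()</span><br />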
This comes with a lot of benefits for IoT-related projects:</p><p></p><ul style="text-align: left;"><li>No vendor lock-in: Recasta.io runs on any public cloud provider such as AWS, Azure, Google Cloud etc.</li><li>No infrastructure management: Recasta.io handles all the aspects of provisioning, monitoring, updating, and scaling resources for your application</li><li>No coding required: recasta provides a graphical user interface (GUI) where you can drag-and-drop components to create your application logic without writing any code</li><li>No limits on scalability: The recasta stack can handle millions of devices and billions of messages per day without compromising performance or reliability</li></ul><p></p><p>Recasta.io also integrates with Dark Sky, a popular weather service powered by Apple's weather service, to provide real-time weather data for your application. You can use this data to trigger actions based on temperature, humidity, wind speed etc. As an example, you can create an application that automatically adjusts your smart thermostat based on weather conditions, using recasta.cloud as a backend. </p><p>To get started with recasta and build your own disruptive IoT application, you can sign up for a free account at <a href="https://www.recasta.io" target="_blank">https://www.recasta.io/ </a>. You can also check out their documentation at their GitHub via <a href="https://github.com/recasta/cloud-core/wiki">recasta/cloud-core Wiki</a>.</p><h3 style="text-align: left;"><b>Summary</b></h3>Recasta offers an innovative and exciting new platform that aims to transform the way we build and manage IoT applications in a stagnant market. Whether you need a simple or a complex solution for your next IoT project, recasta.io has you covered!Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-37472146563463405062023-01-18T14:57:00.002+01:002023-01-18T14:59:12.674+01:00Some fun with Apache Wayang and Spark / TensorflowApache Wayang is an open-source Federated Learning (FL) framework developed by the Apache Software Foundation. It provides a platform for distributed machine learning, with a focus on ease of use and flexibility. It supports multiple FL scenarios and provides a variety of tools and components for building FL systems. It also includes support for various communication protocols and data formats, as well as integration with other Apache projects such as Apache Kafka and Apache Pulsar for data streaming. The project aims to make it easier to develop and deploy machine learning models in decentralized environments.<p style="text-align: left;">It's important to note that these are just examples and may not be the way your project should interact with Apache Wayang; check the documentation of the Apache Wayang project (<a href="https://wayang.apache.org" target="_blank">https://wayang.apache.org</a>) to see how to interact with it. 
I just want to point out how easy it is to use different languages to let Wayang and Spark interact.<br /><br />Also, you need to make sure that you have the correct permissions and credentials to interact with the Wayang API and make changes to the Spark cluster.</p><h3 style="text-align: left;">Wayang - Scala - Spark:</h3><div><span style="font-family: courier;">import org.apache.wayang.{Wayang, WayangClient}<br /><br />class SparkScaler(wayangUrl: String) {<br />  val wayang = new WayangClient(wayangUrl)<br /><br />  def scaleUp(numWorkers: Int): Unit = {<br />    wayang.addWorkers(numWorkers)<br />  }<br /><br />  def scaleDown(numWorkers: Int): Unit = {<br />    wayang.removeWorkers(numWorkers)<br />  }<br />}</span><br /><br />The <span style="font-family: courier;">SparkScaler</span> class takes a single parameter, the URL of the Wayang API endpoint, when it is initialized. The <span style="font-family: courier;">scaleUp()</span> method can be called to add a specified number of workers to the Spark cluster, and the <span style="font-family: courier;">scaleDown()</span> method can be called to remove a specified number of workers.<br /></div><h3 style="text-align: left;">Wayang - Python - Spark</h3><div><span style="font-family: courier;">from apache_wayang import Wayang<br /><br />class SparkScaler:<br />    def __init__(self, wayang_url):<br />        self.wayang = Wayang(wayang_url)<br /><br />    def scale_up(self, num_workers):<br />        self.wayang.add_workers(num_workers)<br /><br />    def scale_down(self, num_workers):<br />        self.wayang.remove_workers(num_workers)</span><br /><br />The <span style="font-family: courier;">SparkScaler</span> class takes a single parameter, the URL of the Wayang API endpoint, when it is initialized. 
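Using it is a one-liner per scaling action; a hypothetical session (the endpoint URL is made up) could look like this:<br /><span style="font-family: courier;">scaler = SparkScaler("http://wayang-host:8080/api")<br />scaler.scale_up(4)   # add four Spark workers before a heavy job<br />scaler.scale_down(4) # release them again afterwards</span><br />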
The <span style="font-family: courier;">scale_up()</span> method can be called to add a specified number of workers to the Spark cluster, and the <span style="font-family: courier;">scale_down()</span> method can be called to remove a specified number of workers.<br /><h3 style="text-align: left;"><span style="font-family: inherit;">Wayang - Java Streams - Spark</span></h3></div><div><div><span style="font-family: courier;">import org.apache.wayang.WayangClient;</span></div><div><span style="font-family: courier;">import java.util.stream.IntStream;</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">public class SparkScaler {</span></div><div><span style="font-family: courier;"> private WayangClient wayang;</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> public SparkScaler(String wayangUrl) {</span></div><div><span style="font-family: courier;"> wayang = new WayangClient(wayangUrl);</span></div><div><span style="font-family: courier;"> }</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> public void scaleUp(int numWorkers) {</span></div><div><span style="font-family: courier;"> IntStream.range(0, numWorkers).forEach(i -> wayang.addWorker());</span></div><div><span style="font-family: courier;"> }</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> public void scaleDown(int numWorkers) {</span></div><div><span style="font-family: courier;"> IntStream.range(0, numWorkers).forEach(i -> wayang.removeWorker());</span></div><div><span style="font-family: courier;"> }</span></div><div><span style="font-family: courier;">}</span></div></div><div><br /></div>The <span style="font-family: courier;">SparkScaler</span> class takes a single parameter, the URL of the Wayang API endpoint, when it is initialized. 
The <span style="font-family: courier;">scaleUp() </span>method can be called to add a specified number of workers to the Spark cluster, and the <span style="font-family: courier;">scaleDown()</span> method can be called to remove a specified number of workers.<h3 style="text-align: left;">Iterate the K-Means clustering algorithm from Apache Wayang to TensorFlow</h3><div><span style="font-family: courier;">import org.apache.wayang.WayangClient;</span></div><div><div><span style="font-family: courier;">import org.tensorflow.Graph;</span></div><div><span style="font-family: courier;">import org.tensorflow.Session;</span></div><div><span style="font-family: courier;">import org.tensorflow.Tensor;</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">public class KMeansIteration {</span></div><div><span style="font-family: courier;"> private WayangClient wayang;</span></div><div><span style="font-family: courier;"> private Graph graph;</span></div><div><span style="font-family: courier;"> private Session session;</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> public KMeansIteration(String wayangUrl, String modelPath) {</span></div><div><span style="font-family: courier;"> wayang = new WayangClient(wayangUrl);</span></div><div><span style="font-family: courier;"> graph = new Graph();</span></div><div><span style="font-family: courier;"> graph.importGraphDef(modelPath);</span></div><div><span style="font-family: courier;"> session = new Session(graph);</span></div><div><span style="font-family: courier;"> }</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> public void iterate(Tensor input) {</span></div><div><span style="font-family: courier;"> Tensor wayangOutput = wayang.runKMeans(input);</span></div><div><span style="font-family: courier;"> Tensor tfOutput = session.runner().feed("input", wayangOutput).fetch("output").run().get(0);</span></div><div><span style="font-family: courier;"> // Perform further processing on tfOutput</span></div><div><span style="font-family: courier;"> }</span></div><div><span style="font-family: courier;">}</span></div></div><div><br /></div>The <span style="font-family: courier;">KMeansIteration </span>class takes two parameters, the URL of the Wayang API endpoint and the path of the TensorFlow model, when it is initialized. The <span style="font-family: courier;">iterate() </span>method can be called with an input Tensor, it will pass it to the Wayang's K-Means clustering algorithm, it will receive the output, and then will pass it to the TensorFlow's model as an input.<div><br /></div><div>That's are only examples to show how easy it can be to get started with FL and also get involved into Wayang as a developer. Also consider to contribute to the project, check the project under <a href="http://wayang.apache.org">wayang.apache.org</a> </div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-27393325122591780672022-12-14T21:07:00.004+01:002022-12-14T21:12:55.657+01:00The A3 Home Decor<p>One of my new angel investments is an e-commerce retailer, so nothing new. But the founder has some real drive; she wants to deliver sustainable and reusable tableware, or more table couture. 
Yes, there are a few out there, but the typical ones mostly use plastic, paper, and other non-sustainable decor or tableware.</p><p>The A3 - <a href="https://the-a3.com" target="_blank"><b>https://the-a3.com</b></a> - has a better idea, as Alexandra, founder and CEO, explains: </p><p></p><blockquote>A3 is an eco lifestyle home decor and tablecloth brand where design and quality are always top of mind. With A3, you make your home truly a home. </blockquote><p></p><p>Okay - that was the pitch. So why now, and how do you get customers? How do you build a chain of trust in the crowded home interior and decor market? Here's some brain food:</p><p>In 2021, the size of the world market for home decor was estimated to be USD 665 billion. By 2030, it is anticipated to reach USD 940 billion, expanding at a CAGR of 3.9% (2022-2030). Home décor products like furniture and textiles are installed in various environments, including spas, offices, clean rooms, restaurants, campsites, bedrooms, the outdoors, libraries, and stores. Products for home decor range from furniture to floor coverings and domestic linens. Furniture designs can be altered using both machine-based and handcrafted techniques, depending on the intended application. Currently, the worldwide home decor market is expanding due to the growing real estate sector. </p><p>Additionally, the increase in compact dwellings has promoted the use of products with extra storage features and improved the visual appeal of residences. A crucial factor in the expansion of the global market is the growing customer preference for eco-friendly home design items, which is a result of growing environmental concerns.</p><p>You see, we talk about a market of almost <b>ONE TRILLION</b> dollars in 2030. This market moves toward more sustainable, eco-friendly products like hemp napkins or placemats made from vegan leather (not PVC or silicate-infused hemp) or felt. Table runners, tableware, candle holders, carafes, glassware - you name it. And all of those products are small buys; none of them carries a high price tag, which means they get bought pretty easily. Now, according to market research and customer journeys, there is no big retailer who could deliver the dream of every coolly designed home. The big ones have long-standing supplier relationships, and adding new products at scale fails when the supplier market can't deliver. And here's the niche: grow with the audience and market, and deliver sustainable bowls for vegan cooking, crafted dinner table settings, and ethically produced eco-friendly home decor. Pairing this with excellent blogging, tips, and guides, I see a strong demand in the next few years - a demand that The A3 can uphold, master, and even define.</p><p></p><blockquote>Our manufacturing partners hold some of the highest standards in social and environmental responsibility with the certifications to back them up. We regularly review our partners and shipping warehouses to make sure that we uphold our values.</blockquote><p></p><blockquote>We present to you carefully selected products from across the world. We chose an exquisite range of handcrafted and unique products to honor the craft and the craftspeople behind it! We are passionate about our responsibility towards our planet and are committed to do our part as best as we can. 
</blockquote><p></p><blockquote> Alexandra Alten, Founder and CEO of The A3</blockquote><p></p><p><br /></p><p><br /></p>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-79707069260111128452022-09-22T14:48:00.006+02:002022-09-22T14:54:29.000+02:00Get Apache Wayang ready to test within 5 minutes <p>Hey followers, I often get asked how to get Apache Wayang (<a href="https://wayang.apache.org" target="_blank"><b>https://wayang.apache.org</b></a>) up and running without having a full big data processing system behind it. We heard you: we built a full-fledged Docker container, called BDE (Blossom Development Environment), which is basically Wayang. Here's the repo: <a href="https://github.com/databloom-ai/BDE">https://github.com/databloom-ai/BDE</a></p><p>I made a short screencast showing how to get it running with Docker on OSX, and we also made two hands-on videos to explain the first steps. Let's start with the basics - Docker. Get the whole platform with:</p><span style="font-family: courier;">docker pull ghcr.io/databloom-ai/bde:main</span><div><span style="font-family: courier;"><br /></span></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZ5OXvkELwmVRqzz8-pxGNuEetApT22ZWcmEg48a8IYEv9fPZ9vU-B2077wyKAdx-hpknE_XyNk2C67wLS8rJGxYZ8SGQdYwccayEKIwMEDvMvHthJEYIAYtdZMFHrWN4l4SshbEf0FrPii1xrMFcdjnbvAT2j0iZudQu3AKfs7dm98xt9JjwyaGcQrA/s1197/bde.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="804" data-original-width="1197" height="444" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZ5OXvkELwmVRqzz8-pxGNuEetApT22ZWcmEg48a8IYEv9fPZ9vU-B2077wyKAdx-hpknE_XyNk2C67wLS8rJGxYZ8SGQdYwccayEKIwMEDvMvHthJEYIAYtdZMFHrWN4l4SshbEf0FrPii1xrMFcdjnbvAT2j0iZudQu3AKfs7dm98xt9JjwyaGcQrA/w660-h444/bde.gif" width="660" /></a></div><br />At the end, the Jupyter notebook address is shown; control-click on it (OS X) and the browser should open and log you in automatically:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjODDYn9NmoVnJAwaP9uWFRgEqR6AxTjGfI8RW1iAHzWcoTwVEJ_owxy8WNWsNXwvvnCDJuPUYe80TgEHk5xBlWadetu5hRCig_u2wE7BGH7QZsEgfuMc0xgvLlmdxsUxQHXEXg0Gx7i4827Ta58yDGH430-NTdrEbL6yhiEBNOInhO6pkEqH9EaM7e9A/s1882/wayang-jupyter.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="710" data-original-width="1882" height="242" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjODDYn9NmoVnJAwaP9uWFRgEqR6AxTjGfI8RW1iAHzWcoTwVEJ_owxy8WNWsNXwvvnCDJuPUYe80TgEHk5xBlWadetu5hRCig_u2wE7BGH7QZsEgfuMc0xgvLlmdxsUxQHXEXg0Gx7i4827Ta58yDGH430-NTdrEbL6yhiEBNOInhO6pkEqH9EaM7e9A/w640-h242/wayang-jupyter.png" width="640" /></a></div><br /><div><br />Voilà - done. You now have a fully working Wayang environment, and we prepared three notebooks to make it easier to dive in. Watch our development tutorial video (part 1) to get a better understanding of what Wayang can and cannot do. 
Click the video below: </div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='653' height='459' src='https://www.blogger.com/video.g?token=AD6v5dx6dqx5ncW3refR1XaPieiI9VaKfWGx87zwLfA6uL84_hGDyv0A9CEIX8Kw1Ja4tWTJGfGQlDysKeKgA0jXlA' class='b-hbp-video b-uploaded' frameborder='0'></iframe></div><br /><div><br /></div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0Miami, FL, USA25.7616798 -80.1917902-2.5485540361788459 -115.3480402 54.071913636178849 -45.0355402tag:blogger.com,1999:blog-4654282947003145260.post-5432351482049732582022-07-27T17:40:00.001+02:002022-09-22T17:09:25.654+02:00Combined Federated Data Services with Blossom and Flower<p>When it comes to Federated Learning frameworks, we typically find two leading open source projects: Apache Wayang [2] (maintained by <a href="http://databloom.ai" target="_blank">databloom</a>) and Flower [3] (maintained by <a href="https://adap.com/en" target="_blank">Adap</a>). At first view, both frameworks seem to do the same. But, as usual, the second view tells another story.</p><b>How does Flower differ from Wayang?<br /></b><br />Flower is a federated learning system written in Python that supports a large number of training and AI frameworks. The beauty of Flower is the strategy concept [4]: the data scientist can define which dedicated framework is used, and how. Flower delivers the model to the desired framework, watches the execution, gets the calculations back, and starts the next cycle. That makes Federated Learning in Python easy, but at the same time it limits the use to platforms supported by Python. <div>Flower has, as far as I could see, no data query optimizer; an optimizer understands the code and splits the model into smaller pieces to use multiple frameworks at the same time (<i>model parallelism</i>). </div><div><br /></div><div>And here we have the ideal touchpoint between Blossom and Flower.<br /><br /><b>Combine Blossom and Flower and build a Federated Data Science NLP Stack<br /></b><br />How do you build a chatbot system that serves multiple functions and customers across the world, like in a bank? A chatbot stack typically uses NLP combined with multiple data sources to provide natural communication between humans and machines. The demand for machine-human interaction and human-based communication has increased considerably, and Gartner's forecasts are a testament to it.</div><div><br /></div><div>"Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data" (<a href="https://en.wikipedia.org/wiki/Natural_language_processing" target="_blank">Wikipedia</a>).</div><div><br /></div><div>The typical infrastructure we have to take into account is like an overgrown forest: we have multiple data sources, typically ranging from data warehouses and RDBMS systems to pretty closed data sources like financial transaction stores, customer bank data, credit scores, etc. The sources are mostly not the most modern, and sometimes they don't even have connection points - like DWH systems, which typically run at 90+% utilization.<br /><br />Here comes Blossom into the game. 
With Blossom we can connect to each of those systems (if desired and needed), and we can use already available data processing frameworks and engines like Spark, Kafka or Flink (and their commercial counterparts) without blowing up the engineering team.<br /><br />Now the fun part with Flower: we plug Flower into Blossom, and voilà - problem solved! The architecture could look like this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9pcOLe04qEfXJ4dSZg6JH2XC8kW-UzluoIPi4nPzChZCbysxtRj_uX2ApRbV6vcPRs8IJIglHQI4YNw4ePUwmbaWHoMithmh4gWG-ilzBb89GdaxCJiS6Ep2txD57cus9KSyLREhl3LWf5BXpVhdsNOYEB_0IRkB_VOyd6VOISInF8nPlAfJ10tltNw/s4037/fl1png.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2259" data-original-width="4037" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9pcOLe04qEfXJ4dSZg6JH2XC8kW-UzluoIPi4nPzChZCbysxtRj_uX2ApRbV6vcPRs8IJIglHQI4YNw4ePUwmbaWHoMithmh4gWG-ilzBb89GdaxCJiS6Ep2txD57cus9KSyLREhl3LWf5BXpVhdsNOYEB_0IRkB_VOyd6VOISInF8nPlAfJ10tltNw/w640-h358/fl1png.png" width="640" /></a></div><div><br /></div><div>To connect Blossom with Flower we just need a few lines of code:</div><div><br /><span style="font-family: courier;">import blossom as bls<br />import flwr as fl<br />import tensorflow as tf<br /><br />context = bls.context(env="federated")<br /><br />transactions = context.read("url to transaction") \<br />    .filter( transactionFilter )<br /><br />input_flower = context.read("url to customer table") \<br />    .filter( customerFilter ) \<br />    .join (transactions ) \<br />    .map ( convertToVector ) \<br />    .toNumpy()<br /><br />context.runFlower(<br />    input_flower,<br />    server=fl.server.start_server("0.0.0.0:8080", config={"num_rounds": 3}),<br />    client=fl.client.start_numpy_client("0.0.0.0:8080", client=FlowerImplementedClient()),<br />    flowerEngine=tf<br />)</span><br /><br /></div>We call this stack <b>Combined NLP Federated Data Services</b>. 
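<br /><br />The <span style="font-family: courier;">FlowerImplementedClient</span> used above is not shown in the snippet. A minimal sketch, following Flower's <span style="font-family: courier;">NumPyClient</span> pattern and assuming a Keras model plus local training data (exact method signatures vary slightly between flwr versions), could look like this:<br /><br /><span style="font-family: courier;">class FlowerImplementedClient(fl.client.NumPyClient):<br />    def __init__(self, model, x_train, y_train):<br />        self.model = model<br />        self.x_train, self.y_train = x_train, y_train<br /><br />    def get_parameters(self):  # current local weights<br />        return self.model.get_weights()<br /><br />    def fit(self, parameters, config):  # one round of local training<br />        self.model.set_weights(parameters)<br />        self.model.fit(self.x_train, self.y_train, epochs=1, batch_size=32)<br />        return self.model.get_weights(), len(self.x_train), {}<br /><br />    def evaluate(self, parameters, config):  # report metrics back to the server<br />        self.model.set_weights(parameters)<br />        loss, accuracy = self.model.evaluate(self.x_train, self.y_train)<br />        return loss, len(self.x_train), {"accuracy": accuracy}</span><br /><br />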
Flower takes care of the chatbot communication, the ML model, and the execution with TF (TensorFlow) or any other supported ML framework, and delivers the outcome to Blossom. Blossom then takes care of enriching the model with information from deeper backend systems and gives the output back to Flower, and Flower takes care of the next iteration with TensorFlow (TF). </div><div>This architecture is the backbone of an extensive NLP system using the best tools available for Federated Learning. This stack is also future-proof: both frameworks have been built with pluggable extension support from the beginning. That means whatever comes in the future, this stack can handle it. Even quantum-computing-based AI training could be adopted as a plugin.</div><div><br /><b>Conclusion:</b><br />Building cutting-edge AI and machine learning / NLP stacks is no longer an area only the biggest data companies in the world can handle. With this approach we guarantee data sustainability and unmatched data privacy, and we enable digital transformation on a completely new level.<span><br /><br />[1] <a href="https://cacm.acm.org/magazines/2020/12/248796-federated-learning-for-privacy-preserving-ai/fulltext">https://cacm.acm.org/magazines/2020/12/248796-federated-learning-for-privacy-preserving-ai/fulltext</a><br />[2] <a href="https://wayang.apache.org/documentation.html">https://wayang.apache.org/documentation.html</a><br />[3] <a href="https://github.com/adap/flower">https://github.com/adap/flower</a></span></div><div><span>[4] <a href="https://flower.dev/docs/implementing-strategies.html">https://flower.dev/docs/implementing-strategies.html</a></span></div></div><div><br /></div><div>*** This post was originally published in our databloom.ai <a href="https://engineering.databloom.ai/2022/01/combined-federated-data-services-with.html" target="_blank"><b>blog</b></a> ***</div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-47130507215358761392021-09-29T15:08:00.054+02:002021-10-19T11:12:22.408+02:00Compile Apache Wayang on Mac M1 We are releasing <a href="https://wayang.apache.org/" target="_blank">Apache Wayang</a> v0.6.0 in the next few days, and during release testing I was wondering whether we could get Wayang running on the M1 (ARM). And yes, a few small changes - voilà!<br /><br />Install maven, scala, sqlite and groovy:<br /><span style="font-family: courier;"> brew install maven scala groovy sqlite</span><div><span style="font-family: courier;"><br /></span>Download openJDK 8 for M1: <br /><a href="https://www.azul.com/downloads/?version=java-8-lts&os=macos&architecture=arm-64-bit&package=jdk" target="_blank">https://www.azul.com/downloads/?version=java-8-lts&os=macos&architecture=arm-64-bit&package=jdk</a> and install the pkg. 
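<br /><br />To double-check that the ARM JDK landed where expected, macOS's <span style="font-family: courier;">java_home</span> helper can list it (the exact path and version string may differ):<br /><span style="font-family: courier;">/usr/libexec/java_home -v 1.8</span><br />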
<div><br /><div>Get Apache Wayang either from <a href="https://dist.apache.org/repos/dist/dev/wayang/" target="_blank">https://dist.apache.org/repos/dist/dev/wayang/</a>, or git-clone it directly:</div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">git clone https://github.com/apache/incubator-wayang.git</span></div><div><br /></div><div>Start the build process:</div><div><br /></div><div><span style="font-family: courier;">cd incubator-wayang</span><br /><span style="font-family: courier;">export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home</span></div><div><span style="font-family: courier;"><br />mvn clean install</span></div><div><span style="font-family: courier;"><br /></span></div>Ready to go:<div><br /><div><span style="font-family: courier;"><span><div>[INFO] Reactor Summary for Apache Wayang 0.6.0-SNAPSHOT:</div><div>...</div></span><div>[INFO] BUILD SUCCESS</div><div>[INFO] ------------------------------------------------------------------------</div><div>[INFO] Total time: 06:24 min</div></span></div></div></div><div><span style="font-family: courier;"><br /></span></div>After the build is done, the binaries are located in Maven's local repository:<br /><span style="font-family: courier;">~/.m2/repository/org/apache/wayang</span><div><span style="font-family: courier;"><br /></span></div><div>Edit: After we pushed <a href="https://github.com/apache/incubator-wayang/commit/b8afb0c8c32f1bd6bd9b9f29416a697826be4f79#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8" target="_blank">https://github.com/apache/incubator-wayang/commit/b8afb0c8c32f1bd6bd9b9f29416a697826be4f79#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8</a>, Apache Wayang works with JDK11 and Windows! Woot!</div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com4tag:blogger.com,1999:blog-4654282947003145260.post-74734855947611116292020-12-27T18:04:00.004+01:002020-12-27T18:04:27.078+01:00Stream IoT data to S3 - the simple way First, a short introduction to <a href="https://github.com/infinimesh"><b>infinimesh</b></a>, an Internet of Things (IoT) platform which runs completely in <a href="https://kubernetes.io/"><b>Kubernetes</b></a>: <div>infinimesh enables the seamless integration of the entire IoT ecosystem independently from any cloud technology or provider. infinimesh easily manages millions of devices in a compliant, secure, scalable and cost-efficient way without vendor lock-ins.<br />We released some <b><a href="https://github.com/infinimesh/plugins#plugins-and-connectors">plugins</a> </b>over the last weeks - a task we had on our roadmap for a while. 
Here is what we have so far:</div><div><br /><ul style="text-align: left;"><li><a href="https://github.com/infinimesh/plugins/blob/main/Elastic"><b>Elastic</b></a><br />Connect infinimesh IoT seamlessly into <a href="https://elastic.co/"><b>Elastic</b></a>.<br /><br /></li><li><a href="https://github.com/infinimesh/plugins/blob/main/timeseries"><b>Timeseries</b></a><br /><a href="https://oss.redislabs.com/redistimeseries/"><b>Redis-timeseries</b></a> with <b><a href="https://grafana.com/">Grafana</a> </b>for time series analysis and rapid prototyping; it can be used in production when configured as a Redis cluster, and is ready to be hosted via <b><a href="https://redislabs.com/redis-enterprise-cloud/overview/">Redis-Cloud</a></b>.<br /><br /></li><li><a href="https://github.com/infinimesh/plugins/blob/main/SAPHana"><b>SAP Hana</b></a><br />All code to connect the infinimesh IoT Platform to any <a href="https://www.sap.com/products/hana.html"><b>SAP Hana</b></a> instance.<br /><br /></li><li><a href="https://github.com/infinimesh/plugins/blob/main/Snowflake"><b>Snowflake</b></a><br />All code to connect the infinimesh IoT Platform to any <b><a href="https://www.snowflake.com/">Snowflake</a> </b>instance.<br /><br /></li><li><a href="https://github.com/infinimesh/plugins/blob/main/CloudConnect"><b>Cloud Connect</b></a><br />All code to connect the infinimesh IoT Platform to the public cloud providers AWS, GCP and Azure. This plugin enables customers to use their own cloud infrastructure and extend infinimesh to other services, like <a href="https://www.scalytics.io/"><b>Scalytics</b></a>, using their own cloud native data pipelines and integration tools.</li></ul></div><div>We have chosen Docker as the main technology because it enables our customers to run their own plugins in their own space, in their controlled environment. And since our plugins don't consume many resources, they fit perfectly into the free tiers of AWS EC2 - I use them in this blog post. <br />The plugin repository was structured with developer friendliness in mind. All code is written in Go, and the configuration is done via the dockerfiles. Since you need to put credentials into them, we strongly advise running the containers in a controlled and secure environment. </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiEAviEZVD_fNecREvRpY2PUm00AqYOwekYgcgB1Yw163KeGVzJvXzJKSyiZkTccCNz-0aSwrJ2gZssbe3WAeeApgBfdD4mr_E9dUxiDXQMB_ewxvjqI6QLP3xyPKZyYGzw6XB15aHxKTv/" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img alt="infinimesh UI" data-original-height="1007" data-original-width="1043" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiEAviEZVD_fNecREvRpY2PUm00AqYOwekYgcgB1Yw163KeGVzJvXzJKSyiZkTccCNz-0aSwrJ2gZssbe3WAeeApgBfdD4mr_E9dUxiDXQMB_ewxvjqI6QLP3xyPKZyYGzw6XB15aHxKTv/w400-h386/image.png" title="infinimesh UI" width="400" /></a></div><h4 style="text-align: left;">Stream IoT data to S3</h4>Here I'd like to show how easy it is to combine IoT with already installed infrastructures in public clouds. 
The most common task, we figured, is streaming data to S3; most of our customers use S3 either directly on AWS, or by implementing their own object storage that speaks the S3 protocol, like <a href="https://min.io/"><b>MinIO</b> </a>- which is also Kubernetes native.<br /><br />Of course, a private installation of infinimesh or accounts on <a href="https://console.infinimesh.cloud/"><b>infinimesh.cloud</b></a> and AWS are needed if you use the cloud versions of both. Here is a screenshot from the SMA device I used to write this post:</div><div><h4 style="text-align: left;">Preparation</h4><ol style="text-align: left;"><li>Spin up an EC2 instance in the free tier with Linux; a t2.micro instance should fit most needs</li><li>Log into the VM and install Docker as described in the AWS documentation: <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html"><b>Docker basics for Amazon ECS - Amazon Elastic Container Service</b></a></li><li>Install docker-compose and git:<br /><br /><span style="font-family: courier;">sudo curl -L \ <br />https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)\<br />-o /usr/local/bin/docker-compose \<br />&& sudo chmod +x /usr/local/bin/docker-compose \<br />&& sudo yum install git -y</span></li></ol></div><div><br />That's all we need as preparation; now log off and log back in to pick up the permissions we set earlier. </div><div><h4 style="text-align: left;">Setup and Run</h4><ol style="text-align: left;"><li>Clone the plugin repo:<br /><span style="font-family: courier;">git clone <a href="https://github.com/infinimesh/plugins.git"><b>https://github.com/infinimesh/plugins.git</b></a></span></li><li>Edit <span style="font-family: courier;">CloudConnect/docker-compose.yml</span> and replace <span style="font-family: courier;">CHANGEME</span> with your credentials</li><li>Compose and start the connector (-d detaches from the console and lets the containers run in the background):<br /><span style="font-family: courier;">docker-compose -f CloudConnect/docker-compose.yml --project-directory . up --build -d</span></li><li>Check the container logs:<br /><span style="font-family: courier;">docker logs plugins_csvwriter_1 -f</span></li></ol>We used Go as the development language, so the resource consumption is low:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh69_GSG6iSPXr8ICV_nDUpYDkTWLkLwXAeOuhwcGDjmPEHxXjh5Y6T9-hYBYK3RvDtRa0LtlvaNAQhZmk2aXm0cKZdMtUUJII8P1Jt7t9_J8e3D0TigfnJN3KN9lb0tod3pSUhyphenhyphenhJtJvt4/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="81" data-original-width="1510" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh69_GSG6iSPXr8ICV_nDUpYDkTWLkLwXAeOuhwcGDjmPEHxXjh5Y6T9-hYBYK3RvDtRa0LtlvaNAQhZmk2aXm0cKZdMtUUJII8P1Jt7t9_J8e3D0TigfnJN3KN9lb0tod3pSUhyphenhyphenhJtJvt4/s16000/image.png" /></a></div><br /></div><div>After one minute the first CSV file should be arriving in S3. 
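A quick way to verify this is to list the bucket with the AWS CLI (assuming the CLI is configured and <span style="font-family: courier;">YOUR-BUCKET</span> is the bucket set in the compose file):<br /><span style="font-family: courier;">aws s3 ls s3://YOUR-BUCKET --recursive | tail -5</span><br />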
That’s all - easy and straightforward.</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwVJB5KIoQoANUSUcZ0K-uMvbgLgwRqRTR7D4zvkHdhUGt9QT-Mvyx53QCbj62MGQlC9avGQykD7zKyaLz11UV4C-7g0ltY6c1dGNHUJgYh6ieLxFgmfGQyf3x1inugqDW246nO4PR4QWd/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="398" data-original-width="991" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwVJB5KIoQoANUSUcZ0K-uMvbgLgwRqRTR7D4zvkHdhUGt9QT-Mvyx53QCbj62MGQlC9avGQykD7zKyaLz11UV4C-7g0ltY6c1dGNHUJgYh6ieLxFgmfGQyf3x1inugqDW246nO4PR4QWd/s16000/image.png" /></a></div><br /><br /><h4 style="text-align: left;"> Some developer internals</h4>We built some magic around it to make our plugins as easy as possible to use for customers and, at the same time, easy to adapt for developers.</div><div> <p style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><b>How it works:</b></p>First we iterate over /objects to find all endpoints marked with [<span style="font-family: courier;">device</span>], call the API for each device, and store the data as a sliding window in a local Redis store to buffer network latency. After a few seconds we send the captured data as CSV to the desired endpoints. In our tests we transported data from up to 2 million IoT devices over this plugin; each of those devices sends ten key:value pairs as JSON every 15 seconds. <br /></div>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com4tag:blogger.com,1999:blog-4654282947003145260.post-25465244187849607382020-04-16T08:44:00.000+02:002020-04-16T09:49:15.856+02:00Embedded Linux won't reboot - how to fix and repair I have a lot of embedded systems running in our lab or at home, all of them either Raspberry Pis or self-made PCBs running Yocto. Sometimes I can't reboot some of them; I think it's <span style="font-family: "courier new" , "courier" , monospace;">journald </span>causing issues with the SD cards. The error message usually is:<br />
<span style="font-family: "courier new" , "courier" , monospace;">Failed to open /dev/initctl</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
Anyhow, if you have this issue - a reboot can be force-forced:<br />
<span style="font-family: "courier new" , "courier" , monospace;">systemctl --force --force reboot</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">Since a forced reboot does not sync the journal, the system typically comes up with a damaged FS. The remote fsck can be initiated by extending the command above with</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">sudo tune2fs -i 1m /dev/DISK && touch /forcefsck && systemctl --force --force reboot</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: inherit;">(assumed you have access to a shell, via SSH or local access). When all goes fine, the system comes up with a clean FS. All this fuss comes from the SSD r/w actions, a well designed IoT embedded devices should have a flash mem part for the running OS.</span>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com6tag:blogger.com,1999:blog-4654282947003145260.post-80914622498757908742020-03-30T16:24:00.001+02:002020-04-16T09:49:48.652+02:00How to fix "Problem with MergeList /var/lib/apt/lists/"<div>
Every now and then - and lately more often - an update on a Raspberry Pi fails with: "Problem with MergeList ..."</div>
<div>
<br /></div>
<span style="font-family: "courier new" , "courier" , monospace;">sudo mv /var/lib/apt/extended_states /var/lib/apt/extended_states_tmp && rm -rf /var/lib/apt/lists/* -vf && sudo apt-get update</span><br />
<div>
<br /></div>
<div>
solves the issue.<br />
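<br />The first <span style="font-family: "courier new" , "courier" , monospace;">mv</span> only parks apt's record of auto-installed packages out of the way. Once the update runs clean again, you can put it back (assuming that file itself was not the corrupted one):<br /><span style="font-family: "courier new" , "courier" , monospace;">sudo mv /var/lib/apt/extended_states_tmp /var/lib/apt/extended_states</span><br />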
<div>
<br /></div>
<div>
<br /></div>
</div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-60162326485286451212020-03-08T09:47:00.003+01:002023-06-14T15:43:29.192+02:00Kubernetes - delete all pods with a certain status <br /><br />"How to delete all pods with a status like Error" - that is a question I often get. The DevOps answer is a one-liner, filtering on status.reason for the status we want to match, for example:<br /><br /><span style="font-family: Courier New, Courier, monospace;">kubectl get po --all-namespaces -o json | jq '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | "kubectl delete po \(.metadata.name) -n \(.metadata.namespace)"' | xargs -n 1 bash -c</span><div>
<br /></div>
<div>
In that case we delete all pods with the status "Evicted", but it works for any status-based filtering. </div>
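<div><br /></div><div>A shorter alternative if the pod phase is enough for you: evicted pods end up in phase "Failed", so a field selector does the job without jq (a sketch, assuming a reasonably recent kubectl):</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">kubectl delete pods --field-selector=status.phase=Failed --all-namespaces</span></div>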
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-51048176805370667202020-01-04T10:23:00.000+01:002020-01-04T10:25:01.471+01:00Why I ditched my MacBook and switched to Windows 10I have to admit, I'm an Apple guy. I had everything, from Pixelbooks to Pixel phones, Blackberry, OnePlus, Google Home, Amazon Fire, Alexa, Surface - but in the end I always ended up in Apple's universe: Apple TV, HomePod, iPhone, MacBook, Watch, earphones.<br />
Well, no MacBook anymore. That was my most annoying experience with Apple - a two-year-old 15-inch MBP in max configuration, worth 3.8k EUR, got the flicker bug. I wrote about it in a blog post before. Apple did not accept that as a warranty case, so I sold it on eBay and went for a Dell XPS. Long story short.<br />
<br />
Now, as a Windows hater and open source follower, I tinkered two weeks or so to get Linux running and ended up with Ubuntu plus Pop!_OS additions. A nice workhorse, but the battery life was no life at all - it was a fast-draining nightmare. Even in standby, lid closed, the battery was empty after one night. Apple does that much better, I have to admit. So I googled, read Reddit and HackerNews, and decided yesterday evening to ditch Linux and install Windows. Why? One thing - <a href="https://github.com/microsoft/terminal" target="_blank">Windows Terminal</a> (Preview), an iTerm2 for Windows.<br />
I have used Windows 10 often; again and again I switched and worked with it for a few months. The main things that put me off Win10 were the terminal emulator, the non-working mail and calendar apps, the useless browser, and the colorful but useless tiles in the Start Menu.<br />
That changed in the last week. Now, with Terminal, the on-board mail and calendar tools (not Outlook - I use Windows Mail and Calendar because they work with G Suite), Dark Theme, Visual Studio Code (my favorite code editor for a year or so, works on Linux as well as OSX), Edge Beta (the Chromium thing) and the seamless integration of my iPhone via Dell Mobile Connect, I have a machine which looks better than a MacBook, has much better integration and, at 1.5k EUR, is much cheaper than an MBP - and all the new stuff is Open Source!<br />
All my coding stuff works like a charm: I installed the GCP tooling and Go in my Linux subsystem, and tuned Terminal with an opaque background, Linux as the default shell, and my Linux home directory as the starting directory. My Terminal config is in my <a href="https://github.com/2pk03/shell/blob/master/terminal2-windows.json" target="_blank">github repo</a>.<br />
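The relevant profile settings look roughly like this - an illustrative excerpt, not my full config; the exact key names may differ between Terminal preview versions, and the distro name and user are placeholders:<br /><span style="font-family: "courier new" , "courier" , monospace;">"defaultProfile": "{guid-of-the-wsl-profile}",<br />"profiles": [{<br />"name": "Ubuntu",<br />"startingDirectory": "//wsl$/Ubuntu/home/USER",<br />"useAcrylic": true,<br />"acrylicOpacity": 0.85<br />}]</span><br />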
<br />
In the end, I think I will stick with that config for a while. The screen is brilliant, and Dell's hardware support is awesome.Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-37874668605758449292019-11-30T13:28:00.000+01:002019-11-30T13:28:40.112+01:00Dell XPS 13 7390 late 2019 and LinuxSince my three-year-old 15-inch MacBook Pro got the <a href="https://bit.ly/2R5Jvo8" target="_blank">flickering</a> bug, I decided to buy a Dell laptop. Why? I have a really old Dell laptop from 2004 (!) which works perfectly as an OpenBSD firewall and internet router in my weekend house.<br />
<br />
After I got my machine, the first thing was to install PopOS. Or better: I <a href="https://bit.ly/2DwDuJk" target="_blank">tried</a>. It never worked; I think it's a bug in the loader section of PopOS, a <a href="https://github.com/pop-os/pop/issues/760" target="_blank">bugreport</a> is open. After some hours of hacking and fixing I got a bit bored and used an Ubuntu installation, which worked perfectly out of the box. Yes, I know I could have bought the XPS from Dell with Ubuntu on it. But I did not want to ;)<br />
<br />
Here are some tricks to make the XPS smooth to work with - right now I have the notebook attached to my curved 4K monitor, coding some infrastructure stuff for <a href="https://github.com/infinimesh/infinimesh" target="_blank">infinimesh</a>.<br />
<br />
1. Bluetooth mouse<br />
The first time a BT mouse is connected, it lags: moving the mouse, the pointer follows after seconds, slow and unresponsive. Fix it via <span style="font-family: "courier new" , "courier" , monospace;">sudo vi /var/lib/bluetooth/<MAC Adapter>/<MAC mouse></span> and add:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">[ConnectionParameters]<br />MinInterval=6<br />MaxInterval=9<br />Latency=44<br />Timeout=216</span><br />
<br />
<span style="font-family: inherit;">Reboot, and the mouse works like a charm.</span><br />
<br />
<span style="font-family: inherit;">2. Power Saving</span><br />
<span style="font-family: inherit;">I'm used to closing the laptop lid and having the system go to sleep; Apple perfected that. By default, closing the lid just switches the display off, which is not a power-saving mode. For proper power management, install <span style="font-family: "courier new" , "courier" , monospace;">tlp</span> via <span style="font-family: "courier new" , "courier" , monospace;">sudo apt-get install tlp</span>. The default rules are pretty sane, but can be tweaked via <span style="font-family: "courier new" , "courier" , monospace;">vi /etc/default/tlp</span>.</span><br />
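<br /><span style="font-family: inherit;">If a closed lid still doesn't suspend the machine, systemd-logind's lid handling is worth a look - a minimal sketch (HandleLidSwitch is a standard logind.conf key; picking suspend as the action is my assumption, and restarting logind may end your session, so a reboot works too):</span><br /><span style="font-family: "courier new" , "courier" , monospace;"># make a closed lid trigger suspend<br />sudo sed -i 's/^#\?HandleLidSwitch=.*/HandleLidSwitch=suspend/' /etc/systemd/logind.conf<br />sudo systemctl restart systemd-logind</span><br />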
<br />
3. UI<br />
Ya, what to say. I don't like Ubuntu's visual artwork. That's why I wanted to install PopOS. But my friends from <a href="https://system76.com/" target="_blank">System76</a>, the makers of PopOS, have a nice <a href="https://pop.system76.com/docs/install-pop-theme/" target="_blank">blog post</a> on how to get the most of PopOS into a vanilla Ubuntu.<br />
<br />
4. Software<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
I tested a lot of mail clients, calendars and other collaboration tools and ended up, I'd say as usual, with Evolution. It works well with Google, Outlook and NextCloud calendars, and with mail systems like mail-in-a-box. To get a unified inbox in Evolution, just create a search folder, name it Unified Inbox and search for read and unread messages across all accounts. Voilà, unified inbox. <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOEDtt4WtgaXvY7JE_QbDiBvV7MGg44bwxBDprOwQuUefgsZPjdachjOg6ClIgeuylilkqtiblxBFDxh643jsUoPCCG791w9FmtIiCGo42T-W3M73sUeQH18_eBk71DsVyaDsh0Nduo9MW/s1600/Screenshot+from+2019-11-30+12-35-02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="496" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOEDtt4WtgaXvY7JE_QbDiBvV7MGg44bwxBDprOwQuUefgsZPjdachjOg6ClIgeuylilkqtiblxBFDxh643jsUoPCCG791w9FmtIiCGo42T-W3M73sUeQH18_eBk71DsVyaDsh0Nduo9MW/s320/Screenshot+from+2019-11-30+12-35-02.png" width="320" /></a></div>
<br />
I also use Visual Studio Code, Slack, WhatsApp Desktop, Solaar and oh-my-zsh, of course. For PDF signing I use Xournal - it's not as easy as with OSX's Preview.app, but it works. GNOME's Evince will get some support in the future, too.<br />
<br />
In the end, the XPS is a more capable MacBook than the original from Apple. With Linux and some tweaks the system runs more stably and smoothly than my 243% more expensive MBP, has a modern look and feel, and offers all the tools I'm used to. Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-69034120288762999592019-11-07T12:11:00.004+01:002023-06-14T15:47:13.243+02:00Kubernetes - on which node runs what?Just had to find the node a dedicated pod runs on - this one-liner turned out useful:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName --all-namespaces</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
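<span style="font-family: inherit;">If you already know the pod's name, a jsonpath query returns just the node for that single pod (pod and namespace names are placeholders):</span><br /><span style="font-family: "courier new" , "courier" , monospace;">kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.nodeName}'</span><br />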
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-21183161373337868862019-10-21T17:46:00.001+02:002019-10-21T18:49:57.240+02:00OSX - iCloud accounts broken after update to CatalinaYay - that happened to me. Magically I had two iCloud accounts, both not working - and no way to delete one of them via Settings -> Internet Accounts. After a few hours of googling and debugging I got it working again, and as usual I'd like to share the solution; I think I'm not the only one with that problem ;)<br />
Be aware, that's a bomb of a solution: it will reset all internet accounts - but it also unfreezes all the broken stuff:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo -v ; killall -9 accountsd com.apple.iCloudHelper ; defaults delete MobileMeAccounts ; mkdir ~/Library/Accounts/Backup ; mv ~/Library/Accounts/*.sqlite* ~/Library/Accounts/Backup/ ; killall -9 accountsd com.apple.iCloudHelper ; sudo reboot</span><br />
<br />
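<div>The same chain, broken up with comments:</div><div><span style="font-family: "courier new" , "courier" , monospace;"># cache sudo credentials for the final reboot<br />sudo -v<br /># kill the account daemons so they release their state<br />killall -9 accountsd com.apple.iCloudHelper<br /># drop the cached iCloud account preferences<br />defaults delete MobileMeAccounts<br /># move the (possibly corrupted) sqlite account store aside instead of deleting it<br />mkdir ~/Library/Accounts/Backup<br />mv ~/Library/Accounts/*.sqlite* ~/Library/Accounts/Backup/<br /># kill the daemons once more and reboot<br />killall -9 accountsd com.apple.iCloudHelper<br />sudo reboot</span></div><div><br /></div>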
So, why does that happen? Apple uses SQLite as the backend for storing this information, and sometimes this single instance gets corrupted. Have fun!Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-23433679544437440432019-04-29T19:31:00.000+02:002019-04-29T19:31:12.364+02:00Enabling IoT to establish a sustainable value chain<div>
I wrote an article for CIO Applications; here's an archive of it:</div>
<div>
<br /></div>
IoT devices are getting more and more intelligent and can now create meshed networks by themselves, switching from sensor to actor and transferring information only to their meshed neighbors. For example, a connected car could tell the future home that the homeowner will be home in 5 minutes and that the garage door and the front door need to be unlocked in time, the lights need to be switched on, and the grid operator needs to be informed that the wallbox now charges at 22 kW. In the near future this will happen over direct meshed information cells, operated by always-connected devices, wearables, sensors, actors, mobile devices - in short: everything. And all cloud providers offer dozens of solutions to master these challenges, in one way, another, or a completely different one. <br /><br />Self-organizing mesh networking and communication comes with a permanent flow of information - massive IoT data streams that even classic Big Data frameworks like Hadoop cannot handle in time anymore. Along with this kind of data, the need for data processing changes with the way data is created and ingested. Most analyses will be done on the edge and during the ingestion stream, before the data comes to rest. The data lake should be the central core to store data, but the data needs to be categorized and catalogued together with a proper, well-defined schema and data description. The gravity such data pools generate needs to be harnessed as the motor of data-driven innovation. <br /><br />Why? Batch processing helps to extract predictive value out of stored data, even by analyzing multiple other data points and storage facilities, but not to react in time. Timely information in IoT has business value only at the moment it occurs; for that job, stream processing frameworks like Spark or Kafka are more suitable. Combining both techniques brings unmatched value and impact to the business, driven by the right use of data. Stream processing during data transport closes the gap between rapid data and data at rest. Especially for the more costly IoT edge computing, MQTT-enabled stream processing engines deliver high throughput on all kinds of compute instances, be it in a local data center, hybrid clouds or public clouds. <br /><br />The same holds for available cloud technology. Every cloud provider has its own IoT solution zoo with its own lock-ins, but often they do not fit scaling plans - be it in complexity, missing or poorly implemented parts, or simply a price model that is not compatible with the margin of an IoT-based product. A combined approach of scalable cloud technology (where it fits best) and own development brings the most benefit at an affordable price tag, not to mention the intellectual property a business gains and keeps instead of handing it to providers and therefore competitors. Independent organisations like “Linux Foundation Edge” provide the most useful insight into Open Source projects and initiatives. <br /><br />Just dumping data somewhere without a vision behind it does not help to solve the problems companies face on their digital journey, especially when it comes to questions of revenue from IoT projects. Big Data needs a nearly perfect data management, data rights and data retention process behind it. 
Only this offers the possibility to take full advantage of any kind of data, to open new revenue and sales streams, and to finally see all data-driven activity not as a cost-saving project (as most agencies and vendors promise) but as a revenue-creation project. Using modern cloud technologies moves organizations into the data-centric world, focusing on business and not operations. <br /><br />Analyzing the data is the trickier part here - on the one hand every data point brings valuable input, but on the other hand the unlimited data store also brings vulnerabilities regarding customer insights. I am a bit concerned about 360-degree approaches. First, the value of data collections needs to be questioned: which data is system-relevant for support, maintenance or emergencies, and which is important to generate sustainable revenue. Streaming analysis gives valuable input at the point in time the information is needed to make decisions, but also offers the possibility to route data into different data stores. It is unquestionable that the value of customers is higher than that of the data gathered; implementing a state-of-the-art data ethics catalogue is one of the main tasks analytics needs to cover. <br /><br />We are moving quickly towards a so-called interconnected world; always-connected systems will dominate our future lives, introducing new business models by combining business areas which were previously not even in range of each other. The future CIO needs to know what implications the data has and what uncountable value this data can generate, but also to weigh what threats uncontrollable data collections can cause. Building new data-driven business will be the most exciting job of the future; things never done before are now possible. Embrace this. <div>
<br /></div>
The article can be read online: <div>
<a href="https://cloud.cioapplicationseurope.com/cxoinsights/enabling-iot-to-establish-a-sustainable-value-chain-nid-885.html">https://cloud.cioapplicationseurope.com/cxoinsights/enabling-iot-to-establish-a-sustainable-value-chain-nid-885.html</a></div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com4tag:blogger.com,1999:blog-4654282947003145260.post-42293754221466206352019-03-14T19:32:00.000+01:002019-04-29T20:00:58.477+02:00Infinimesh IoT / IIoT platform is starting up!Today is a day we will never forget - infinimesh (<a href="https://www.infinimesh.io/">https://www.infinimesh.io/</a>) is starting and lifting off! Our Kubernetes, Apache Kafka® and graph-based Industrial IoT platform is entering the alpha stage! We have been working like maniacs over the past 14 months to bring a fully flexible, independent, patent-free and vendor-lock-in-free IoT platform to you! Soon it’s your chance to test and try it out: our closed alpha will be open to the public on March 30, 2019 - mark this date in your calendar!<div>
<br /><h3>
An incredible platform comes to life</h3>
We believe smart and connected devices move our society forward. Smart technology uses resources only when they are really necessary and thus prevents waste. On the other hand, when really required, smart things act and hence prevent accidents or simply enable a great user experience. We have started infinimesh 100% Open Source, without patents or closed software. Any software components we have developed - and to this we commit going forward - will be open, forever. Founded by engineers who built the backbone of the European Energy Revolution, infinimesh aims to make industrial and individual IoT secure, available and affordable for all. Infinimesh runs in all cloud offerings, be it public, hybrid or private. All you need is Linux; our platform works in any container environment as well as natively.<div>
<br /></div>
<h3>
Infinimesh on Google Cloud</h3>
We have selected Google Cloud as the strategic partner for our SaaS offering - and from today on the platform is running on GCP! Our SaaS offering, running in Google Cloud, is free for everybody up to 25 devices - ideal for makers, startups and industrial proofs of concept. That leaves enough room to bring ideas to life, test even larger installations, and use the feature-rich ecosystem of GCP to make your idea a successful product.<div>
<span style="color: #666666; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 16px;"><br /></span></div>
<h3>
What can I do with infinimesh IoT on GCP right now?</h3>
<ul>
<li>Connect devices securely via MQTT 3.1.1 (see the sketch after this list)</li>
<li>Transfer desired and reported device states</li>
<li>Manage accounts (Create/Delete)</li>
<li>Manage Namespaces to organize devices and restrict access to devices</li>
<li>Create hierarchically organized objects, e.g. buildings, rooms to organize and model device hierarchies</li>
</ul>
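<div><br /></div><div>For a first smoke test of the MQTT connectivity mentioned above, any MQTT 3.1.1 client will do. A minimal sketch with mosquitto_pub - host, port, topic and certificate paths are placeholders, not infinimesh's actual endpoints:</div><div><br /></div><div><span style="font-family: "courier new" , "courier" , monospace;"># publish a reported device state over TLS with client certificates<br />mosquitto_pub -h mqtt.example.com -p 8883 \<br />--cafile ca.crt --cert device.crt --key device.key \<br />-t 'devices/my-device/state/reported' \<br />-m '{"temperature": 21.5}'</span></div>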
<div>
<br /></div>
<h3>
How does it work?</h3>
Our Kubernetes Operator does the work a real operator would do: it not only installs the whole platform, but also takes care of required cloud/datacenter resources, updates, monitoring and handles incidents like errors. It attempts to resolve as many issues as possible on its own, and notifies human operators when human intervention is required. The operator is the glue between infinimesh and the target installation environment. Our alpha drop focuses on Google Cloud Platform and enables exactly this environment. More supported environments will follow.</div>
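<div><br /></div><div>Conceptually, interacting with such an operator is declarative: you describe the desired state, and the operator reconciles reality towards it. A purely illustrative sketch - the CRD group, kind and fields below are invented for illustration and are not infinimesh's actual API:</div><div><span style="font-family: "courier new" , "courier" , monospace;"># declare the desired platform state; the operator reconciles it<br />kubectl apply -f - <<'EOF'<br />apiVersion: example.infinimesh.io/v1alpha1<br />kind: Platform<br />metadata:<br />  name: demo<br />spec:<br />  cloud: gcp<br />  nodes: 3<br />EOF<br /># watch the operator converge the resource<br />kubectl get platform demo -w</span></div>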
<div>
<br /><h3>
We build features for industrial IoT</h3>
<div>
<br /></div>
<h4>
Device Management</h4>
Powerful but simple framework to visualize clusters of devices within your organization and set permissions up to device level.</div>
<div>
<br /><h4>
Device Shadow</h4>
<div>
Real-time and two-way correspondence for every device in your fleet. Our highly-scalable backend can power millions of devices.</div>
</div>
<div>
<br /><h4>
Timeseries Visualization</h4>
Great telemetry is based on timeseries. infinimesh has timeseries data capabilities built-in and enables meaningful monitoring.</div>
<div>
<br /><h4>
Virtual Twins</h4>
A virtual twin is the digital copy of your physical asset. infinimesh provides virtual twins which give you the possibility to modify your physical device without even touching it.<br /><br /><h4>
Intelligence</h4>
<a href="https://www.infinimesh.io/" target="_blank">infinimesh </a>has Machine Learning and Artificial Intelligence models built-in to rapidly detect anomalies and respond accordingly.<div>
<br /></div>
<h3>
Roadmap and features ahead</h3>
<div>
<br /></div>
<h4>
OPC-UA with full open62541 support (binary protocol with encryption) and BACnet will be available within the next quarter.</h4>
OPC-UA is the leading semantic protocol for Industry 4.0 and opens the full potential of industry-proven stacks like Siemens MindSphere and IBM Watson for Industry. BACnet will also make its way into the platform quite soon; we expect a first drop in the next couple of weeks. BACnet is the most used communications protocol for Building Automation and Control (BAC) networks leveraging the ASHRAE, ANSI and ISO 16484-5 standard, and is used as the protocol stack in various intelligent buildings.</div>
<div>
<br /><h4>
What’s next?</h4>
More exciting news and announcements will follow in the coming months, so use the platform and follow this blog or our channels to never miss any news. We are happy to have you as a user and customer, and we will support you in any idea you have. Drop us a mail, open a Feature Request (<a href="https://github.com/infinimesh/infinimesh/blob/master/.github/ISSUE_TEMPLATE/feature_request.md">https://github.com/infinimesh/infinimesh/blob/master/.github/ISSUE_TEMPLATE/feature_request.md</a>) or contact (<a href="https://infinimesh.io/contact.html#contact">https://infinimesh.io/contact.html#contact</a>) us over our different channels - we are here.</div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-58810801657198175812017-11-22T13:48:00.000+01:002017-11-22T13:50:09.459+01:00Next Internet comes with IoT The Internet we know is a great space for collaboration, social media and gaming. But when it comes to business or transactions, the power belongs to a few big players. Remember the S3 outage when half of the North American services were offline? Or the Dyn hack which kicked half of the internet out for hours? The next internet could be a blockchain-based independent network, using as many protocols as available, with no single party in control of it, running on top of the Internet.<br />
<br />
<div>
<i>In a nutshell, Blockchain is a decentralized system in which every transaction gets mathematically approved by the members of the system, so every member of that transaction knows about it. The information of the transaction is stored on the distributed servers of the blockchain. That makes manipulation practically impossible, and the transaction is also highly available at any time.</i><br />
<i><br /></i>
IoT devices are getting more and more intelligent and can now create meshed networks by themselves, switching from sensor to actor and transferring information only to their neighbors - for example, to tell the doorknob that the homeowner will be home in 5 minutes with his EV, and that the wallbox and the door need to be unlocked. Right now that is possible via IFTTT, which is an extra protocol and needs manual configuration; in the future this will happen automatically over direct meshed information cells, including status updates.<br />
<br />
When we look at the power of billions of IoT devices, be they sensors, cameras, windmills, cars or whatever, they all carry a CPU and memory as their basis. Connecting all of those together creates a large, highly available, interconnected system: always-on, always accessible, self-connected things which share information about their environment with other things on their own and trigger automated actions, learned from the behavior of the things' environment. Thought of as an ultra-widely available blockchain, those devices will be the next internet. Transactions, information and data will be stored securely on a device, and every device connected to another device will automatically become a member of the global blockchain pool in the future. That brings the power of blockchain to an always-connected network, speeding up the digital disruption every business faces, and allows enterprises to build models based on the decentralized network. Right now, without an economic virtual entity to establish each other's identity, over 2 billion humans are excluded from being part of any financial transaction globally, while others collect data about us, steal identities and commit fraud without giving us a chance to fight back. Those who hold the power and control large parts of the Net can't be disempowered, because they operate large parts of the Net, too.<br />
<br />
That mistake can and will be solved by the next Internet, bringing radical new solutions to the Internet we know. Most of them are based on blockchain technology, like the Smart Contracts Ethereum provides. <br />
<br />
Another technology move could be blockchain-powered AI: immutability, shared decentralized control and trusted audit trails lead to qualitatively better data - and, through more available data, to better algorithms. Real-world modeling works on large volumes of data, such as training on large datasets or high-throughput stream processing systems; for applications of blockchain to AI, blockchain technology with big-data scalability and querying, like the groundbreaking BigchainDB with the public IPDB, is needed. And a globally scaled blockchain unlocks new large-scale opportunities, from better model training through model sharing and a shared global AI model registry to automated wealth for our planet. </div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-51588409076620386722017-06-16T09:49:00.005+02:002017-06-16T09:50:18.047+02:00The Machine and BigDataHP’s „The Machine“ (1) project is, in my eyes, the most advanced in the IT world, with the simple goal of rethinking the entire computer design. And the plan is ambitious – the first edge devices shall be ready in 2018, an industrialized series in 2020. <br />
<br />
Will “The Machine” really revolutionize an entire industry mostly shaped by IBM? Let's say it could, and probably will, with a high chance of success. <br />
Based on the idea of the memristor (2), the project uses memory-based technology to store data. Nothing new here; new is the non-volatile usage. Data stored in a memristor persists unless the storing bit gets cleared and realigned. NVRRAM (non-volatile resistive RAM) is faster than volatile DDR4 modules (which they use at the moment, until Western Digital can deliver NVRRAM modules) and a factor of 100x faster than current state-of-the-art SSD-based technologies. The newest prototype has 40 nodes with approx. 160 TB of DDR4 RAM and 1,280 cores, connected with X1 PMs (Photonic Modules). Means: pretty fast. Anyhow, just follow appendix (1) for more interesting engineering facts. <br />
The most important consideration is the pure permanent all-integrated storage itself. The layer of attached storage (like HDFS, GFS, Ceph) would simply disappear and merge directly with the computation layer. The principle of “local data first” will surely be part of any fine-tuning approach, but with such a high density of storage that will not really matter. All pieces of computation will be in the same place (cache, volatile and permanent storage combined with fast caching) and work as one homogeneous entity which can hold every state of every piece of data during the whole computation lifecycle. <br />
I just want to consider the changing fundamentals of that idea and what they would mean for data processing. The first big difference: a ternary memristor can store 10 bits where today we store 8, which alone means a noticeably higher data storage density. Additionally, the highly volatile cache a CPU uses during the calculation process will be stored permanently, which allows subsequent processes to reuse the pre-calculated subsets - that would speed up any calculation dramatically. Pattern detection algorithms like MCMC (3), for example, could benefit hugely by simply picking up an already calculated subset and using it in a new chain, which would revolutionize data intelligence in terms of speed and tree generation. I think that's a huge step into the AI world - ultrafast learning algorithms helping mankind to operate highly sensitive environments like deep-space flights, connected cars, CEP networks or decentralized power grids. <br />
<br />
(1) https://www.labs.hpe.com/the-machine<br />
(2) <a href="http://en.wikipedia.org/wiki/Memristor">http://en.wikipedia.org/wiki/Memristor</a><br />
(3) https://en.wikipedia.org/wiki/Markov_chain_Monte_CarloAlexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-56458871227904336752017-05-09T20:19:00.002+02:002017-05-09T20:19:57.828+02:00The next stage of BigDataRight now, the terms BigData and Hadoop are used as one and the same - often like the buzzword of buzzwords. And they sound mostly as a last time call, often made by agencies to convince people to start the Hadoop journey before the train leaves the station. Don’t fall into that trap.<br /><br />Hadoop was made by people who worked in the early internet industry, namely Yahoo. They crawled millions of millions web pages every day, but had no system to really get benefit from this information. Dug Cutting created Hadoop, a Map/Reduce framework written in Java and blueprinted by Google in 2004 (1). The main purpose was to work effectively with an ultra-large set of data and group them by topics (just to simplify). <br />Hadoop is now 10 years old. And in these 10 years the gravity of data management, wrangling and analyzing runs faster and faster. New approaches, tools and techniques emerging every day in the brain centered areas called Something-Valley. All of those targeting the way we work and think with data. <br /><br />That describes the main problem of Hadoop itself – it’s designed as an inner working system, providing storage and computation layer at once. And that’s why Hadoop Distributions typically are suggesting to use BareMetal installations in a Datacenter and push companies to create the next silo'd world, promising the good end after leave another one (separate DWH’s without connection between each other). That comes with dramatic costs, operations and a workforce of highly trained engineers, among high costs of connecting systems on premise to the new silo'd DataLake approach, often mixed up with lift-and-shift operations. And here arises the next big problem described as “data gravity”. Data simply sinks down the lake until nobody can even remember what kind of data that was and how the analytical part can be done. And here the Hadoop journey mostly ends. A third issue comes up, driven by agencies to convince companies to invest into Hadoop and Hardware. The talent war. In the end it simply creates the next closed world, but now named a bit fancier.<br /><br /><div>
The world spins on, right now in the direction of public cloud, but heading towards device edge computing (IoT) and DCC (data center on a chip). Additionally, the kind of data changes dramatically, from large chunks of data (petabytes of stored files from archives, crawlers, logfiles) to streamed data delivered by millions upon millions of edge computing devices. Just dumping data into a lake, with no vision beyond getting cheap storage, doesn't help to solve the problems companies face on their digital journey. <br /><br />Along with the kind of data, the need for data analysis changes with the way data is created and ingested. The first analysis will be done on the edge, the second during the ingestion stream, and the next one(s) when the data comes to rest. The data lake is the central core and will be the final endpoint to store data, but the data needs to be categorized and catalogued during stream analytics and stored with a schema and data description. The key point in a so-called Zeta architecture is the independence of each tool, the “slice it down” approach. The fundamental basis is the data-centered business around a data lake, but the choice of tools for getting data to the lake, analyzing it and visualizing it isn't written in stone and is independent of the central core. <br /><br />That opens the possibility to really take advantage of any kind of data, to open new revenue and sales streams, and to finally see all data-driven activity not as a cost-saving project (as most agencies and vendors promise) but as a revenue-creation project. Using modern cloud technologies moves organizations into the data-centric world, focusing on business and not operations. <br /><br />(1) <a href="https://research.google.com/archive/mapreduce.html" target="_blank"><b>https://research.google.com/archive/mapreduce.html</b></a><br /><br /></div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-39173645732548938662016-12-08T09:30:00.004+01:002016-12-11T17:46:09.711+01:00Hue 3.11 with HDP 2.5Works fine with CentOS / RHEL, I used 6.8 in that case. Epel has to be available, if not, install the repo.<br />
And I ask myself why Hortonworks didn't integrate Hue v3 into their HDP release - I mean, Hue v2 is ancient and dramatically lacking in functionality.<br />
Anyhow, let's get to work.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo wget <a href="http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo">http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo</a> -O /etc/yum.repos.d/epel-apache-maven.repo</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo yum install ant gcc krb5-devel mysql mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl-gssapi sqlite-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel python-simplejson python-setuptools rsync gcc-c++ saslwrapper-devel libffi-devel gmp-devel apache-maven</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">sudo mkdir /software; sudo chown hue: /software && cd /software<br />wget <a href="https://github.com/cloudera/hue/archive/master.zip">https://github.com/cloudera/hue/archive/master.zip</a> -O hue.zip && unzip hue.zip; cd hue-master; sudo mkdir -p /usr/local/hue && chown -R hue: /usr/local/hue && make install</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<br />
<h4>
HDP config changes:</h4>
<div>
<br /></div>
<div>
<b>Oozie => Custom oozie-site</b><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;">oozie.service.ProxyUserService.proxyuser.hue.groups *<br />oozie.service.ProxyUserService.proxyuser.hue.hosts *</span><br />
<br />
<b>Hive => Custom webhcat-site</b><br />
<span style="font-family: "courier new" , "courier" , monospace;">webhcat.proxyuser.hue.host *<br />webhcat.proxyuser.hue.groups *</span><br />
<br />
<b>HDFS => Custom core-site</b><br />
<span style="font-family: "courier new" , "courier" , monospace;">hadoop.proxyuser.hue.hosts *<br />hadoop.proxyuser.hue.groups *</span><br />
<br />
Finally, hue.ini needs to be configured to fit the installation; here's an example - I use 8899 as the Hue UI port:<br />
<div>
<br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;">http_port=8899<br />app_blacklist=impala,security<br />hive_server_host=hue3.azure.intern<br />hbase_clusters=(Cluster|hue3.azure.intern:9090)<br />solr_url=<a href="http://hue3.azure.intern:8983/solr/">http://hue3.azure.intern:8983/solr/</a></span><br />
<br />
Lastly, a new security rule for port 8899 has to be created, and the HBase Thrift service has to be started:<br />
<span style="font-family: "courier new" , "courier" , monospace;">nohup hbase thrift start &</span></div>
<div>
<br /></div>
<div>
Configure Hue:</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/usr/local/hue/build/env/bin/hue syncdb</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/usr/local/hue/build/env/bin/hue migrate</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
Start Hue:</div>
<span style="font-family: "courier new" , "courier" , monospace;">/usr/local/hue/build/env/bin/supervisor -d</span><br />
<div>
<br />
Login per http://your_hue3_host:8899<br />
<br /></div>
<div>
I strongly recommend using MySQL as the backend DB, but for a first test the integrated SQLite instance is fine, too.</div>
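<div><br /></div><div>A minimal sketch of the relevant hue.ini section for a MySQL backend - host, credentials and database name are placeholders, and the database and user must exist beforehand:</div><div><span style="font-family: "courier new" , "courier" , monospace;">[desktop]<br />[[database]]<br />engine=mysql<br />host=127.0.0.1<br />port=3306<br />user=hue<br />password=SECRET<br />name=hue</span></div>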
<div>
<br /></div>
<div>
Issues:<br />
<a href="https://issues.cloudera.org/browse/HUE-4701" target="_blank"><b>HUE-4701</b></a> - recreate the saved queries from sample notebook</div>
</div>
</div>
</div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-10096892831271050912016-11-29T10:40:00.001+01:002016-12-02T10:07:33.354+01:00Erase HDP 2.x and AmbariSince I now often hack on Hortonworks HDP, I also often need to completely clean out my lab environments to get fresh boxes. I figured writing an ugly shell script is more comfortable than bothering my infra guys to reset the VMs in Azure - which would also reset all my modifications. Bad!<br />
Anyhow, here's the script in case anyone else has a use for it, too.<br />
<br />
<a href="https://github.com/alo-alt/shell/blob/master/rmhdp.bash" target="_blank"><b>https://github.com/alo-alt/shell/blob/master/rmhdp.bash</b></a><br />
<br />
As usual, first stop all Ambari-managed services. I remove Postgres too, since setting up a new DB via the Ambari installer is much faster than dealing with inconsistencies later.<br />
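<div>The core of such a cleanup boils down to something like this - a rough sketch, not the actual script; package and path lists vary per HDP version:</div><div><span style="font-family: "courier new" , "courier" , monospace;"># stop the management components first<br />ambari-server stop; ambari-agent stop<br /># remove the packages, including the Ambari-managed Postgres<br />yum remove -y ambari-server ambari-agent postgresql-server<br /># wipe leftovers so the next install starts clean<br />rm -rf /usr/hdp /var/lib/ambari-server /var/lib/ambari-agent /etc/ambari-server /etc/ambari-agent</span></div>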
Side note: the script is made for RHEL-based distributions ;)Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com0tag:blogger.com,1999:blog-4654282947003145260.post-90391224221359123062016-10-24T10:11:00.000+02:002016-10-24T10:11:05.091+02:00FreeIPA and Hadoop Distributions (HDP / CDH)<a href="https://www.freeipa.org/page/Main_Page" target="_blank"><b>FreeIPA</b></a> is the tool of choice when it comes to implementing a security architecture from scratch today. I don't need to praise the advantages of FreeIPA; it speaks for itself. It's the Swiss Army knife of user authentication, authorization and compliance.<br />
<br />
To integrate FreeIPA into Hadoop distributions like Hortonworks' HDP and Cloudera's CDH, some tweaks are necessary, but the outcome is worth it. I assume that the FreeIPA server setup is done and the client tools are distributed; if not, the guide from Hortonworks includes those steps, too.<br />
<br />
For <a href="http://hortonworks.com/" target="_blank"><b>Hortonworks</b></a>, nothing more than the link to the documentation is necessary:<br />
<a href="https://community.hortonworks.com/articles/59645/ambari-24-kerberos-with-freeipa.html" target="_blank"><b>https://community.hortonworks.com/articles/59645/ambari-24-kerberos-with-freeipa.html</b></a><br />
<br />
Ambari 2.4.x has FreeIPA support (<a href="https://issues.apache.org/jira/browse/AMBARI-6432" target="_blank"><b>AMBARI-6432</b></a>) included - experimental, but it works as promised. The setup and rollout are pretty simple and run smoothly via the wizard.<br />
<br />
For <a href="http://www.cloudera.com/" target="_blank"><b>Cloudera</b></a> it takes a bit more handwork, but in the end it also works perfectly and well integrated, just not at the same UI level as Ambari. These steps are necessary to get Cloudera Manager working with FreeIPA:<br />
<br />
1. create the CM principal in FreeIPA (example: <span style="font-family: Courier New, Courier, monospace;">cdh@ALO.ALT</span>)<br />2. retrieve the keytab:<div>
<span style="font-family: Courier New, Courier, monospace;">ipa-getkeytab -r -s freeipa.alo.alt -p cdh -k cdh.keytab</span><br />3. install ipa-admintools on the Cloudera Manager server </div>
<div>
<span style="font-family: Courier New, Courier, monospace;">yum install ipa-admintools -y</span><br />4. place the <a href="https://github.com/alo-alt/FreeIPA/blob/master/getkeytabs.sh" target="_blank"><b>retrieval-script</b></a> (from my GitHub) in <span style="font-family: Courier New, Courier, monospace;">/opt/cloudera/security/getkeytabs.sh</span> (or another path accessible by cloudera manager), make it executable and owned by <span style="font-family: Courier New, Courier, monospace;">cloudera-scm</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> chmod 775 /opt/cloudera/security/getkeytabs.sh && chown cloudera-scm: /opt/cloudera/security/getkeytabs.sh</span><br />5. Start the Kerberos wizard, but stop after verifying the cdh user<br />6. Set the configuration [1] for "<span style="font-family: Courier New, Courier, monospace;">Custom Kerberos Keytab Retrieval Script</span>" to "<span style="font-family: Courier New, Courier, monospace;">/opt/cloudera/security/getkeytabs.sh</span>"<br />7. Resume the Kerberos wizard and follow the steps until it's finished, then restart the cluster.</div>
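<div><br /></div><div>Before resuming the wizard in step 7, it's worth verifying manually that the retrieved keytab actually yields a ticket (realm and file name as used above):</div><div><span style="font-family: Courier New, Courier, monospace;"># obtain a ticket from the keytab and list it<br />kinit -kt cdh.keytab cdh@ALO.ALT<br />klist</span></div>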
<div>
<br /></div>
<div>
<b>Important</b>:</div>
<div>
The FreeIPA client on RHEL 7 / CentOS 7 now uses kernel-keyring-based credential caches, but Java doesn't support them (yet). To switch back to the file-based ticket cache, the config file (<span style="font-family: Courier New, Courier, monospace;">/etc/krb5.conf</span>) needs to be altered by commenting out <span style="font-family: Courier New, Courier, monospace;">default_ccache_name</span>, which lets the client fall back to the default file-based ticket cache:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">cat /etc/krb5.conf</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">..</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># default_ccache_name = KEYRING:persistent:%{uid}<br />..</span></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
[1] <a href="http://www.cloudera.com/documentation/enterprise/latest/topics/sg_keytab_retrieval_script.html" target="_blank"><b>http://www.cloudera.com/documentation/enterprise/latest/topics/sg_keytab_retrieval_script.html</b></a></div>
Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1tag:blogger.com,1999:blog-4654282947003145260.post-62009233853887826872016-10-12T08:18:00.001+02:002016-10-12T16:36:53.317+02:00Shifting paradigms in the world of BigDataIn building the next generation of applications, companies and stakeholders need to adopt new paradigms. The need for this shift is predicated on the fundamental belief that building a new application at scale requires tailored solutions to that application’s unique challenges, business model and ROI. Some things change, and I’d like to point to some of those changes.<br />
<br />
<b>Event Driven vs. CRUD</b><br />
Software development traditionally is driven by entity-relation modeling and CRUD operations on that data. The modern world isn’t about data at rest, it’s about being responsive to events in flight. This doesn’t mean that you don’t have data at rest, but that this data shouldn’t be organized in silos.<br />
The traditional CRUD model is neither expressive nor responsive, given the countless available data sources. Since all data is structured somehow, an RDBMS isn't able to store and work with data when the schema isn't known (schema on write). That makes the use of additional, freely available data more like an adventure than a valid business model, given that the schema isn't known and can change rapidly. Event-driven approaches are much more dynamic and open, and they make the data valuable for other processes and applications. The view of the data is defined by the use of the data (schema on read); these views can be created manually (by a data scientist), automatically (Hive and Avro, for example) or exploratively (R, AI, NNW).<br />
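<br />As a concrete illustration of schema-on-read: the raw data stays untouched in the lake, and a view is applied only at read time, so it can evolve with the schema file. A minimal sketch with Hive and Avro - the connect string, paths and names are placeholders:<br /><br /><span style="font-family: Courier New, Courier, monospace;"># an external table: Hive applies the Avro schema at query time,<br /># the files in the lake stay as they are<br />beeline -u jdbc:hive2://localhost:10000 -e "<br />CREATE EXTERNAL TABLE events<br />STORED AS AVRO<br />LOCATION '/data/lake/raw/events'<br />TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/event.avsc');"</span><br />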
<br />
<b>Centralized vs Siloed Data Stores</b><br />
BigData projects often fail by not using a centralized data store, often referred to as a Data Lake or Data Hub. It's essential to understand the idea of a Data Lake and the need for it. Siloed solutions (aka data warehouse solutions) hold only data which matches the schema and nothing else. Every schema is different, and often it's impossible to use them in new analytic applications. In a Data Lake the data is stored as it is - original, untouched, uncleaned, disaggregated. That makes the entry (or low-hanging fruit) mostly easy - just start to catch all the data you can get. Offload RDBMSs and DWs to your Hadoop cluster and start the journey by playing with that data, even by using 3rd-party tools instead of developing your own tailored apps. Even when this data comes from different DWHs, mining and correlating it often brings treasures to light.<br />
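<br />The classic first offload step is a one-liner with a tool like Sqoop - connect string, credentials and table name are placeholders:<br /><br /><span style="font-family: Courier New, Courier, monospace;"># pull a DWH table into the lake as-is; -P prompts for the password<br />sqoop import --connect jdbc:mysql://dwh.example.com/sales \<br />--username etl -P --table orders \<br />--target-dir /data/lake/raw/orders</span><br />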
<br />
<b>Scaled vs. Monolith Development</b><br />
Custom processing at scale involves tailored algorithms, be they custom Hadoop jobs, in-memory approaches for matching and augmentation, or 3rd-party applications. Hadoop is nothing more (or less) than a framework which allows the user to work within a distributed system, splitting workloads into smaller tasks and letting those tasks run on different nodes. The interfaces to that system are reusable APIs and libraries. That makes the use of Hadoop so convenient: the user doesn't need to take care of the distribution of tasks, nor know exactly how the framework works. Additionally, every piece of written code can be reused by others without building up large code debt.<br />
On the other hand, Hadoop gives the user an interface to configure the framework to match the application's needs dynamically at runtime, instead of having static configurations like traditional processing systems. <br />
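<br />How thin that interface can be is easy to see with Hadoop Streaming, where plain shell commands become the distributed tasks - a toy word count, assuming a streaming jar is installed (the jar and HDFS paths are placeholders):<br /><br /><span style="font-family: Courier New, Courier, monospace;"># mapper emits one word per line; the framework sorts between map and reduce;<br /># the reducer then counts adjacent duplicates<br />hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \<br />-input /data/lake/raw/logs -output /tmp/wordcount \<br />-mapper 'tr -s " " "\n"' -reducer 'uniq -c'</span><br />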
<br />
Having these principles in mind when planning and architecting new applications based on Hadoop or similar technologies doesn't guarantee success, but it lowers the risk of getting lost. It's worth noting that every success had many failures before it. Not trying to create something new is the biggest mistake we can make, and will sooner or later result in a total loss.Alexander Altenhttp://www.blogger.com/profile/17445753775154988351noreply@blogger.com1