Our Technology Stack
Technologies and Concepts we work with
Here you will find a non-exhaustive list of our technology stack as well as our understanding of it.
If you are interested in our consulting services, these are the technologies you can most likely expect us to be using.
If you are interested in a job with us as a potential employee, these are some of the technologies you would be working with. It makes sense to be prepared for job interview questions regarding the following topics ;)
Artificial Intelligence (AI) describes software systems with decision-making capabilities. These decision-making capabilities are based on a wide variety of techniques. The most successful ones in recent history revolve around Machine Learning and Deep Learning.
Apache Flink is a stream processing framework and distributed data processing engine. Its focus is stateful computation over bounded and unbounded data streams. Apache Flink can often be found in conjunction with Apache Kafka.
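As a rough illustration of the keyed, stateful processing Flink performs over an unbounded stream, here is a minimal plain-Python sketch (deliberately not the Flink API) that keeps a running count per key:

```python
from collections import defaultdict
from typing import Iterable, Tuple


def keyed_running_count(events: Iterable[Tuple[str, int]]):
    """Yield (key, running_count) for each incoming event, keeping state per key."""
    state = defaultdict(int)  # per-key state, which Flink would checkpoint for fault tolerance
    for key, _value in events:
        state[key] += 1
        yield key, state[key]


# Example: an (in practice unbounded) stream of (user, click) events.
clicks = [("alice", 1), ("bob", 1), ("alice", 1)]
for key, count in keyed_running_count(clicks):
    print(key, count)
```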
Apache Kafka is a distributed event streaming platform. It stores data as a write-ahead binary log in so-called topics. Apache Kafka's API is one of the most commonly used network protocols for event streaming storage systems and applications.
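A minimal sketch of producing to and consuming from a topic, assuming the kafka-python client and a broker reachable at localhost:9092 (topic, key, and group names are illustrative):

```python
from kafka import KafkaProducer, KafkaConsumer

# Append an event to the "page-views" topic (the write-ahead log).
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", key=b"user-42", value=b'{"page": "/pricing"}')
producer.flush()

# Read the topic back from the beginning as part of a consumer group.
consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    group_id="analytics",
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```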
Apache Pinot is a realtime database. In Data Analytics we distinguish between realtime and batch analytics. The former is used to create analyses, insights, and decisions with data that is as fresh as possible.
Apache Spark is a data processing framework and distributed data processing engine focused on in-memory data processing. It offers multiple interfaces to its core data structure, the Resilient Distributed Dataset (RDD), ranging from dataframes and SQL to graph representations.
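A small PySpark sketch showing the dataframe, SQL, and RDD interfaces over the same data (the app name and columns are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-demo").getOrCreate()

# Dataframe interface on top of RDDs.
orders = spark.createDataFrame(
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
    ["customer", "amount"],
)

# SQL interface over the same data.
orders.createOrReplaceTempView("orders")
totals = spark.sql("SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")
totals.show()

# The underlying RDD is still accessible.
print(orders.rdd.take(2))

spark.stop()
```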
Apache Superset is a data visualisation and dashboarding tool. It offers integrations with data sources like PostgreSQL, Apache Spark, and various Data Warehouses via SQLAlchemy.
Blob Storages are data storage systems accessed over the network and meant for storing any type of data, much like directories on a local computer. Well-known blob storages are Amazon S3, Azure Blob Storage, Google Cloud Storage, and Oracle Object Storage.
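A minimal sketch of writing and reading an object, assuming the boto3 client for Amazon S3 and a hypothetical bucket name; other blob storages expose comparable object APIs:

```python
import boto3

# Client for Amazon S3.
s3 = boto3.client("s3")

# Upload ("put") an object under a key, much like writing a file into a directory.
s3.put_object(
    Bucket="example-raw-data",
    Key="2024/01/events.json",
    Body=b'{"event": "signup"}',
)

# Download ("get") it again.
response = s3.get_object(Bucket="example-raw-data", Key="2024/01/events.json")
print(response["Body"].read())
```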
Business Intelligence (BI) in the context of software describes tools for analysing, visualising, and reporting business ideas and numbers with data. Tableau, PowerBI, and Apache Superset are such software systems.
Caches store non-changing data on the fastest data storage layer outside the CPU. Often, this is the so-called working memory or Random Access Memory (RAM). Commonly deployed caches are Redis and Memcached.
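A minimal cache-aside sketch, assuming the redis-py client and a Redis instance on localhost; load_profile_from_database is a hypothetical stand-in for a slow lookup:

```python
import redis

r = redis.Redis(host="localhost", port=6379)


def load_profile_from_database(user_id: str) -> bytes:
    # Stand-in for a slow database or API call.
    return f'{{"id": "{user_id}", "name": "Example"}}'.encode()


def get_user_profile(user_id: str) -> bytes:
    """Cache-aside pattern: try the cache first, fall back to the slow source."""
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return cached
    profile = load_profile_from_database(user_id)
    r.setex(f"user:{user_id}", 300, profile)  # keep it in RAM for 5 minutes
    return profile


print(get_user_profile("42"))
```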
Cloud describes the booking and usage of computational resources without having to buy or operate them, and sometimes without even knowing their specification. Clouds are often defined by a pay-per-use model. Newer payment models have been introduced, see Billing Models. Most clouds also have an API to automate the provisioning and maintenance of computational resources.
Cloud Development Kit (CDK) is an abstraction for programmatically declaring Cloud resources to be provisioned. There are multiple frameworks from various vendors for this, most notably the AWS CDK and GCP CDK.
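A minimal sketch assuming the AWS CDK v2 Python bindings (aws-cdk-lib), declaring a single S3 bucket; the stack and bucket names are illustrative:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Declare an S3 bucket; the CDK synthesises this into a CloudFormation template.
        s3.Bucket(self, "RawDataBucket", versioned=True)


app = App()
DataStack(app, "DataStack")
app.synth()
```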
Cloud Providers are companies running data-centers and renting out resources programmatically. The four largest offerings, by revenue, in this space are Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, and Oracle Cloud Infrastructure (OCI).
Containers are a common mechanism to package, distribute, and operate software. They package an abstraction of a given host system without the overhead of a Virtual Machine (VM). Docker is probably the most famous container definition, container packaging, and container runtime.
Container Runtimes are processes that execute software distributed as containers. Over the last decade, standard formats for the definition of containers were agreed upon by the Open Container Initiative (OCI).
Containerized Applications are software packaged and distributed as self-sufficient packages called containers. A container's only requirement is a running container runtime on its host system. All potential third-party dependencies are contained within the package, hence the name container.
Dashboards, within the context of data analytics and software, describe visual representations of application data. These data can be either operational data about the software system itself or usage data and evaluations built on top of it.
Data Analytics describes the whole set of technologies, methodologies, architectures, and more used to gain insight into data collected and stored with computational resources. AI or Machine Learning is often an evolution of data analytics, moving from insights to predictions.
Data Deduplication is the process of detecting and eliminating datasets that might be duplicate entries in a given data store. For example, a customer buying two products from the same vendor via different channels might have been captured twice. Removing these duplicates is crucial for various aspects of a business.
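A minimal deduplication sketch with pandas, treating rows with the same customer and product as duplicates regardless of channel (the column names are illustrative):

```python
import pandas as pd

# The same purchase captured twice via different channels.
purchases = pd.DataFrame({
    "customer_email": ["ada@example.com", "ada@example.com", "bob@example.com"],
    "product": ["widget", "widget", "gadget"],
    "channel": ["web", "phone", "web"],
})

# Keep only the first occurrence per (customer, product) pair.
deduplicated = purchases.drop_duplicates(subset=["customer_email", "product"])
print(deduplicated)
```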
Data Engineering is the process of observing and manipulating data so that data captured by applications becomes useful for a given technical or business purpose.
Data Integrity describes the processes and techniques put in place to guarantee the accuracy, completeness, and validity of a company's data. Analytics dashboards having a consistent worldview is at the essence of data work.
Data Lake describes architectural patterns for data processing, loosely bound together by a file-system-like network storage. It encapsulates all stages, spanning from data collection to cataloging, analysis, and prediction.
Data Locality is the concept of reducing the distance between storage and compute needed for processing data.
Data Mesh is an abstract idea of how data should be organised from a company's perspective to increase the self-serve and discovery nature of data. The data mesh concept focuses more on the people side of the problem than on the technical solutions to implement it.
Data Reports or Data-Driven Reports are applications that generate graphs and variable text to deliver a compelling story about the idea being presented, backed by numbers and analysis.
Data Residency refers to the geographical location of data and the storage system it is stored on. Some countries and corporations have strict regulatory requirements on where they can store data.
Data Resiliency describes the ability to access one's data at all times.
Data Science groups a set of activities around data, especially with a focus on creating predictions by finding a suitable model for the given domain.
Data Streaming is a set of data storage systems and processing frameworks focused on the semantics of moving transient data. Computations are done while the data is in motion instead of loading it into a data warehouse first.
Data Streaming Pipelines are applications and processes transforming and aggregating streams of data. The resulting sets of data commonly end up in an analytics storage or get sent down to the user immediately.
Data Warehouse describes data storage systems focused on analytics. These systems are often called online analytical processing (OLAP) systems, focusing on storing data in a shape that enhances the speed of aggregations.
Distributed Computing is one of the major subfields of computer science, concerned with the correctness of algorithms running on multiple network-connected machines.
Extract Transform Load (ETL) describes techniques to make sense of the various data sources within a company. The goal is often to load data into a data warehouse.
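A minimal ETL sketch with pandas; the file and column names are hypothetical, and a Parquet file stands in for the warehouse:

```python
import pandas as pd

# Extract: read raw data from a source system (hypothetical CRM export).
raw = pd.read_csv("crm_export.csv")

# Transform: clean and reshape the data so it matches the warehouse schema.
transformed = (
    raw.rename(columns={"Customer Name": "customer_name"})
       .assign(order_total=lambda df: df["quantity"] * df["unit_price"])
       .dropna(subset=["customer_name"])
)

# Load: write into the analytics store (here a Parquet file as a stand-in for a warehouse).
transformed.to_parquet("warehouse/orders.parquet")
```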
Feature Engineering describes the process of discovering and creating valuable insights in data from given data fields or their transformations (features). It is often used as a precursor for data scientists to derive a better model.
Java is a programming language heavily used in enterprises. Its ecosystem is ever-changing, and since the mid-2010s the higher release frequency has led to greater re-adoption. Since the inception of GraalVM, Java and its runtime can also be compiled into native binaries.
Kubernetes is the largest and most widely distributed process and container orchestration system in the world. It provides a great ecosystem of software built on top of it or adapted to make operations easier for developers.
Machine Learning (ML) is a technique in the field of Artificial Intelligence. ML techniques consolidate around the idea of a computer becoming better at a given task by optimising a given function, often aided by data.
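A minimal sketch of that idea: fitting a line to toy data by repeatedly nudging parameters to reduce a loss function, using plain numpy gradient descent (data and learning rate are illustrative):

```python
import numpy as np

# Toy data: inputs x and targets y that roughly follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.01

for _ in range(2000):
    prediction = w * x + b
    error = prediction - y
    loss = np.mean(error ** 2)        # the function being optimised
    grad_w = 2 * np.mean(error * x)   # gradients of the loss
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w       # the model "becomes better" step by step
    b -= learning_rate * grad_b

print(w, b, loss)
```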
Managed Services describe the automated management of software infrastructure, databases, web application servers, authentication services etc. by a vendor.
Platform as a Service (PaaS) describes offerings that allow a developer to focus on code, while the PaaS provider manages databases, web application servers, load balancers, etc.
Python is a general-purpose scripting language. Whilst popular in the web application development space during the 00s, it is now the de facto best ecosystem for AI application and model development, with its rich ecosystem around numpy, pandas, polars, pytorch, etc.
Python Pandas is a dataframe library for processing data effectively and ergonomically in Python.
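A small sketch of typical pandas dataframe operations (filtering, grouping, aggregating) on illustrative data:

```python
import pandas as pd

orders = pd.DataFrame({
    "customer": ["alice", "bob", "alice"],
    "amount": [30.0, 12.5, 7.5],
})

# Filter rows, then aggregate per customer.
large_orders = orders[orders["amount"] > 10]
totals = orders.groupby("customer", as_index=False)["amount"].sum()
print(large_orders)
print(totals)
```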
Python Polars is a pandas-inspired dataframe library written in Rust, with a strong additional focus on handling larger-than-memory datasets.
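A minimal lazy-query sketch, assuming a recent Polars version (lazy API with group_by) and a hypothetical orders.csv; the query is only executed on collect, which lets Polars stream over data that does not fit in memory:

```python
import polars as pl

# Lazily scan a (potentially larger-than-memory) CSV; nothing is read yet.
lazy = pl.scan_csv("orders.csv")

totals = (
    lazy.filter(pl.col("amount") > 10)
        .group_by("customer")
        .agg(pl.col("amount").sum().alias("total"))
)

# Only now is the query planned and executed.
print(totals.collect())
```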
Relational Database Management Systems (RDBMS) are databases with additional services. These often include a network protocol, an authentication system, and policies to control access at a given data granularity level.
Realtime Databases are databases focused on aggregating streaming data sets. These are often low-latency databases with a focus on aggregating the most recent data within a given time window.
Rust is a systems programming language that promises zero-cost abstractions. Its standout feature is the concept of Ownership to ensure memory safety. Ownership allows the compiler to validate at compile time that a given piece of memory is never written to by two computations at the same time.
Serverless describes services that run code on behalf of the developer, without the developer having to be concerned with the underlying infrastructure.
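A minimal AWS-Lambda-style handler sketch: the platform invokes a function like this per event and manages all underlying infrastructure (the event shape is illustrative):

```python
import json


def handler(event, context):
    # The platform passes the triggering event and runtime context; no servers to manage.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```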
Service Level Agreements are contracts or contract extensions between a service provider and its customer on what guarantees around availability and quality a customer can expect from a given service.
Snapshot describes the copy of a whole database at a given point in time.
Software Defined Networks are networks layered on top of physical hardware networks, allowing for more isolation whilst staying true to how networks actually work. The interfaces for configuring them therefore stay almost the same, but the physical network is never actually changed.
Symbolic AI describes techniques in AI that leverage symbolic logic and search. There is a current resurgence of symbolic AI in combination with the dominant approaches of machine learning.