Big data? Small data? Metadata in a Data lake? I know there is some kind of joke there. Too bad we insist on writing everything in English…
Language and vocabulary can be difficult. Especially when it comes to the world of data science.
Therefore, we have described 50 of the most common terms in this glossary. Hopefully, this will help you in your journey to become data driven.
Enjoy!
Tip: Press Ctrl + F (or Cmd + F on a Mac) to search for specific words or terms.
Data Driven Glossary
3 Times Understanding
A structured process that focuses on data to gain insight into the overall needs from three different perspectives: business, know-how, and technology.
5P
5P is the process we use to identify what needs to be in place to build your data pipeline. It stands for People, Processes, Pipelines, Platforms, and Partners.
API
API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.
API Economy
The API economy refers to the way application programming interfaces (APIs) can positively affect a company’s profitability. APIs enable a business either to scale quickly by accessing third-party data and services, or to turn its own services and data into a platform that attracts partners to build on it and brings in new customers in the process.
Data Accessibility
Access to data is critical for the success of your business. Easily accessible data enables you to move quickly, focus on the product, and build a data-informed culture where data leads to better decisions and action.
Data Adoption
Data adoption is a process through which businesses find innovative ways to enhance productivity and predict risk to satisfy customers’ needs more efficiently.
Data Aggregation
Collecting data from multiple sources and bringing it together in a common repository for reporting and/or analysis.
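As a rough illustration of aggregation, the sketch below combines records from two sources and totals them per product. The source names (`pos_sales`, `web_sales`), the field names, and the function `aggregate_units` are made up for this example and do not refer to any particular system.

```python
from collections import defaultdict

# Hypothetical records from two separate source systems.
pos_sales = [{"product": "milk", "units": 10}, {"product": "cheese", "units": 4}]
web_sales = [{"product": "milk", "units": 3}, {"product": "butter", "units": 7}]


def aggregate_units(*sources):
    """Combine records from several sources and total the units per product."""
    totals = defaultdict(int)
    for source in sources:
        for record in source:
            totals[record["product"]] += record["units"]
    return dict(totals)


totals = aggregate_units(pos_sales, web_sales)
print(totals)  # {'milk': 13, 'cheese': 4, 'butter': 7}
```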
Data Algorithms
An algorithm is a set of well-defined instructions in sequence to solve a problem.
Data Analyst
A data analyst is responsible for collecting, processing, and performing statistical analysis of data. The analyst discovers how the data can be used to help the organization make better business decisions, and works with business end users to define the types of analytical reports the business requires. It is one of the roles that define a career in big data.
Data Analytics
Data analytics is the science of analyzing raw data to draw conclusions from that information. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
Data Cleansing
Data cleansing (also called scrubbing or cleaning) is the process of revising data to correct misspellings, remove duplicate entries, fill in missing data, and ensure consistency. It is needed because incorrect data can lead to bad analysis and wrong conclusions.
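A minimal sketch of what cleansing can look like in practice, assuming simple customer records with hypothetical `name` and `country` fields. It normalizes spacing and capitalization, fills in a placeholder for missing countries, and drops duplicates. Real cleansing rules depend on the data at hand.

```python
def cleanse(records, default_country="unknown"):
    """Normalize names, fill in missing countries, and drop duplicate entries."""
    seen = set()
    cleaned = []
    for record in records:
        # Collapse repeated whitespace and use consistent capitalization.
        name = " ".join(record["name"].split()).title()
        # Replace a missing or empty country with a placeholder value.
        country = record.get("country") or default_country
        key = (name, country)
        if key in seen:
            continue  # Skip entries that duplicate an earlier record.
        seen.add(key)
        cleaned.append({"name": name, "country": country})
    return cleaned


raw = [
    {"name": "  anna  svensson ", "country": "SE"},
    {"name": "Anna Svensson", "country": "SE"},
    {"name": "henrik berg", "country": None},
]
print(cleanse(raw))
```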
Data Communication
Communicating what the data shows, whether the results are good or bad.
Data Decisions
Decisions being made based on data.
Data Democratization
The process of democratizing data means making data accessible to as many people as possible within a company. Decisions can then be made using data that’s tangible, easily understood, and business-focused. Data democratization happens by sharing data in the right formats and channels, according to each user’s profile and level of knowledge.
Data Driven
A data driven organization is highly committed to gathering data on all aspects of the business. By enabling employees at every level to use the right data at the right time, data can foster conclusive decision-making and become part of the company’s competitive advantage. When a company employs a data driven approach, it makes strategic decisions based on data analysis and interpretation.
Data Integration
Combining data from multiple separate business systems into a single unified view, often called a single view of the truth. This unified view is typically stored in a central data repository known as a data warehouse.
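To make the idea of a unified view concrete, here is a small sketch that merges records from two systems on a shared key. The system names (`crm`, `erp`), the customer IDs, and the function `integrate` are assumptions made purely for illustration.

```python
# Hypothetical extracts from two business systems, keyed by customer ID.
crm = {"c1": {"name": "Anna"}, "c2": {"name": "Henrik"}}
erp = {"c1": {"orders": 3}, "c3": {"orders": 1}}


def integrate(*systems):
    """Merge the fields from every system into one record per key."""
    unified = {}
    for system in systems:
        for key, fields in system.items():
            unified.setdefault(key, {}).update(fields)
    return unified


unified_view = integrate(crm, erp)
print(unified_view["c1"])  # {'name': 'Anna', 'orders': 3}
```

Customers that exist in only one system still appear in the unified view, just with fewer fields.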
Data Knowledge
The organization knows what data it has access to and what it wants to do with it, and it has a process for going from question to action using its data.
Data Lab
A data lab is a designated data science system that is intended to uncover all that your data has to offer. As a space that facilitates data science and accelerates data experimentation, data labs uncover which questions businesses should ask, then help to find the answer.
Data Lake
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
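The toy class below sketches the flat layout described above: each raw object gets a unique identifier and a set of metadata tags, and a query returns the identifiers that carry a given tag. `TinyDataLake` and its methods are hypothetical names for this example, not the API of any real data lake product.

```python
import uuid


class TinyDataLake:
    """A flat, in-memory store of raw objects with metadata tags."""

    def __init__(self):
        self._objects = {}

    def put(self, raw_bytes, tags):
        """Store raw data as-is and return its unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": raw_bytes, "tags": set(tags)}
        return object_id

    def query(self, tag):
        """Return the identifiers of all objects carrying the given tag."""
        return [oid for oid, obj in self._objects.items() if tag in obj["tags"]]


lake = TinyDataLake()
sales_id = lake.put(b"product,units\nmilk,10\n", ["sales", "csv"])
weather_id = lake.put(b'{"temp_c": 4}', ["weather", "json"])
print(lake.query("sales") == [sales_id])  # True
```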
Data Literacy
People in the organization can read, understand, and communicate data.
Data Mentoring
Data mentoring is using an advisor to educate your organization in your first steps towards becoming data-driven. It adds value through increased market insight, holistic data analysis, and practical knowledge.
Data Mining
Data mining refers to techniques for deep data exploration. Data mining is done to extract relevant conclusions that enable more accurate business and/or strategic decisions.
Data Model
An abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.
Data Modeling
Data modeling is the process of creating a data model for an information system by using formal techniques. It is used to define and analyze the data requirements needed to support business processes.
Data Pipeline
A data pipeline aggregates, organizes, and moves data to a destination for storage, insights, and analysis. Modern data pipeline systems automate the ETL (extract, transform, load) process. They include data ingestion, processing, filtering, transformation, and movement across any cloud architecture, and add layers of resiliency against failure.
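The ETL steps can be sketched in a few lines using only the Python standard library. The CSV content, the `sales` table, and the function names are assumptions for this example; a production pipeline would add scheduling, monitoring, and error handling.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the cheese row has a missing value.
RAW_CSV = "product,units\nmilk,10\ncheese,\nbutter,7\n"


def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without units and convert units to integers."""
    return [(row["product"], int(row["units"])) for row in rows if row["units"]]


def load(rows, connection):
    """Load: write the transformed rows into a database table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (product TEXT, units INTEGER)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
total = connection.execute("SELECT SUM(units) FROM sales").fetchone()[0]
print(total)  # 17
```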
Data Platform
A data platform combines all of the data from various data sets and acts as a centralized hub where it can be accessed for analysis and integrations. For example, a data platform for companies in the food industry collects data from multiple systems (ERP, POS, open data, data warehouse, and much more), harmonizes it into usable and uniform structures, and provides managed APIs and applications to access the data.
Data Quality
Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is “fit for its intended use in operations, decision making, and planning”.
Data Scientist
A data scientist is a person proficient in mathematics, statistics, computer science, and/or data visualization who builds data models and algorithms to solve complex problems.
Data Strategy
A data strategy is a vision for how a company will collect, store, manage, share, and use data.
Data Visualization
Data visualization is the presentation of data in a graphical or pictorial format designed to communicate information or derive meaning. It allows users and decision-makers to see analyses visually, which makes new concepts easier to understand. Data visualization helps:
• to derive insight and meaning from the data
• in the communication of data and information in a more effective manner
Data Warehouse
The data warehouse is a system for storing data for analysis and reporting. It is widely regarded as a core component of business intelligence. Data stored in the warehouse is loaded from operational systems such as sales or marketing.
Data Scraping
Data scraping, or web scraping, is an automated technique for gathering (i.e., copying) data from the web using a scraper. The scraper is set to extract specific data from targeted websites. Once it extracts the data, the scraper parses it and stores it in a spreadsheet or database in a readable format.
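The parsing step can be sketched with Python's built-in `html.parser` module. The HTML snippet, the `price` class name, and the `PriceScraper` class are assumptions for this example; a real scraper would first download the page and should respect the website's terms of use.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect the numbers found inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(float(data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


# Hypothetical page content standing in for a downloaded web page.
PAGE = (
    '<ul><li>Milk <span class="price">12.50</span></li>'
    '<li>Butter <span class="price">39.90</span></li></ul>'
)

scraper = PriceScraper()
scraper.feed(PAGE)
scraper.close()
print(scraper.prices)  # [12.5, 39.9]
```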
Decision Intelligence
Decision intelligence is a practical domain that includes a wide range of decision-making techniques. It brings both traditional and advanced disciplines together to design, model, align, execute, monitor, and adjust decision models and processes. The disciplines include decision management (including advanced nondeterministic techniques such as agent-based systems) and decision support, as well as techniques such as descriptive, diagnostic, and predictive analytics.
Decision Modelling
A visual representation of a process that shows how data and knowledge are merged to make a particular business decision.
Linked Data
Linked data refers to a collection of interconnected, highly structured datasets that can be shared or published on the web and used by both machines and people. It is used in building the Semantic Web, in which a large amount of data is available on the web in a standard format.
Location Analytics
Location analytics is the process of gaining insights from the geographic or location component of business data. It involves visually analyzing and interpreting the information the data portrays, and allows the user to connect location-related information with the dataset.
Machine Learning
Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a part of artificial intelligence. It applies statistical strategies and methods for using data to “train” computers to detect and “learn” rules for solving a task, without the computers being programmed with rules for that task ahead of time. Machine learning is used to exploit the opportunities hidden in big data.
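As a very small illustration of learning a rule from data rather than programming it, the sketch below fits a straight line to example points using ordinary least squares. The data points and the function `fit_line` are made up for this example, and real machine learning work typically relies on dedicated libraries and far more data.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept from examples with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples that follow the hidden rule y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 10 + intercept  # Apply the learned rule to a new input.
print(slope, intercept, prediction)
```

The rule "multiply by two and add one" is never written into the code; it is recovered from the examples.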
Metadata
Metadata is data about data. It is administrative, descriptive, and structural data that identifies the assets.
Network Analysis
Network analysis is the application of graph theory to categorize, understand, and view relationships between the nodes in a network. It is an effective way to analyze connections and assess their capabilities in fields such as prediction, marketing analysis, and healthcare.
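One of the simplest network measures is the degree of a node, meaning how many connections it has. The sketch below counts degrees from a list of edges; the names in `EDGES` and the function `degree_by_node` are invented for this example.

```python
from collections import defaultdict

# Hypothetical network: each pair is a connection between two people.
EDGES = [("Anna", "Henrik"), ("Anna", "Sara"), ("Sara", "Henrik"), ("Sara", "Omar")]


def degree_by_node(edges):
    """Count how many connections each node has."""
    degrees = defaultdict(int)
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return dict(degrees)


degrees = degree_by_node(EDGES)
most_connected = max(degrees, key=degrees.get)
print(most_connected, degrees[most_connected])  # Sara 3
```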
Open Data
Open data is data anyone can use and share. It has an open license, is openly accessible, and is both human-readable and machine-readable.
You’re probably already using open data every day – for example:
• Geospatial information (in getting from point A to point B)
• Weather data (in deciding how to dress for the day)
Real-time Data
Data that can be created, stored, processed, analyzed, and visualized instantly (i.e., within milliseconds) is known as real-time data.
Reference Data
Reference data is data used to describe an object along with its properties. The object described by reference data may be virtual or physical.
SaaS
SaaS stands for Software-as-a-Service. It lets vendors host an application and make it available over the internet. SaaS services are provided in the cloud by SaaS providers.
Semi-structured Data
Data that does not follow a rigid, traditional schema is known as semi-structured data. It is neither fully structured nor unstructured, but contains some tags, data tables, and structural elements. A few examples of semi-structured data are XML documents, emails, tables, and graphs.
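An XML document shows the idea well: tags give the data some structure, but not every record has to carry the same fields. The sketch below reads a made-up `<orders>` document with Python's built-in `xml.etree.ElementTree`; the element names and the function `read_orders` are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second order has no <note> element.
DOCUMENT = """
<orders>
  <order id="1"><product>milk</product><note>fragile</note></order>
  <order id="2"><product>butter</product></order>
</orders>
"""


def read_orders(xml_text):
    """Turn each <order> element into a dictionary, tolerating missing tags."""
    root = ET.fromstring(xml_text)
    orders = []
    for order in root.findall("order"):
        orders.append(
            {
                "id": order.get("id"),
                "product": order.findtext("product"),
                "note": order.findtext("note"),  # None when the tag is absent
            }
        )
    return orders


for order in read_orders(DOCUMENT):
    print(order)
```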
Structured Data
In the most general sense, structured data is information that is organized according to a predefined structure, such as the rows and columns of a database table.
Text Analytics
The application of linguistic, machine learning, and statistical techniques to text-based sources. Text analytics uses these techniques to derive insight or meaning from the text data.
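A very basic statistical technique is counting which terms occur most often. The sketch below does this with the standard library; the sample feedback text, the short stopword list, and the function `top_terms` are assumptions for this example, and real text analytics usually involves much richer linguistic processing.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = frozenset({"the", "is", "and", "a", "was"})


def top_terms(text, n=3):
    """Return the n most frequent words in the text, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


feedback = "The delivery was late and the milk was warm. Late delivery again."
print(top_terms(feedback, 2))
```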
Unstructured Data
Data for which a structure can’t be defined is known as unstructured data, which makes it difficult to process and manage. Common examples of unstructured data are the text of email messages and data sources containing text, images, and videos.
Value
This term refers to the value of the available data. The collected and stored data may be valuable for societies, customers, and organizations.
Volume
The total amount of data available. The data may range from megabytes to brontobytes.
Weather Data
Weather data consists of numbers and factors that describe trends and patterns in the atmosphere. Real-time weather data can be used in several different contexts; for example, a logistics company can use it to optimize goods transportation.