
I want to understand RegEx! (Part 1)




Have you ever witnessed dialogues like this? Have you ever understood what people are talking about when they mention RegEx? Do you want to speak this language? Then read this series of blog posts about RegEx.

RegEx is a huge topic and we could talk about it for a long time. There are a lot of good blogs and videos on this topic, and it takes a lot of time to work through all of them.

In this article I want to give you a short introduction to the "RegEx world". I want you to understand what it means and how you can use it. And if you decide to become a RegEx professional, you will need to read more blogs or books and watch more videos :).

But let's start step by step.

Idea of RegEx:

Imagine you have data like this:
website.com/domain/orderpage.html?productcat=shoes&brand=12345 Moin St.&shoenumber=12345&color=black.www.bluewebsite.com/domain/orderpage.html?=Michle/productcat=shoes&brand=3456 Silvester St.&shoenumber=5678&color=www.bluewebsite1.com/domain/orderpage.html?=Jennifer/productcat=shoes&brand=3456 Silvester St.nike&shoenumber=123423&color=www.bluewebsite2.com/domain/orderpage.html?productcat=shoes&brand=nike&shoenumber=12345&color=red

And you want to extract all names, addresses, websites and colors. To do this, you need RegEx.

In other words, you have to tell your program:

"Look through the text and extract all names, addresses, websites and colors!"

Imagine the program you are using is a stranger who doesn't speak your language. The only language it understands is RegEx.

First, I would like to extract all addresses (12345 Moin St.; 3456 Silvester St.). So I'll say to the program: "Look for all postal codes and street names."
To say something in the RegEx language, you have to use RegEx signs.

Postal code: 12345
\d{1,5} (\d stands for a digit; {1,5} means the number has at least 1 and at most 5 digits)
\s (a space)

Street name: Moin St.
\w+ (a word of any length)
\s (a space)
\w+ (the second word, "St")
\. (a literal dot)

And this is my sentence in the RegEx language: 

\d{1,5}\s\w+\s\w+\.

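Note!
Every sign in this sentence starts with a backslash (\). The backslash is not a separator. It either gives a letter a special meaning (\d, \s, \w) or takes the special meaning away from a character (\. is a literal dot, while a plain . would match any character).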
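To see the sentence above in action, here is a minimal sketch in Python, assuming the standard `re` module and a shortened excerpt of the sample data (the variable names are only illustrative):

```python
import re

# Shortened excerpt of the sample data from this article
data = (
    "website.com/domain/orderpage.html?productcat=shoes&brand=12345 Moin St."
    "&shoenumber=12345&color=black.www.bluewebsite.com/domain/orderpage.html"
    "?=Michle/productcat=shoes&brand=3456 Silvester St.&shoenumber=5678&color="
)

# 1 to 5 digits, a space, a word, a space, a word, a literal dot
address_pattern = r"\d{1,5}\s\w+\s\w+\."

addresses = re.findall(address_pattern, data)
print(addresses)  # ['12345 Moin St.', '3456 Silvester St.']
```

The shoe numbers (12345, 5678) are not returned, because they are not followed by a space and two words.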

Here are some basic RegEx signs for you:

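\d   Represents any digit (0-9)
\D   Represents anything but a digit
\s   Represents a whitespace character (space, tab, newline)
\S   Represents anything but whitespace
\w   Represents any word character (letter, digit or underscore)
\W   Represents anything but a word character
\b   A zero-width assertion (word boundary). It does not match a character, it matches a position with a word character on one side and a non-word character (or the start or end of the text) on the other side, e.g. the position between the space and the "C" in "A Cat".
\e   Escape (supported only by some RegEx flavors)
\f   Form feed
\n   Newline
\r   Carriage return
\t   Tab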
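A few of the signs from this list can be tried out with a small sketch, again assuming Python's `re` module; the test sentence and variable names are made up for illustration:

```python
import re

text = "A Cat, 2 dogs"

digits = re.findall(r"\d", text)           # every single digit
non_space_runs = re.findall(r"\S+", text)  # runs of non-whitespace characters
words = re.findall(r"\w+", text)           # runs of word characters
whole_word = re.findall(r"\bCat\b", text)  # "Cat" as a separate word

# \b needs a word boundary on both sides, so "Cat" inside "Category" is skipped
inside_word = re.findall(r"\bCat\b", "Category")

print(digits)          # ['2']
print(non_space_runs)  # ['A', 'Cat,', '2', 'dogs']
print(words)           # ['A', 'Cat', '2', 'dogs']
print(whole_word)      # ['Cat']
print(inside_word)     # []
```

Note how \S+ keeps the comma in "Cat," while \w+ drops it, because a comma is not a word character.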

?        0 or 1 repetitions
*        0 or more repetitions
+        1 or more repetitions (as in \w+ above)
{n}      exactly n repetitions (e.g. the number of digits in a postal code)
{n,m}    at least n and at most m repetitions (as in \d{1,5} above)
[a-z]    any single lowercase letter
[A-Z]    any single uppercase letter
[0-9]    any single digit
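The repetition signs and the bracket ranges can be sketched in the same way. This is only an illustration with Python's `re` module and invented test strings, not part of the original data:

```python
import re

# ?  : the "u" may appear 0 or 1 times
optional = re.findall(r"colou?r", "color colour")

# *  : "a" followed by 0 or more "b"
star = re.findall(r"ab*", "a ab abbb")

# {n}: exactly 5 digits; \b keeps longer numbers from matching partially
five_digits = re.findall(r"\b\d{5}\b", "12345 3456 123423")

# Bracket ranges match one character out of the range
uppercase = re.findall(r"[A-Z]", "Moin St.")
lowercase_runs = re.findall(r"[a-z]+", "Moin St.")
numbers = re.findall(r"[0-9]+", "shoenumber=5678")

print(optional)        # ['color', 'colour']
print(star)            # ['a', 'ab', 'abbb']
print(five_digits)     # ['12345']
print(uppercase)       # ['M', 'S']
print(lowercase_runs)  # ['oin', 't']
print(numbers)         # ['5678']
```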


Examples:

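You are looking for "Michle".
The RegEx for this: 'Michle\s' (the literal letters, followed by one whitespace character). In the sample data above the name is followed by "/", so there you would use 'Michle' on its own.

You are looking for a dollar amount, e.g.
$100.00
The RegEx for this: \$\d*\.\d{2} (a literal dollar sign, any number of digits, a literal dot and exactly two digits)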
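Both examples can be checked with a short sketch, assuming Python's `re` module. The test sentences are invented, and the stricter \d+ variant is an alternative shown for comparison, not the pattern from this article:

```python
import re

# Name followed by a whitespace character
name_with_space = re.findall(r"Michle\s", "Michle ordered shoes")

# In the sample data the name is followed by "/", so \s finds nothing there
data_piece = "orderpage.html?=Michle/productcat=shoes"
name_with_space_in_data = re.findall(r"Michle\s", data_piece)
name_in_data = re.findall(r"Michle", data_piece)

prices = "Shoes cost $100.00, socks $5.5 and laces $.99"

# Pattern from the article: \d* also accepts zero digits before the dot
loose = re.findall(r"\$\d*\.\d{2}", prices)

# Alternative: \d+ requires at least one digit before the dot
strict = re.findall(r"\$\d+\.\d{2}", prices)

print(name_with_space)          # ['Michle ']
print(name_with_space_in_data)  # []
print(name_in_data)             # ['Michle']
print(loose)                    # ['$100.00', '$.99']
print(strict)                   # ['$100.00']
```

"$5.5" is skipped by both patterns because \d{2} needs exactly two digits after the dot.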

I definitely advise you to watch this tutorial. It helped me to understand RegEx:



These links could be helpful:
I hope you now understand how to use and speak RegEx.