Yet another one-liner shell script challenge

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,300
Let's see if we have any software engineers here who are familiar with the Linux/Unix environment.

PART 1

Write a one-liner shell command that generates the output below.

The one-liner should be easy to adapt, e.g. to combine a different collection of sets.
The example below is the Cartesian product (every combination) of
[A,B,C]⨯[x,y,z]⨯[1,2,3]

You are free to use any shell-related tools/utilities/commands, just not another programming language to do the job.
These tools/utilities/commands do not need to be natively available on the system; they can be installed if required.

If you can come up with a one-liner that uses only tools/utilities/commands readily available in all/most Linux installations, that would be a nice approach too.

Expected output
Code:
A x 1
A x 2
A x 3
A y 1
A y 2
A y 3
A z 1
A z 2
A z 3
B x 1
B x 2
B x 3
B y 1
B y 2
B y 3
B z 1
B z 2
B z 3
C x 1
C x 2
C x 3
C y 1
C y 2
C y 3
C z 1
C z 2
C z 3

Have fun
:)
 

project_00

Master Member
Joined
Jan 3, 2007
Messages
3,052
Reaction score
1,912
Bash:
input=[A,B,C]x[x,y,z]x[1,2,3]
eval echo $( echo $input | sed "s|\]x\[|\}\{|g" | tr "[]" "{}" ) | tr " " "\n" | sed 's/./& /g'
 

davidktw

Good try anyway.
I think you misread the question. There is no input for PART 1; it is just telling you that the output is the product of 3 sets.
In any case, it's fine if you want to take an input. It is still an answer.

I found a minor caveat in your solution, though: a trailing space is still a trailing space, and it matters in some cases.
Code:
$ eval echo $( echo $input | sed "s|\]x\[|\}\{|g" | tr "[]" "{}" ) | tr " " "\n" | sed 's/./& /g' | od -a
0000000    A  sp   x  sp   1  sp  nl   A  sp   x  sp   2  sp  nl   A  sp
0000020    x  sp   3  sp  nl   A  sp   y  sp   1  sp  nl   A  sp   y  sp
0000040    2  sp  nl   A  sp   y  sp   3  sp  nl   A  sp   z  sp   1  sp
0000060   nl   A  sp   z  sp   2  sp  nl   A  sp   z  sp   3  sp  nl   B
0000100   sp   x  sp   1  sp  nl   B  sp   x  sp   2  sp  nl   B  sp   x
0000120   sp   3  sp  nl   B  sp   y  sp   1  sp  nl   B  sp   y  sp   2
0000140   sp  nl   B  sp   y  sp   3  sp  nl   B  sp   z  sp   1  sp  nl
0000160    B  sp   z  sp   2  sp  nl   B  sp   z  sp   3  sp  nl   C  sp
0000200    x  sp   1  sp  nl   C  sp   x  sp   2  sp  nl   C  sp   x  sp
0000220    3  sp  nl   C  sp   y  sp   1  sp  nl   C  sp   y  sp   2  sp
0000240   nl   C  sp   y  sp   3  sp  nl   C  sp   z  sp   1  sp  nl   C
0000260   sp   z  sp   2  sp  nl   C  sp   z  sp   3  sp  nl
0000275
Code:
echo {A..C}{x..z}{1..3} | sed 's/ /\n/g; s/[^\n]/& /g; s/ \n/\n/g; s/ $//' # GNU sed

$ echo {A..C}{x..z}{1..3} | sed 's/ /\n/g' | sed 's/./& /g; s/ $//g' | od -t a # GNU sed
0000000    A  sp   x  sp   1  nl   A  sp   x  sp   2  nl   A  sp   x  sp
0000020    3  nl   A  sp   y  sp   1  nl   A  sp   y  sp   2  nl   A  sp
0000040    y  sp   3  nl   A  sp   z  sp   1  nl   A  sp   z  sp   2  nl
0000060    A  sp   z  sp   3  nl   B  sp   x  sp   1  nl   B  sp   x  sp
0000100    2  nl   B  sp   x  sp   3  nl   B  sp   y  sp   1  nl   B  sp
0000120    y  sp   2  nl   B  sp   y  sp   3  nl   B  sp   z  sp   1  nl
0000140    B  sp   z  sp   2  nl   B  sp   z  sp   3  nl   C  sp   x  sp
0000160    1  nl   C  sp   x  sp   2  nl   C  sp   x  sp   3  nl   C  sp
0000200    y  sp   1  nl   C  sp   y  sp   2  nl   C  sp   y  sp   3  nl
0000220    C  sp   z  sp   1  nl   C  sp   z  sp   2  nl   C  sp   z  sp
0000240    3  nl
0000242

:)
 

davidktw

Let's get on to PART 2. PART 1 was just a warm-up; this is the fun part.

PART 2
Here is a directory, DEMODIR, containing 100 AVI movie files:
Code:
$ ls DEMODIR/
movie1.avi movie2.avi movie3.avi
...
movie98.avi movie99.avi movie100.avi

The job is to transcode each of these movie files into both MKV and MP4 formats. We will use ffmpeg for this, and the commands to transcode one input AVI file into both formats are simply:
Code:
ffmpeg -i input.avi output.mkv
ffmpeg -i input.avi output.mp4

Since there are a lot of files and the system has multiple cores (assume 8, no hyperthreading), let's transcode these movie files concurrently, as fast as possible. (Hint: it would be foolish to start all 200 transcoding processes at the same time, since excessive context switching will reduce overall throughput. The best approach is to run only 8 processes at any point in time: when one finishes, start a new one on the next movie file (if any), keeping the system's CPU resources as close to 100% busy as possible.) Now provide a one-liner command that performs the above task.

Have fun again.
:)
 

davidktw

Code:
for i in {A..C}{x..z}{1..3}; do echo ${i:0:1}' '${i:1:1}' '${i:2:3}; done
Good, that works for PART 1 too. PART 2 is the real game.
What you choose for PART 1 does relate to PART 2, but if you stick to a bare-shell approach, PART 2 will be much harder.
The right tool matters. Feel free to take a shot!

E.g. ls is a tool, and so are grep, sed, jq and many more. I am not setting any limits here; be imaginative.

@sooqing I wonder what the point is of posting, generating a notification and then deleting your original post. Does it serve any real purpose? This is a tech-sharing session, a challenge and also entertainment.

:)
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,300
Lets get on to PART 2. PART 1 is just warm up. This is the fun part.

PART 2
Here is a directory DEMODIR with the listing of 100 AVI movie files
Code:
$ ls DEMODIR/
movie1.avi movie2.avi movie3.avi
...
movie98.avi movie99.avi movie100.avi

The job is to transcode all these movie files into their respective mkv and mp4 formats. We will use ffmpeg to do this, and the command to transcode one input avi file to both mkv and mp4 formats are simply
Code:
ffmpeg -i input.avi output.mkv
ffmpeg -i input.avi output.mp4

Since there are a lot of files and the system has multicores (assume 8 - no hyperthreading), lets concurrently transcode these movie files as fast as possible. (Hint: It will be foolish to start 200 concurrent processes to transcode at the same time since excessive context switching will diminish the overall throughput performance. Best will be only run 8 concurrent process at any point of time, when 1 has finish, start a new process to process a new movie file (if any) to keep the entire system cpu resources as pack as 100% as possible) Now provide a oneliner command that will perform the above task.

Have fun again.
:)
No takers for PART 2? BTW, this is not an academic question. It is a real delivery problem: batch processing and post-processing across a large number of files, using as much processing capacity as possible on a single host or even across multiple hosts, is what today's multithreading/multiprocess/multinode environments call for. Surely there must be tools readily available to do all this and let you customise it to your needs, right?

PART 1 is just a lead. In fact, the same technique for PART 2 can be used in PART 1 too.
:)
 

davidktw

Found a more concise way with brace expansion for PART 1.
Code:
printf "%s\n" {A..C}' '{x..z}' '{1..3}

Here is a hint for PART 2, in the form of another of my solutions for PART 1.
Code:
parallel -k echo ::: {A..C} ::: {x..z} ::: {1..3}   # -k keeps the output in input order
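Brace expansion happens before variable expansion in bash, so the sets in the printf one-liner have to be typed literally; something like `{$SET}` will not expand. If the sets need to come from variables, one option is a small recursive bash function. This is a sketch only: the name `cartesian` and the comma-separated set format are my own assumptions, not something from the thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Cartesian product of comma-separated sets,
# one combination per line, fields joined by a single space.
# usage: cartesian PREFIX SET [SET...]   (start with an empty PREFIX)
cartesian() {
  local prefix=$1 item
  local -a items
  IFS=, read -r -a items <<< "$2"     # split the current set on commas
  shift 2
  for item in "${items[@]}"; do
    if (( $# == 0 )); then            # last set: emit the finished combination
      printf '%s\n' "${prefix:+$prefix }$item"
    else                              # otherwise recurse into the remaining sets
      cartesian "${prefix:+$prefix }$item" "$@"
    fi
  done
}

cartesian "" A,B,C x,y,z 1,2,3
```

Adding a fourth set is just one more argument, e.g. `cartesian "" A,B,C x,y,z 1,2,3 p,q`. Items are split on commas only, so they may contain spaces but not commas.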

:)
 

project_00

I can't output to mkv, so I changed to mpg instead
Bash:
parallel -j 8 'ffmpeg -i "{}" "{.}."{mpg,mp4}' ::: *.avi
 

davidktw

Coming from the academic side, I was surprised to learn that speeding up encoding, whether through multiple cores, processes or threads, cannot produce video of the same (or higher) quality compared with encoding one video at a time. Until today I had always assumed that encoding a video gives the same output for the same input, and that it is just a matter of being slower or faster.

Not every part of an algorithm can be parallelised, and this is even more true for a task made up of many different kinds of operations. Not every operation will tax your processor/core. Some parts will be fetching data from memory, disk or network, which loads components outside the processor, such as the Direct Memory Access (DMA) controller, the NIC, storage and so on. In fact, as you observed, today's SoCs offer more than one hardware decoder unit, or other hardware-accelerated components for multimedia processing, as well as specific hardware instructions. Put all of this together and you will realise that even a multithreaded application may not be able to fully utilise a processor/core, let alone multiple processors or cores.
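The first point has a standard quantitative form, Amdahl's law (not named in the thread; added here as background). If a fraction $p$ of a job can be spread across $N$ cores and the rest stays serial, the best-case speedup is bounded as below; $p = 0.9$ is an arbitrary illustrative value.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad
S(8)\big|_{p = 0.9} = \frac{1}{0.1 + 0.1125} = \frac{1}{0.2125} \approx 4.7
```

So even a job that is 90% parallelisable gets at most about 4.7x from 8 cores, before counting any of the memory, I/O or locking effects described above.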

Below are screenshots from a simple shell script that spawns 8 processes with 3 different behaviours. The 1st is just a plain loop; the 2nd loops and, on every alternate iteration, tries to take the same lock shared by all processes; the 3rd loops and, on every alternate iteration, tries to take a different lock for each process.

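1st
[Screenshot: CPU usage graph, plain loop]

2nd
[Screenshot: CPU usage graph, one lock shared by all processes]

3rd
[Screenshot: CPU usage graph, a separate lock per process]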

Notice how much of the whole system is occupied in the 1st case, where there is no contention, compared with the other two. Notice also the minute differences between the 2nd and 3rd (look at the ups and downs of the graph) when sharing the same lock versus not sharing any lock.

In the 2nd and 3rd cases you can squeeze in more processes, since time is spent on locking. Notice how the CPU time is distributed between user space and kernel space with and without locking.
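The script behind those three graphs was not included in the post. For anyone who wants to reproduce the experiment, here is a hypothetical reconstruction of the three behaviours, not the original script. Assumptions: util-linux `flock` is installed, and "taking a lock" means briefly holding it with `flock` on fd 9. Since every lock step forks a process, part of the measured overhead is process creation rather than pure lock contention.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction, NOT the script behind the screenshots.
# mode 1: plain loop
# mode 2: every other iteration, take ONE lock shared by all workers
# mode 3: every other iteration, take a lock private to each worker
lockdemo() {                          # usage: lockdemo MODE WORKERS ITERS
  local mode=$1 workers=$2 iters=$3 dir w
  dir=$(mktemp -d)
  for ((w = 0; w < workers; w++)); do
    (
      if [ "$mode" -eq 2 ]; then lock="$dir/shared.lock"; else lock="$dir/lock.$w"; fi
      for ((i = 0; i < iters; i++)); do
        if [ "$mode" -ne 1 ] && ((i % 2 == 0)); then
          # flock blocks until the lock on fd 9 is free; it is released when fd 9 closes
          ( flock 9 ) 9>"$lock"
        fi
      done
    ) &
  done
  wait                                # wait for every worker
  rm -rf "$dir"
  echo "mode $mode: $workers workers x $iters iterations done"
}

lockdemo "${1:-1}" 8 20000
```

Run it as `bash lockdemo.sh 2` or `bash lockdemo.sh 3` (the file name is arbitrary) and watch `top`/`htop`; lower the iteration count if the locking modes take too long.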

When it comes to media transcoding, quality and speed depend on the settings used, be it codec, resolution, bitrate, quantisation accuracy, motion-compensation search distance and so forth. Speed is also affected by hardware acceleration, or fully hardware-based encoding, and by how much of the work can be offloaded to hardware. When multiple transcodes run at the same time, the mix is even more chaotic, so within each CPU time slice even more work can be fitted into slots left empty by contention or plain idle time. A lot of factors are at play here.

To put it simply, multiple encodes running at the same time will not affect each other's output, but they will affect the whole system, depending on whether the encoder was written to use multiple processes/threads in the first place and how much processing power one such encode can occupy. On my system, with 8 cores and 16 threads, one encode might not occupy all 8/16; it might use just 4/8.

:)
 

davidktw

Just sharing a minor tweak: you don't need to actually perform any encoding, since I am only interested in how the work is distributed.
Bash:
# your command can be the following for demonstration
parallel -j 8 'echo ffmpeg -i "{}" "{.}."{mpg,mp4}' ::: *.avi

Here is mine for sharing
Bash:
# note the difference: :::: reads arguments from a file (here a process substitution), ::: takes them from the command line
parallel -j 8 'echo ffmpeg -i {1} {1.}.{2}' :::: <(ls *.avi) ::: mkv mp4

# here is an alternative for when you don't have GNU parallel
# xargs should be found in most (if not all) Linux/*nix environments,
# but -P (run N processes at once) is not in POSIX, so not every *nix xargs may support it
ls *.avi | xargs -n 1 bash -c 'P=${0%avi}; echo $0 ${P}mkv $0 ${P}mp4' | xargs -n 2 -P 8 bash -c 'echo ffmpeg -i $0 $1'

Have fun!
:)
 

davidktw

PART 3 (ADVANCED)

Now here comes the hardcore challenge. Without resorting to tools/programs/utilities that offer multithreading or multiprocessing capabilities, like GNU parallel or xargs, how can one still transcode the 100 AVI media files into their respective MKV and MP4 formats concurrently, still limited to 8 concurrent operations at any time, using just shell script and tools readily available in most (or all) Unices? I am still looking for a one-liner shell command, but for the sake of clarity, you can drop that requirement if you find it hard. Even getting a proper multiline shell script right can be quite a challenge.

This question requires an understanding of how to implement a SEMAPHORE in the Linux shell. Concurrency is non-trivial when it comes to coordination.
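One common way to build a counting semaphore in plain bash, shown here as a sketch and not as the thread's answer, is a token pool in a named pipe: preload 8 tokens, have the parent read one token before starting each job (P), and have each job write its token back when it finishes (V). Assumptions: bash 4 or later (`read -u`), Linux behaviour for opening a FIFO read-write with `<>` (POSIX leaves this undefined), and `echo ffmpeg ...` as a stand-in for real work, as suggested earlier in the thread. `run_limited` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Counting semaphore from a FIFO token pool (sketch; run_limited is a hypothetical name).
# usage: run_limited MAX FILE...   -> "transcodes" each FILE to .mkv and .mp4, MAX at a time
run_limited() {
  local max=$1 dir f ext t
  shift
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/sem" || return 1
  exec 3<>"$dir/sem"                  # read-write open: does not block, keeps the pipe alive
  rm -rf "$dir"                       # the open fd 3 is enough from here on
  for ((t = 0; t < max; t++)); do printf 'x\n' >&3; done   # preload MAX tokens

  for f in "$@"; do
    for ext in mkv mp4; do
      read -r -u 3                    # P(): take a token; blocks while MAX jobs are running
      {
        echo ffmpeg -nostdin -i "$f" "${f%.avi}.$ext"   # replace echo to really transcode
        printf 'x\n' >&3              # V(): return the token when the job is done
      } &
    done
  done
  wait                                # let the last jobs finish
  exec 3>&-
}

shopt -s nullglob                     # no .avi files -> no jobs, rather than a literal '*.avi'
run_limited 8 ./*.avi
```

Because the parent blocks in `read` before forking, at most MAX jobs exist at any time and nothing has to poll.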

I hope the journey up to here has been fun and really challenging for you.
:)
 

project_00

Interesting challenges; I learnt some new things.
My attempt
Bash:
for i in *.avi; do ( ffmpeg -nostdin -i "$i" "${i%.*}".{mpg,mp4} ) & if [[ $(jobs -r -p | wc -l) -ge 8 ]]; then wait -n; fi; done; wait   # wait -n needs bash 4.3+
 

davidktw

Well done!

Nowadays the new generation tends to think that everything needs a shiny new programming language, when the most common shell scripting language and Linux's own abundance of tools and utilities are far more capable than they think. With such a biased mindset, they don't try hard enough to realise the potential of a bare-bones Linux installation, and start installing tons of this and that, thinking it adds capability to the system, when all they have done is increase the security attack surface.

I hope this minor shell-script challenge wakes up some bright minds and gets them thinking about what they already have, rather than what more they think they need.

:)
 

davidktw

Here is my take. It's a lot more convoluted than what @project_00 offers because it will work across multiple systems connected to the same network file system over NFS. It implements a semaphore using a common file, and uses flock to ensure mutually exclusive manipulation of that file.
Bash:
LCK="./semaphore.lock"; echo 8 >$LCK; for f in ./*.avi; do for ff in "$f ${f%avi}mkv" "$f ${f%avi}mp4"; do echo $ff | ( read J K; while true; do flock 3; COUNTER=`cat $LCK`; if [ $COUNTER -gt 0 ]; then ((COUNTER-=1)); echo $COUNTER >$LCK; flock -u 3; echo ffmpeg -i $J $K; flock 3; COUNTER=`cat $LCK`; ((COUNTER+=1)); echo $COUNTER >$LCK; exit; fi; flock -u 3; sleep 1; done )3<>$LCK & done done; wait
It has a quirk: it starts all the processes up front, and they sit there polling (sleep 1) until it is their turn.
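The quirk can be avoided while keeping the same counter-file and flock idea: do the acquire in the parent loop, before forking, so only jobs that already hold a slot are ever started. A multiline sketch, assuming util-linux `flock`; the helper names `sem_acquire`, `sem_release` and `transcode_all` are hypothetical, and `echo ffmpeg` stands in for real work as before.

```bash
#!/usr/bin/env bash
# Counter-file semaphore guarded by flock, acquired by the parent BEFORE forking,
# so at most MAX jobs ever exist. Helper names are hypothetical.
LCK=./semaphore.lock

sem_acquire() {                       # poll until a slot is free, then take it
  until (
    flock 3                           # exclusive lock; fd 3 is opened read-write (NFS needs that)
    n=$(cat "$LCK")
    [ "$n" -gt 0 ] || exit 1          # no free slot: fail; the lock is released as the subshell exits
    echo $((n - 1)) >"$LCK"           # take a slot
  ) 3<>"$LCK"; do
    sleep 1                           # back off before trying again
  done
}

sem_release() {                       # give a slot back
  (
    flock 3
    n=$(cat "$LCK")
    echo $((n + 1)) >"$LCK"
  ) 3<>"$LCK"
}

transcode_all() {                     # usage: transcode_all MAX FILE...
  local f ext
  echo "$1" >"$LCK"                   # initialise the counter
  shift
  for f in "$@"; do
    for ext in mkv mp4; do
      sem_acquire                     # blocks here, in the parent, until a slot is free
      { echo ffmpeg -i "$f" "${f%avi}$ext"; sem_release; } &
    done
  done
  wait
}

shopt -s nullglob                     # no .avi files -> no jobs
transcode_all 8 ./*.avi
```

Polling with `sleep 1` is still there, but now only the parent polls, instead of 200 waiting processes.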
Have fun and till next time.
:)
 

project_00

Agreed, sometimes the power is already there; we are just unaware of it.

Command-line Tools can be 235x Faster than your Hadoop Cluster

https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
 

davidktw

Concur with passion.

How many Unix console users these days really understand
“a | b”
where real parallelism already exists right under their nose, yet they think it is just shell syntax?
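A quick way to see that parallelism: both sides of a pipeline are started together as separate processes, so two sleeps joined by `|` overlap. A minimal sketch using bash's `SECONDS` counter, which has whole-second resolution, so the result typically reads 2 or 3:

```bash
#!/usr/bin/env bash
# Both commands in a pipeline start at the same time as separate processes,
# so two 2-second sleeps joined by '|' take about 2 s, not 4 s.
start=$SECONDS
sleep 2 | sleep 2
elapsed=$((SECONDS - start))
echo "sleep 2 | sleep 2 took about ${elapsed}s"
```

Replace `|` with `;` and the same line takes about 4 s.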

The main reason is that they don't understand the philosophy of the Unix system and its programming ecosystem.

If I had to use one fundamental statement to describe the difference between Windows and Unix, it would be the pipe concept. In Unix, the pipe is the first thing you will ever encounter.

a tty is a byte stream, much like a pipe
named pipes exist as files
your stdin, stdout and stderr are streams that can be connected to a tty, a file or a pipe
generalise the standard streams and you realise file descriptors 0, 1, 2 and beyond are all conduits, just like pipes
you can communicate across processes using pipes
the Unix process tree, built on the fork ideology, which Windows did not have (and probably still doesn't), is a gem
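The named-pipe and extra-descriptor points can be seen in a few lines of bash. A minimal sketch, assuming `mkfifo` and `mktemp` are available (they are on typical Linux systems): a background writer and the current shell meet through a FIFO that lives in the file system, and the reader uses fd 3 instead of 0, 1 or 2.

```bash
#!/usr/bin/env bash
# A named pipe (FIFO) is a file-system entry that two unrelated processes can meet at.
# fd 3 shows that descriptors beyond 0, 1 and 2 work the same way.
dir=$(mktemp -d)
mkfifo "$dir/chan"
ls -l "$dir/chan" | cut -c1                      # file type 'p' = pipe
printf 'hello through a fifo\n' >"$dir/chan" &   # writer blocks until a reader opens the FIFO
exec 3<"$dir/chan"                               # reader side, opened on fd 3
read -r msg <&3
exec 3<&-
wait
rm -rf "$dir"
echo "$msg"
```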

You can’t call yourself a Unix programmer if you don’t understand these fundamentals.

Unix’s main philosophy is to have tools that each do one or a few things, and to have lots of them interoperate. The pipe is the glue that lets them communicate.

Understand this and you will see it everywhere in the system. Shell script is the scripting language that makes all of these work together.

So understanding shell script is never about the scripting language itself; it is about knowing what is available in Unix and how to use it all together to accomplish complex tasks.

:)
 

davidktw


As an add-on to the article: doing it on the same host may be faster thanks to data locality, but eventually, when your task is humongous and needs multiple nodes to split up the data for parallel processing, it can be done with the simple forking already found in the shell: ssh over to a remote server to perform a subtask, wait for it to finish and retrieve the data. Plan it properly and you can implement map/reduce with plain, simple shell scripting.
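A minimal sketch of that fork/wait map/reduce idea in plain bash. The host names are hypothetical, and by default the map step runs locally so the example is self-contained; setting `RUN=ssh` would instead pipe each chunk to `wc -w` on a remote host, assuming passwordless ssh to those hosts.

```bash
#!/usr/bin/env bash
# Map/reduce using only shell forking (sketch). HOSTS are hypothetical machine names.
# With RUN=ssh each map task would run `wc -w` on its host over ssh; RUN is empty by
# default, so the map step runs locally and the sketch stays self-contained.
HOSTS=(node1 node2 node3)
RUN=${RUN:-}

map_on() {                             # usage: map_on HOST < chunk   (prints a word count)
  if [ -n "$RUN" ]; then "$RUN" "$1" 'wc -w'; else wc -w; fi
}

work=$(mktemp -d)
seq 1 3000 >"$work/input"              # toy data set: 3000 words, one per line
split -l 1000 "$work/input" "$work/chunk."   # split into 3 chunks of 1000 lines

i=0
for chunk in "$work"/chunk.*; do       # map: one background job per chunk
  map_on "${HOSTS[i % ${#HOSTS[@]}]}" <"$chunk" >"$chunk.out" &
  i=$((i + 1))
done
wait                                   # barrier: every map task has finished

total=0
for part in "$work"/chunk.*.out; do    # reduce: add up the partial counts
  total=$((total + $(cat "$part")))
done
rm -rf "$work"
echo "total words: $total"
```

The `wait` is the synchronisation point between the map and reduce phases; everything before it runs concurrently.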

If you want, throw in a DevOps tool like Ansible, and managing a fleet of servers immediately becomes just a matter of how you plan your playbook.

Know your cloud computing well and you can even start spot instances on the fly with the AWS CLI, obtain the server list, let Ansible manage those servers to distribute the work, and shut them down once the job is done.

All of this with just shell and Ansible, perhaps throwing in Perl/Python to be fully expressive with ease.

It is pathetic that these days I encounter software engineers who think working extensively within one programming language or framework is some kind of achievement. Previously, and even now in this forum, I read people treating knowledge of the OS, network infrastructure, storage infrastructure, cloud infrastructure, the various databases and surrounding technologies as nice-to-have, second-class knowledge, when all of it is what makes a software engineer competent. Engineering within a small circle is a self-inflicted danger, and it is a sin to dismiss everything outside it as merely nice-to-have out of unfounded pride. Having to start an IDE and write code for every small task is just plain weird, in my opinion. A software/solution architect is a software engineer who knows all of this, and a software/solution architect is born out of being a software engineer first; he doesn't get promoted without knowing all of this. Some architects may even need product knowledge, be it SaaS or in-house deployments.

I highly recommend that readers take these thoughts away, digest them and rediscover what has been right under their nose all along.

:)
 
Important Forum Advisory Note
This forum is moderated by volunteer moderators who will react only to members' feedback on posts. Moderators are not employees or representatives of HWZ Forums. Forum members and moderators are responsible for their own posts. Please refer to our Community Guidelines and Standards and Terms and Conditions for more information.