How to install dependencies from a requirements.txt file with conda

Just a little reminder: pip has a very useful option to install a bunch of packages from a single text file, usually called requirements.txt (pip install -r requirements.txt). Anaconda’s command line tool conda doesn’t support this option directly. It does support reading the package names from a file using the --yes and --file options, but that does not automatically install all the dependencies. To do this, we need to iterate over the file and install each package in “single package mode”:
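Below is a minimal bash sketch of that loop. It assumes requirements.txt lists one package spec per line. For comparison, the file-based variant is conda install --yes --file requirements.txt. The helper name conda_install_each is hypothetical. Skipping blank and comment lines is an extra assumption of this sketch.

```bash
#!/usr/bin/env bash
# Install the packages listed in a requirements file one at a time, so that
# a single package conda cannot find does not stop the others.
# conda_install_each is a hypothetical helper name.
conda_install_each() {
    local file="${1:-requirements.txt}"
    local requirement
    # IFS= and -r read each line verbatim; the -n test also picks up
    # a last line that has no trailing newline.
    while IFS= read -r requirement || [ -n "$requirement" ]; do
        case "$requirement" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # </dev/null makes sure nothing inside the loop can swallow
        # the remaining lines of the requirements file from stdin
        conda install --yes "$requirement" < /dev/null
    done < "$file"
}

# Usage: conda_install_each requirements.txt
```

Calling conda once per package is slower than one combined install. In exchange, a package conda can’t resolve only costs its own line instead of aborting the whole run.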

Thanks to Luis Capelo for this snippet, which I use to install dependencies in a dockerized instance of Anaconda / Jupyter (more on that in a later post).


Note to self: bash path hashing /o\

Did you ever stumble across something like this:
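The following self-contained bash sketch reproduces the effect. Instead of two Pythons, it uses a hypothetical command, mytool, placed in two temporary directories:

```bash
#!/usr/bin/env bash
# Reproduce bash path hashing with a hypothetical command "mytool".
early=$(mktemp -d)   # this directory comes first in PATH
late=$(mktemp -d)    # this one comes second
PATH="$early:$late:$PATH"

# At first, only the later directory contains mytool.
printf '#!/bin/sh\necho late\n' > "$late/mytool"
chmod +x "$late/mytool"
mytool            # prints "late"; bash now remembers $late/mytool

# Now "install" a newer mytool in the earlier directory.
printf '#!/bin/sh\necho early\n' > "$early/mytool"
chmod +x "$early/mytool"
mytool            # still prints "late": the remembered path wins
hash -t mytool    # prints the remembered path, i.e. $late/mytool

hash -r           # forget all remembered paths
mytool            # prints "early"
```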

Yes, this is strange, isn’t it? The PATH variable is set in the correct order (this is why ‘which’ finds the local Python). At first, googling this behavior didn’t turn up a solution. But then I came across this (now closed) question on Stack Overflow.

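Once you know what you are looking for, Google reveals lots and lots of people having trouble with path hashing. Bash remembers the full path of a command the first time it runs it. It then keeps using that path, even after a newer version appears in a directory that comes earlier in PATH. The external which command, on the other hand, searches PATH afresh every time. Now, my solution was quite simple: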
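One way to fix this without touching PATH is to make bash forget the remembered location of the affected command. Below is a minimal sketch as a bash function; the name refresh_hash is hypothetical. In it, hash -d drops one cached entry, hash NAME repeats the PATH search, and hash -t NAME prints the result.

```bash
#!/usr/bin/env bash
# Make bash forget where it found a command and look it up again.
# refresh_hash is a hypothetical helper name.
refresh_hash() {
    local name="$1"
    hash -d "$name" 2>/dev/null || true  # drop the cached path, if any
    hash "$name"                         # fresh PATH search, cached again
    hash -t "$name"                      # print the path bash will now use
}

# Usage: refresh_hash python
```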

PS: To clear bash’s complete path cache, just use “hash -r”.


Note to self: Allow applications from unverified sources to run on macOS Sierra 10.12

Every now and then I come across a more or less cool application that I want to install on my Mac. Some of them are built by developers who aren’t verified by Apple. Whenever this happens, a dialog pops up telling you that you can’t open that app. You can go to the Security & Privacy (Gatekeeper) pane in System Preferences and allow this particular app to run. However, the option to allow all apps from unverified developers has been gone since Sierra (10.12).

Since Apple very seldom removes functionality completely, there is still a way to switch back to allowing those apps from the command line. I found a solution on the German Mac site MacGadget:
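On Sierra, the usual command line switch for this is spctl --master-disable. It turns off Gatekeeper’s assessments and brings back the “Anywhere” choice in the Security & Privacy pane. Below is a minimal sketch wrapped in a hypothetical helper function. It needs admin rights, hence sudo.

```bash
#!/usr/bin/env bash
# macOS only: turn off Gatekeeper assessments so that apps from
# unidentified developers can be opened again.
# allow_unverified_apps is a hypothetical helper name.
allow_unverified_apps() {
    sudo spctl --master-disable   # asks for an admin password
    spctl --status                # report the current Gatekeeper state
}

# Usage: allow_unverified_apps
# To switch the protection back on later: sudo spctl --master-enable
```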

This works just like the old option setting. HTH.


Note to self: How to count things in Groovy collections

This time I would like to add a short note on how to count things in Groovy collections. Remember: I use “collections” loosely here for lists and maps (strictly speaking, a Groovy or Java Map is not a Collection). Other languages sometimes call these arrays or dictionaries.

Groovy has a standard method to count all elements of a collection. It is called size():

If you need to know the number of elements in a collection that match a certain filter, it’s time to switch to count(). count() takes a closure and counts all elements for which the closure returns true. This can be as simple as counting all elements greater than 3:

Now what if the elements of the list are objects and I want to filter by a specific property of those objects? No problem:

With maps it’s a bit trickier. The implicit it parameter inside the closure is a map entry of type LinkedHashMap$Entry, so we have to deal with its key and value properties:

Hope that helps. See you next time!


Note to self: Crawling the web and stripping HTML and entities on the shell

Ever tried to download a list of strings from a web page? There are numerous solutions to such problems. Here is my toolbox-style solution, which only uses shell commands. This means it’s scriptable for many sites/URLs.

In my case, the HTML contained the desired list of strings, each on its own line and each surrounded by <b> tags. So we can filter out all lines that don’t start with a <b> tag:
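Below is a minimal sketch of that filter as a bash function that reads HTML from stdin. It assumes curl as the download tool, though wget -qO- would do as well. The function name and the URL are placeholders.

```bash
#!/usr/bin/env bash
# Keep only the lines that start with a <b> tag.
# filter_bold_lines is a hypothetical helper name.
filter_bold_lines() {
    grep '^<b>'
}

# Usage (placeholder URL; -s silences curl's progress meter):
# curl -s 'https://example.com/list.html' | filter_bold_lines > list.html
```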

If you want to crawl several sites, the for loop would look like this:
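Below is a sketch of such a loop. It assumes the pages share a common base URL and that curl does the fetching. The function name crawl_sites, the base URL and the page names are hypothetical. Each page ends up in its own file.

```bash
#!/usr/bin/env bash
# Fetch several pages below a common base URL and keep only the lines
# starting with <b>, writing one output file per page (e.g. page1.html).
# crawl_sites, the base URL and the page names are hypothetical.
crawl_sites() {
    local base_url="$1"; shift
    local page
    for page in "$@"; do
        # -s keeps curl's progress meter out of the output
        curl -s "$base_url/$page.html" | grep '^<b>' > "$page.html"
    done
}

# Usage: crawl_sites 'https://example.com/lists' page1 page2 page3
```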

This leaves us with one or more files that still contain HTML tags and entities. To strip them from the file, you can use a text-based HTML browser like w3m:
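Below is a minimal sketch of the w3m step as a bash filter. In w3m, -dump writes the rendered page to stdout, and -T text/html makes w3m treat its input as HTML even when it arrives on stdin. The function and file names are placeholders.

```bash
#!/usr/bin/env bash
# Render HTML from stdin to plain text: w3m drops the tags and
# decodes entities such as &amp;.
# html_to_text is a hypothetical helper name.
html_to_text() {
    w3m -dump -T text/html
}

# Usage: html_to_text < list.html > list.txt
```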

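With our for loop over sites, we end up with several files that all need to be filtered. Use a “triangle swap” for that: write the output to a temporary file, then move it back. Redirecting w3m’s output straight into its input file would truncate that file before w3m gets to read it.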
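Below is a minimal sketch of the triangle swap in bash. It assumes each crawled file is an HTML fragment that w3m should render to plain text in place. The name render_in_place is hypothetical.

```bash
#!/usr/bin/env bash
# Render each given HTML file to plain text in place via a temp file.
# render_in_place is a hypothetical helper name.
render_in_place() {
    local file tmp
    for file in "$@"; do
        tmp=$(mktemp) || return 1
        # render into the temp file first, then move it over the original
        if w3m -dump -T text/html "$file" > "$tmp"; then
            mv "$tmp" "$file"
        else
            rm -f "$tmp"   # leave the original untouched on failure
        fi
    done
}

# Usage: render_in_place page1.html page2.html page3.html
```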

Happy crawling!


Numbering lines with Unix

Have you ever had a CSV file that you wanted to import into a database, and wanted to add a leading ID column numbered from 0 and separated by, let’s say, a colon? Here’s a hint: use the Unix pr (for print) utility:
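Below is a sketch of the pr approach, combining the options and the sed clean-up that are explained further down. It assumes GNU pr from coreutils. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Prefix every line of a file with a running ID starting at 0 and a colon.
# -t: no page headers/trailers, -n: number lines with ':' as separator,
# -N0: start counting at 0; sed strips the padding pr puts before numbers.
# number_lines is a hypothetical helper name.
number_lines() {
    pr -t -n: -N0 "$1" | sed 's/^[[:space:]]*//'
}

# Usage: number_lines test.csv > numbered.csv
```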

My test.csv contains a list of all world manufacturer identifiers (WMIs) used in car VINs (vehicle identification numbers). The first few rows look like this:

Please note that column headers are added later on. Now the output looks like this:

Now for the curious: what does the command line do?
First for the pr part:

  • -t means: omit page headers and trailers (remember: pr is normally used to print paginated content …)
  • -n: means: number lines, using a colon as the separator
  • -N0 means: start with 0

So much for that part. pr right-aligns the line numbers in a field of fixed width (5 digits by default), which results in leading whitespace. We don’t want that, so the sed command removes spaces and tabs at the beginning of each line.
Enough Unix magic for now. Happy hacking!

Update: Detlef Kreuz just mentioned on Twitter that this task could also be accomplished with awk:
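Below is a sketch of the awk variant, reconstructed from the explanation that follows, so the exact command Detlef posted may differ. NR is awk’s built-in count of input lines read so far, starting at 1. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print each line prefixed with its zero-based line number and a colon.
# number_lines_awk is a hypothetical helper name; NR counts from 1.
number_lines_awk() {
    awk '{ print NR-1 ":" $0 }' "$@"
}

# Usage: number_lines_awk test.csv > numbered.csv
```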

Here awk executes the commands inside the curly braces for every line of input. For each line, it prints the line number minus 1, followed by a colon and the complete line. $0 is an internal awk variable containing the complete current line, while $1, $2 … contain the split-up fields. Where to split is determined by FS, the field separator; its default of a single space means splitting on runs of spaces and tabs. Thanks Detlef!


Note to self: How to use screen

This post starts a series of rather short articles in which I present things that I use from time to time but tend to forget how to do :)
The first serving deals with the undeniably useful Unix command screen. Screen opens a virtual terminal in which you can start long-running processes. You can detach from it at any time and reattach later, while the processes keep running. Think of screen as nohup on steroids. Start it with a blank shell and create a session with the symbolic name testo:
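The command for that is screen -S testo. If you would rather start a long-running command in a session that is detached right from the beginning, screen’s -d -m options do that. Below is a minimal sketch as a hypothetical helper function.

```bash
#!/usr/bin/env bash
# Create a session called "testo" and attach to it right away:
#   screen -S testo
#
# start_detached is a hypothetical helper: it starts a command in a new
# session that is detached from the start (-d -m) and named via -S.
start_detached() {
    local name="$1"; shift
    screen -d -m -S "$name" "$@"
}

# Usage: start_detached testo ./long_running_job.sh
```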

You are greeted with … well, a fresh and clean shell. Here you can start things that will run for a long time. To detach from that screen, use the key sequence ctrl-a d. Nearly all key sequences for screen start with ctrl-a, and the “d” stands for “detach”. To see what’s going on behind your back, list the sessions with screen -ls:
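screen -ls prints one line per session with its key (PID.name) and its state, such as (Detached). The hypothetical helper below relies on that layout to test whether a session with a given name exists.

```bash
#!/usr/bin/env bash
# List all sessions of the current user:
#   screen -ls
#
# session_exists is a hypothetical helper: it succeeds if screen -ls
# shows a session whose key (PID.NAME) ends in ".NAME".
session_exists() {
    screen -ls | grep -q "[0-9]\.$1[[:space:]]"
}

# Usage: session_exists testo && echo "testo is still there"
```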

Here 1387.testo is the key to the session, consisting of the process id and the symbolic name:

To reattach to the screen, as you might have guessed, use screen’s reattach option, screen -r:
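screen -r takes the symbolic name (or the full key, such as 1387.testo). If the session is still attached somewhere else, for example in another SSH login, adding -d detaches it there first. Below is a small hypothetical helper for that case.

```bash
#!/usr/bin/env bash
# Reattach to the session named testo:
#   screen -r testo
#
# reattach_here is a hypothetical helper: -d -r first detaches the
# session wherever it is currently attached, then reattaches it here.
reattach_here() {
    screen -d -r "${1:?usage: reattach_here SESSION_NAME}"
}

# Usage: reattach_here testo
```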

You can detach and reattach as often as you like. When you are done with your long-running processes, just log out of the shell inside the screen using ctrl-d. You will be informed that the screen has been shut down:


I wish I could look at that in my browser …

Sometimes you would like to see, in your browser, some information that is readily available from a Unix command. Say you are in a private network and / or the information does no harm when read by unauthorized people, or you only need it for a rather short period of time. Then ashttp does the trick.
ashttp is a Python script by Julien Palard (@sizeof). It uses a headless vt100 terminal emulator to run a command each time its HTTP server gets a request, grabs the output and delivers it via HTTP to the requesting browser.
For example, the output of top:

This starts an HTTP server on port 8081 (you can also use --port), and every request to that server delivers the output of a fresh top command.
At the moment, there seems to be a small problem with passing command line parameters on to the Unix command. You can work around it by putting your more complex statement into a shell script with a shebang line and calling that script from ashttp:
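Below is a minimal sketch of that workaround. The script name disk_usage.sh and the df call are only illustrative stand-ins for a command with parameters.

```bash
#!/usr/bin/env bash
# Put a command that needs parameters into its own executable script
# and pass only the script to ashttp.
# disk_usage.sh and the df options are illustrative choices.
cat > disk_usage.sh <<'EOF'
#!/bin/sh
# human readable disk usage of the root file system
exec df -h /
EOF
chmod +x disk_usage.sh

# Serve it on port 8081:
# ashttp --port 8081 ./disk_usage.sh
```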

Have fun!

Update: @n770 correctly mentioned that having SWIG installed is a prerequisite for building the Python hl_vt100 module.